OpenCL/C++ hybrid rendering
Posted: Sat Feb 10, 2018 2:45 pm
Introduction
LuxCore has supported GPU+CPU rendering since its first days thanks to OpenCL CPU devices. However, the performance of OpenCL CPU devices (from AMD and Intel) is very disappointing. They are a lot slower than C++ code and they can even slow down the GPU. So GPU+CPU rendering is often slower than GPU-only rendering.
OpenCL/C++ hybrid rendering
OpenCL/C++ hybrid rendering is the solution to this problem. OpenCL is used to run the rendering on GPUs while native C++ code is used for the CPU rendering. This solution has 3 major advantages:
1) There is no need to install an OpenCL CPU device driver. NVIDIA doesn't include one in its drivers, and having to install another driver from AMD/Intel is complex and cumbersome.
2) There is no slowdown of GPU rendering, so GPU+CPU rendering is always faster than GPU-only rendering. I have worked to keep GPU performance at its maximum even with a 100% load on the CPU.
3) C++ code is a LOT faster than OpenCL running on a CPU device. It is usually 4-6 times faster in my tests.
A test
This is a rendering with an OpenCL GPU (AMD R290X) and an OpenCL CPU (i7 3630k, 6 cores + hyper-threading):
and this is with an OpenCL GPU (AMD R290X) and C++ code (i7 3630k, 6 cores + hyper-threading):
The 12 threads run at about 2,750K rays/sec with C++ but at a miserable 762K rays/sec with the OpenCL CPU device. The GPU runs slightly faster with hybrid rendering too.
The result is that hybrid rendering runs at 3.03M samples/sec while OpenCL GPU+CPU runs at 2.52M samples/sec. The contribution of the CPU can now be significant (and the more complex the scene, the more significant it is going to be).
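To put the figures above in relative terms, here is a small Python sketch that only redoes the arithmetic on the numbers quoted in this test. The variable names are mine and nothing here comes from LuxCore itself.

```python
# Figures quoted in the test above (single scene, single machine).
cpu_cpp_krays = 2750.0     # 12 C++ threads, K rays/sec
cpu_ocl_krays = 762.0      # OpenCL CPU device, K rays/sec
hybrid_msamples = 3.03     # OpenCL GPU + C++ CPU, M samples/sec
ocl_msamples = 2.52        # OpenCL GPU + OpenCL CPU, M samples/sec

# How much faster the CPU side is when it runs native C++ code.
cpu_speedup = cpu_cpp_krays / cpu_ocl_krays

# How much faster the whole GPU+CPU rendering is with the hybrid approach.
total_speedup = hybrid_msamples / ocl_msamples

print(f"CPU side: {cpu_speedup:.2f}x")    # CPU side: 3.61x
print(f"Overall:  {total_speedup:.2f}x")  # Overall:  1.20x
```

So in this particular scene the CPU side is about 3.6 times faster and the whole rendering gains about 20% in samples/sec.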
Metropolis sampler
A word of caution about the Metropolis sampler: the OpenCL implementation differs from the C++ one, so there may be differences in the output, and mixing the results may not be a good idea.
LuxCore API
Hybrid rendering is now enabled by default with PATHOCL and TILEPATHOCL. It is disabled and not usable with RTPATHOCL for obvious reasons. The number of hybrid rendering threads launched equals the number of logical cores (cores + hyper-threading) and can be controlled by this property:
opencl.native.threads.count = 12
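If you generate render configurations from a script, a minimal Python sketch like the following can build that line. The helper name native_threads_property is hypothetical and not part of the LuxCore API. It assumes that os.cpu_count(), which reports logical CPUs, is a reasonable stand-in for the "cores + hyper-threading" default described above.

```python
import os


def native_threads_property(thread_count=None):
    """Return the config line for the property shown above.

    Hypothetical helper for illustration; not part of LuxCore.
    """
    if thread_count is None:
        # os.cpu_count() returns the number of logical CPUs, or None
        # when it cannot be determined; fall back to 1 in that case.
        thread_count = os.cpu_count() or 1
    if thread_count < 0:
        raise ValueError("thread count cannot be negative")
    return f"opencl.native.threads.count = {thread_count}"


# Explicit value, as in the example above.
print(native_threads_property(12))  # opencl.native.threads.count = 12

# Value derived from the machine running the script.
print(native_threads_property())
```

The resulting line can be written to the render configuration file next to the other properties.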