
v2.3 Vs v2.4 performance

Posted: Wed Apr 29, 2020 8:59 pm
by neo2068
Dade wrote: Wed Apr 29, 2020 11:18 am I should have fixed the OpenCL problem: it was a mess of alignment and size of a 64bit field (CPU, OpenCL and CUDA work in different ways :roll: ).
Yes, OpenCL compilation works now, but I have another question. I downloaded the latest Azure build, and its OpenCL build is just as fast/slow as my own build. I also see the expected speedup with CUDA. That looks good at first sight, but it is approx. 2x slower than the 2.3 build. What is the difference between the new framework and the old one in v2.3? Is this normal behaviour?

Food scene with LuxCoreUI v2.3:
v2.3.png
Food scene with LuxCoreUI v2.4alpha0 OpenCL build:
v2.4.png
Food scene with LuxCoreUI v2.4alpha0 CUDA build OCL devices:
v2.4 OCL.png
Food scene with LuxCoreUI v2.4alpha0 CUDA build CUDA devices:
v2.4 CUDA.png

Re: Windows Build FAILED

Posted: Thu Apr 30, 2020 1:03 am
by Dade
neo2068 wrote: Wed Apr 29, 2020 8:59 pm
Dade wrote: Wed Apr 29, 2020 11:18 am I should have fixed the OpenCL problem: it was a mess of alignment and size of a 64bit field (CPU, OpenCL and CUDA work in different ways :roll: ).
Yes, OpenCL compilation works now, but I have another question. I downloaded the latest Azure build, and its OpenCL build is just as fast/slow as my own build. I also see the expected speedup with CUDA. That looks good at first sight, but it is approx. 2x slower than the 2.3 build. What is the difference between the new framework and the old one in v2.3? Is this normal behaviour?
No and, as far as I know, it happens only to you. To start, check whether it is the scene: your version looks like a very old one and is likely to use the BCD denoiser. BCD is a peculiar beast that requires a lot of (GPU) RAM. Try some other scene (and be sure to disable BCD; it is enabled in most old .cfg files).

Re: Windows Build FAILED

Posted: Thu Apr 30, 2020 1:08 am
by Dade
Check your screenshots: the GUI loop time is 18 ms in v2.3 and 250 ms in v2.4, so there is something very wrong going on :?:

Re: Windows Build FAILED

Posted: Thu Apr 30, 2020 5:32 am
by neo2068
Dade wrote: Thu Apr 30, 2020 1:03 am No and, as far as I know, it happens only to you. To start, check whether it is the scene: your version looks like a very old one and is likely to use the BCD denoiser. BCD is a peculiar beast that requires a lot of (GPU) RAM. Try some other scene (and be sure to disable BCD; it is enabled in most old .cfg files).
Yes, the scene was old. But with new scenes (i.e. food and danish mood) and the denoiser disabled, I get similar performance values. At least the GUI loop time is down to ~20 ms.
If no one else has this slowdown problem, I will investigate it on my side. It is not a high priority. I thought you might have a spontaneous idea of what the root cause could be.

Re: Windows Build FAILED

Posted: Thu Apr 30, 2020 7:26 am
by epilectrolytics
neo2068 wrote: Thu Apr 30, 2020 5:32 am
If no one else has this slowdown problem, I will investigate it on my side.
I can confirm neo's results.
With the old Food scene it looks very similar, but it is visible even in the Cornell scene:
old OCL is faster than new OCL but slower than new CUDA.
foodsc.jpg
cornell.jpg
There clearly is a problem, and not a small one, given a speed loss of 40% in the Food scene.
I'm seeing the same GUI loop time increase, and the responsiveness of LuxCoreUI v2.4alpha is very bad compared to version 2.3 when trying to open and resize editor windows.

Re: v2.3 Vs v2.4 performance

Posted: Thu Apr 30, 2020 12:57 pm
by Dade
First, some uniform rules for testing, or it is a mess:

1) use an automatic Azure build from the very latest source or v2.3 release binaries;

2) use LuxCore2.1Benchmark (https://github.com/LuxCoreRender/LuxCor ... xCoreScene) with LuxCoreUI with BCD disabled;

3) run the test 2 times (so the second doesn't include any kernel compilation);

4) set the CPU threads to 0 with the command line option "-D opencl.native.threads.count 0" (we are not interested in measuring CPU performance);

5) given the above rules, we can use the rays/sec statistic seen after pressing the "j" key (because it is stable after a few seconds of rendering, while samples/sec requires a long run).

Given the above rules, my results are:

- v2.3 with OpenCL 1 x RTX2070 => ~20,200 rays/sec
- v2.4 with OpenCL 1 x RTX2070 => ~17,500 rays/sec
- v2.4 with CUDA 1 x RTX2070 => ~19,100 rays/sec

This should be explained by "v2.3 compiled materials/textures" Vs "v2.4 interpreted materials/textures". The current materials/textures interpreter can be further accelerated in CUDA by using function pointers (not allowed in OpenCL), etc.
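
To put the three rays/sec figures in relative terms, here is a minimal Python sketch. Only the numbers come from the results above; the dictionary keys and the helper name `relative_change` are made up for illustration.

```python
# Rays/sec figures copied from the results above (1 x RTX2070).
results = {
    "v2.3 OpenCL": 20200,
    "v2.4 OpenCL": 17500,
    "v2.4 CUDA": 19100,
}

def relative_change(value, reference):
    """Percent change of value relative to reference (negative = slower)."""
    return (value - reference) / reference * 100.0

# Compare every configuration against the v2.3 OpenCL baseline.
baseline = results["v2.3 OpenCL"]
changes = {name: relative_change(v, baseline) for name, v in results.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}% vs v2.3 OpenCL")

# Within v2.4, compare CUDA against OpenCL.
cuda_vs_ocl = relative_change(results["v2.4 CUDA"], results["v2.4 OpenCL"])
print(f"v2.4 CUDA vs v2.4 OpenCL: {cuda_vs_ocl:+.1f}%")
```

On these numbers, v2.4 OpenCL is roughly 13% below v2.3 OpenCL and v2.4 CUDA roughly 5% below it, while CUDA is about 9% ahead of OpenCL within v2.4.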

Re: v2.3 Vs v2.4 performance

Posted: Thu Apr 30, 2020 1:41 pm
by Dade
Dade wrote: Thu Apr 30, 2020 12:57 pm This should be explained by "v2.3 compiled materials/textures" Vs "v2.4 interpreted materials/textures". The current materials/textures interpreter can be further accelerated in CUDA by using function pointers (not allowed in OpenCL), etc.
Indeed, the more material/texture-heavy the scene is (i.e. complex node trees), the faster compiled is relative to interpreted (and vice versa).

Re: v2.3 Vs v2.4 performance

Posted: Thu Apr 30, 2020 2:07 pm
by neo2068
@Dade: Thank you for the detailed explanation. I had thought that it had to be something like that. I hadn't looked into your rework of the kernel compilation code, so I didn't know how the permanent kernel was achieved. For me it is OK and it makes sense. You are really doing a great job.

Re: v2.3 Vs v2.4 performance

Posted: Thu Apr 30, 2020 2:30 pm
by Dade
neo2068 wrote: Thu Apr 30, 2020 2:07 pm @Dade: Thank you for the detailed explanation. I had thought that it had to be something like that. I hadn't looked into your rework of the kernel compilation code, so I didn't know how the permanent kernel was achieved. For me it is OK and it makes sense. You are really doing a great job.
I assume I can regain some performance (but only in CUDA): for the textures alone, there is a "switch(textureType)" with 60+ cases; in CUDA this can be done with a jump table of function pointers.
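
The switch-versus-jump-table idea can be sketched outside CUDA. The following is a minimal Python illustration, not LuxCore code: the three texture type IDs, the evaluator functions, and both dispatchers are hypothetical. In CUDA the table would hold device function pointers; here a list of plain functions plays that role.

```python
# Hypothetical texture type IDs (the real switch has 60+ cases).
CONST, SCALE, ADD = 0, 1, 2

def eval_const(args):
    return args[0]

def eval_scale(args):
    return args[0] * args[1]

def eval_add(args):
    return args[0] + args[1]

def eval_switch(texture_type, args):
    """Branch chain: the type is compared against each case in turn."""
    if texture_type == CONST:
        return eval_const(args)
    elif texture_type == SCALE:
        return eval_scale(args)
    elif texture_type == ADD:
        return eval_add(args)
    raise ValueError(f"unknown texture type {texture_type}")

# Jump table: the type ID indexes straight into a table of functions.
JUMP_TABLE = [eval_const, eval_scale, eval_add]

def eval_jump_table(texture_type, args):
    """One bounds check, one indexed lookup, one indirect call."""
    if not 0 <= texture_type < len(JUMP_TABLE):
        raise ValueError(f"unknown texture type {texture_type}")
    return JUMP_TABLE[texture_type](args)

# Both dispatchers give the same result for every known type.
for t, a in [(CONST, (0.5,)), (SCALE, (2.0, 3.0)), (ADD, (1.0, 4.0))]:
    assert eval_switch(t, a) == eval_jump_table(t, a)
```

The sketch only shows that the two forms are interchangeable in behaviour; how much time the table saves on a GPU is something that would have to be measured.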

Re: v2.3 Vs v2.4 performance

Posted: Thu Apr 30, 2020 2:32 pm
by B.Y.O.B.
I did a test with the LuxCore2.1Benchmark scene, but I measured the render time for 1000 samples.
Kernels and PhotonGI cache were pre-computed and cached.
Times are in min:sec format.

CPU: AMD Ryzen 7 2700x
GPU: Nvidia RTX 2080

v2.3 cpu: 9:36
v2.3 opencl: 2:54

v2.4 cpu: 8:53
v2.4 opencl: 3:24
v2.4 cuda: 2:54
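
For easier comparison, the min:sec times above can be converted to seconds and expressed as ratios. This is a minimal Python sketch; the helper names `to_seconds` and `ratio` are made up for illustration, and only the times come from the list above.

```python
# Render times for 1000 samples, copied from the list above (min:sec).
times = {
    "v2.3 cpu": "9:36",
    "v2.3 opencl": "2:54",
    "v2.4 cpu": "8:53",
    "v2.4 opencl": "3:24",
    "v2.4 cuda": "2:54",
}

def to_seconds(min_sec):
    """Convert a 'min:sec' string to a number of seconds."""
    minutes, seconds = min_sec.split(":")
    return int(minutes) * 60 + int(seconds)

seconds = {name: to_seconds(t) for name, t in times.items()}

def ratio(a, b):
    """Render time of a divided by render time of b (above 1 = a is slower)."""
    return seconds[a] / seconds[b]

print(f"v2.4 opencl / v2.3 opencl: {ratio('v2.4 opencl', 'v2.3 opencl'):.2f}")
print(f"v2.4 cuda   / v2.3 opencl: {ratio('v2.4 cuda', 'v2.3 opencl'):.2f}")
print(f"v2.4 cpu    / v2.3 cpu:    {ratio('v2.4 cpu', 'v2.3 cpu'):.2f}")
```

On these times, v2.4 OpenCL takes about 17% longer than v2.3 OpenCL, v2.4 CUDA matches v2.3 OpenCL, and the v2.4 CPU render is about 7% shorter than the v2.3 one.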