v2.3 Vs v2.4 performance

neo2068
Developer
Posts: 260
Joined: Tue Dec 05, 2017 6:06 pm
Location: Germany

v2.3 Vs v2.4 performance

Post by neo2068 »

Dade wrote: Wed Apr 29, 2020 11:18 am I should have fixed the OpenCL problem: it was a mess of alignment and size of a 64-bit field (CPU, OpenCL and CUDA work in different ways :roll: ).
Yes, OpenCL compilation works now, but I have another question. I downloaded the latest Azure build, and its OpenCL build is as fast (or slow) as my own build. I also see the expected speedup with CUDA, which looks good at first sight, but it is approx. 2x slower than the v2.3 build. What is the difference between the new framework and the old one in v2.3? Is this normal behaviour?

Food scene with LuxCoreUI v2.3:
v2.3.png
Food scene with LuxCoreUI v2.4alpha0 OpenCL build:
v2.4.png
Food scene with LuxCoreUI v2.4alpha0 CUDA build OCL devices:
v2.4 OCL.png
Food scene with LuxCoreUI v2.4alpha0 CUDA build CUDA devices:
v2.4 CUDA.png
i7 5820K, 32 GB RAM, NVIDIA Geforce RTX 2080 SUPER + GTX 1080, Windows 10 64bit, Blender 2.83.5
Support LuxCoreRender project with salts and bounties
Dade
Developer
Posts: 5672
Joined: Mon Dec 04, 2017 8:36 pm
Location: Italy

Re: Windows Build FAILED

Post by Dade »

neo2068 wrote: Wed Apr 29, 2020 8:59 pm
Dade wrote: Wed Apr 29, 2020 11:18 am I should have fixed the OpenCL problem: it was a mess of alignment and size of a 64-bit field (CPU, OpenCL and CUDA work in different ways :roll: ).
Yes, OpenCL compilation works now, but I have another question. I downloaded the latest Azure build, and its OpenCL build is as fast (or slow) as my own build. I also see the expected speedup with CUDA, which looks good at first sight, but it is approx. 2x slower than the v2.3 build. What is the difference between the new framework and the old one in v2.3? Is this normal behaviour?
No, and as far as I know it happens only to you. To start, check whether it is the scene: your version looks like a very old one, and it is likely to use the BCD denoiser. BCD is a peculiar beast that requires a lot of (GPU) RAM. Try some other scene (and be sure to disable BCD; it is enabled in most old .cfg files).

Re: Windows Build FAILED

Post by Dade »

Check your screenshots: the GUI loop time is 18 ms in v2.3 and 250 ms in v2.4. There is something very wrong going on :?:

Re: Windows Build FAILED

Post by neo2068 »

Dade wrote: Thu Apr 30, 2020 1:03 am No, and as far as I know it happens only to you. To start, check whether it is the scene: your version looks like a very old one, and it is likely to use the BCD denoiser. BCD is a peculiar beast that requires a lot of (GPU) RAM. Try some other scene (and be sure to disable BCD; it is enabled in most old .cfg files).
Yes, the scene was old. But with new scenes (i.e. Food and Danish Mood) and the denoiser disabled, I get similar performance values. At least the GUI loop time is down to ~20 ms.
If no one else has this slowdown problem, I will investigate it on my side. It is not a high priority. I thought you might have a spontaneous idea of what the root cause could be.
epilectrolytics
Donor
Posts: 790
Joined: Thu Oct 04, 2018 6:06 am

Re: Windows Build FAILED

Post by epilectrolytics »

neo2068 wrote: Thu Apr 30, 2020 5:32 am
If no one else has this slowdown problem, I will investigate it on my side.
I can confirm neo's results.
With the old Food scene it looks very similar, and it is visible even in the Cornell scene:
Old OCL speed is faster than new OCL but slower than new CUDA.
foodsc.jpg
cornell.jpg
There clearly is a problem, and not a small one, given a speed loss of 40% in the Food scene.
I'm seeing the same GUI loop time increase, and LuxCoreUI v2.4alpha is also much less responsive than v2.3 when opening and resizing editor windows.

Re: v2.3 Vs v2.4 performance

Post by Dade »

First, some uniform rules for testing, or it is a mess:

1) use an automatic Azure build from the very latest source, or the v2.3 release binaries;

2) use LuxCore2.1Benchmark (https://github.com/LuxCoreRender/LuxCor ... xCoreScene) in LuxCoreUI with BCD disabled;

3) run the test 2 times (so the second run doesn't include any kernel compilation);

4) set the CPU threads to 0 with the command line option "-D opencl.native.threads.count 0" (we are not interested in measuring CPU performance);

5) given the above rules, we can use the rays/sec statistic shown after pressing the "j" key (it is stable after a few seconds of rendering, while samples/sec requires a long run).

Given the above rules, my results are:

- v2.3 with OpenCL 1 x RTX2070 => ~20,200 rays/sec
- v2.4 with OpenCL 1 x RTX2070 => ~17,500 rays/sec
- v2.4 with CUDA 1 x RTX2070 => ~19,100 rays/sec

This should be explained by "v2.3 compiled materials/textures" Vs "v2.4 interpreted materials/textures". The current materials/textures interpreter can be further accelerated in CUDA by using function pointers (not allowed in OpenCL), etc.
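
As a rough cross-check of the figures above, the relative throughput of each v2.4 backend against v2.3 can be worked out directly. This is a minimal Python sketch: the dictionary keys and the `relative_change` helper are names made up for this illustration, and the numbers are copied as reported (approximate values from a single RTX 2070).

```python
# Rays/sec figures as reported in the post (approximate values).
rays_per_sec = {
    "v2.3 OpenCL": 20_200,
    "v2.4 OpenCL": 17_500,
    "v2.4 CUDA": 19_100,
}


def relative_change(value, baseline):
    """Return the percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


baseline = rays_per_sec["v2.3 OpenCL"]
for name, value in rays_per_sec.items():
    print(f"{name}: {relative_change(value, baseline):+.1f}% vs v2.3 OpenCL")
```

With these inputs, v2.4 OpenCL comes out roughly 13% below v2.3 OpenCL and v2.4 CUDA roughly 5% below it.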

Re: v2.3 Vs v2.4 performance

Post by Dade »

Dade wrote: Thu Apr 30, 2020 12:57 pm This should be explained by "v2.3 compiled materials/textures" Vs "v2.4 interpreted materials/textures". The current materials/textures interpreter can be further accelerated in CUDA by using function pointers (not allowed in OpenCL), etc.
Indeed, the more materials/textures heavy the scene is (i.e. complex node trees), the faster compiled is compared to interpreted (and vice versa).
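
The general trade-off between interpreting a node tree and compiling it can be sketched in a few lines. This is a toy Python illustration only and does not reflect how LuxCore builds or runs its kernels: the tuple-based tree format and the `interpret` and `compile_tree` functions are hypothetical. The interpreter branches on the node type at every evaluation, while the "compiled" version resolves the node types once and returns a function specialised to that one tree.

```python
# A tiny node tree: ("const", value), ("add", a, b) or ("mul", a, b).
tree = ("add", ("mul", ("const", 2.0), ("const", 3.0)), ("const", 1.0))


def interpret(node):
    """Walk the tree and branch on the node type at every evaluation."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "add":
        return interpret(node[1]) + interpret(node[2])
    if kind == "mul":
        return interpret(node[1]) * interpret(node[2])
    raise ValueError(f"unknown node type {kind!r}")


def compile_tree(node):
    """Resolve node types once; return a function specialised to this tree."""
    kind = node[0]
    if kind == "const":
        value = node[1]
        return lambda: value
    if kind not in ("add", "mul"):
        raise ValueError(f"unknown node type {kind!r}")
    left, right = compile_tree(node[1]), compile_tree(node[2])
    if kind == "add":
        return lambda: left() + right()
    return lambda: left() * right()


evaluate = compile_tree(tree)  # paid once per tree
print(interpret(tree), evaluate())
```

The deeper the tree, the more type checks the interpreter repeats per evaluation, which is the intuition behind heavy node trees favouring the compiled approach. The price is that the compile step has to be redone whenever the tree changes.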

Re: v2.3 Vs v2.4 performance

Post by neo2068 »

@Dade: Thank you for the detailed explanation. I had thought that it had to be something like that. I hadn't looked into your rework of the kernel compilation code, so I didn't know how the permanent kernel was achieved. For me it is OK and it makes sense. You are really doing a great job.

Re: v2.3 Vs v2.4 performance

Post by Dade »

neo2068 wrote: Thu Apr 30, 2020 2:07 pm @Dade: Thank you for the detailed explanation. I had thought that it had to be something like that. I hadn't looked into your rework of the kernel compilation code, so I didn't know how the permanent kernel was achieved. For me it is OK and it makes sense. You are really doing a great job.
I assume I can regain some performance (but only in CUDA): for the textures alone, there is a "switch(textureType)" with 60+ cases; in CUDA it can be replaced by a jump table of function pointers.
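
The dispatch idea can be illustrated outside of GPU code. The following is a Python analogy only, not LuxCore's kernel code: the texture type IDs, the three evaluator functions and both dispatch helpers are hypothetical. It contrasts a chain of comparisons (standing in for a large "switch") with a table of function references indexed by type ID, which is what a jump table of function pointers amounts to.

```python
# Hypothetical texture type IDs; the real list is said to have 60+ entries.
CONST_FLOAT, SCALE, ADD = 0, 1, 2


def eval_const(args):
    return args[0]


def eval_scale(args):
    return args[0] * args[1]


def eval_add(args):
    return args[0] + args[1]


def eval_switch(texture_type, args):
    """Switch-style dispatch: compare against each case until one matches."""
    if texture_type == CONST_FLOAT:
        return eval_const(args)
    elif texture_type == SCALE:
        return eval_scale(args)
    elif texture_type == ADD:
        return eval_add(args)
    raise ValueError(f"unknown texture type {texture_type}")


# Table-style dispatch: the type ID indexes straight into a table of functions.
JUMP_TABLE = [eval_const, eval_scale, eval_add]


def eval_table(texture_type, args):
    """Jump-table dispatch: one bounds check, one indexed lookup, one call."""
    if not 0 <= texture_type < len(JUMP_TABLE):
        raise ValueError(f"unknown texture type {texture_type}")
    return JUMP_TABLE[texture_type](args)


for tex_type, args in [(CONST_FLOAT, (0.5,)), (SCALE, (2.0, 3.0)), (ADD, (1.0, 4.0))]:
    assert eval_switch(tex_type, args) == eval_table(tex_type, args)
    print(tex_type, eval_table(tex_type, args))
```

Both helpers return the same values; the only difference is how the evaluator is selected. How much this gains on a GPU depends on the compiler and the hardware, so the sketch says nothing about actual speedups.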
B.Y.O.B.
Developer
Posts: 4146
Joined: Mon Dec 04, 2017 10:08 pm
Location: Germany

Re: v2.3 Vs v2.4 performance

Post by B.Y.O.B. »

I did a test with the LuxCore2.1Benchmark scene, but I measured the render time for 1000 samples.
Kernels and PhotonGI cache were pre-computed and cached.
Times are in min:sec format.

CPU: AMD Ryzen 7 2700x
GPU: Nvidia RTX 2080

v2.3 cpu: 9:36
v2.3 opencl: 2:54

v2.4 cpu: 8:53
v2.4 opencl: 3:24
v2.4 cuda: 2:54
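
To compare these times more easily, they can be converted to seconds and expressed as ratios. This is a minimal Python sketch: the `to_seconds` and `time_ratio` helpers and the dictionary layout are assumptions made for this illustration, while the times themselves are copied from the list above.

```python
def to_seconds(min_sec):
    """Convert a 'min:sec' string such as '2:54' to seconds."""
    minutes, seconds = min_sec.split(":")
    return int(minutes) * 60 + int(seconds)


# Render times for 1000 samples, copied from the post.
times = {
    "v2.3 cpu": "9:36",
    "v2.3 opencl": "2:54",
    "v2.4 cpu": "8:53",
    "v2.4 opencl": "3:24",
    "v2.4 cuda": "2:54",
}


def time_ratio(name, baseline_name):
    """Render time of name divided by the baseline (above 1.0 means slower)."""
    return to_seconds(times[name]) / to_seconds(times[baseline_name])


for name, baseline_name in [
    ("v2.4 cpu", "v2.3 cpu"),
    ("v2.4 opencl", "v2.3 opencl"),
    ("v2.4 cuda", "v2.3 opencl"),
]:
    print(f"{name} / {baseline_name}: {time_ratio(name, baseline_name):.2f}")
```

On these numbers, v2.4 CPU needs about 7% less time than v2.3 CPU, v2.4 OpenCL needs about 17% more time than v2.3 OpenCL, and v2.4 CUDA matches v2.3 OpenCL.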