v2.3 Vs v2.4 performance

Discussion related to the LuxCore functionality, implementations and API.
epilectrolytics
Donor
Posts: 703
Joined: Thu Oct 04, 2018 6:06 am

Re: v2.3 Vs v2.4 performance

Post by epilectrolytics » Fri May 01, 2020 6:08 pm

More testing with different scenes, all rendered without CPU, PGI or denoiser:
foodx4.jpg
wallp4.jpg
Ratios of samples per second:

Ratio                Food scene   Wallpaper scene   Benchmark scene
2.4 CUDA / 2.4 OCL   1.26         1.25              1.23
2.3 OCL / 2.4 OCL    1.85         1.02              1.32
2.2 OCL / 2.4 OCL    2.22         1.71              2.16
2.2 OCL / 2.3 OCL    1.20         1.67              1.63

In a light scene with simple materials like the Wallpaper scene there is nearly no speed difference between v2.3 OCL and v2.4 OCL.
In a heavy scene with more complex materials like the Food scene there is a huge difference.

v2.2 is always faster than v2.3 and much faster than v2.4.
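
The posted ratios can be sanity-checked against each other: the 2.2/2.4 ratio should equal the 2.2/2.3 ratio times the 2.3/2.4 ratio. A minimal Python sketch using only the rounded figures from this post (the tolerance is an assumption chosen to cover two-decimal rounding):

```python
# Ratios of samples per second copied from the post (rounded to two decimals).
ratios = {
    "Food":      {"2.3/2.4": 1.85, "2.2/2.4": 2.22, "2.2/2.3": 1.20},
    "Wallpaper": {"2.3/2.4": 1.02, "2.2/2.4": 1.71, "2.2/2.3": 1.67},
    "Benchmark": {"2.3/2.4": 1.32, "2.2/2.4": 2.16, "2.2/2.3": 1.63},
}

# Assumed tolerance: every input is rounded to 0.01, so the product can drift by ~0.02.
TOLERANCE = 0.02


def chained(scene):
    """(2.2/2.3) * (2.3/2.4), which should reproduce the posted 2.2/2.4 ratio."""
    r = ratios[scene]
    return r["2.2/2.3"] * r["2.3/2.4"]


for scene, r in ratios.items():
    diff = abs(chained(scene) - r["2.2/2.4"])
    print(f"{scene}: chained={chained(scene):.3f} "
          f"posted={r['2.2/2.4']:.2f} consistent={diff <= TOLERANCE}")
```

All three scenes agree within rounding, so the table is internally consistent.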

Sharlybg
Donor
Posts: 2155
Joined: Mon Dec 04, 2017 10:11 pm
Location: Ivory Coast

Re: v2.3 Vs v2.4 performance

Post by Sharlybg » Fri May 01, 2020 6:39 pm

My test OCL VS CUDA : RTX 2060 Super

CUDA 1mn 46
cuda 1_46.jpg
OCL 1mn 49
ocl 1_49.jpg
Support LuxCoreRender project with salts and bounties

Portfolio : https://www.behance.net/DRAVIA

epilectrolytics
Donor
Posts: 703
Joined: Thu Oct 04, 2018 6:06 am

Re: v2.3 Vs v2.4 performance

Post by epilectrolytics » Fri May 01, 2020 7:27 pm

@Sharlybg:

A very striking comment on the current effects of Corona on the economy:
DHL is a German parcel service that has been slowed down heavily (like a Porsche hitting the brakes) because, with most shops closed, everyone orders their stuff online and DHL does not have enough employees...


__________________________________________________________________

OMG I forgot to compare against v2.1!!

It renders even faster than v2.2 :shock: :o

But only a little bit :D :lol:
bench2.jpg
Rendered with no PGI (didn't exist in v2.1).
Now I dare not to try v2.0 . . .

B.Y.O.B.
Developer
Posts: 3681
Joined: Mon Dec 04, 2017 10:08 pm
Location: Germany

Re: v2.3 Vs v2.4 performance

Post by B.Y.O.B. » Fri May 01, 2020 7:57 pm

epilectrolytics wrote:
Fri May 01, 2020 7:27 pm
Now I dare not to try v2.0 . . .
Why not jump straight back to the first release of SLG and be amazed.
Features cost performance, some less, some more, but almost none are free.
However, I'm still interested in the reason for the big difference between v2.2 and v2.3.

Sharlybg
Donor
Posts: 2155
Joined: Mon Dec 04, 2017 10:11 pm
Location: Ivory Coast

Re: v2.3 Vs v2.4 performance

Post by Sharlybg » Fri May 01, 2020 9:18 pm

Why not jump straight back to the first release of SLG and be amazed.
Yes, I remember this guy. SLG + HD 7970 GHz, a king combo. By the way, it would be great to run this kind of test with every major release, so we can monitor performance and maybe avoid losing what was gained with so much work ;)

Dade
Developer
Posts: 4509
Joined: Mon Dec 04, 2017 8:36 pm
Location: Italy

Re: v2.3 Vs v2.4 performance

Post by Dade » Fri May 01, 2020 10:15 pm

B.Y.O.B. wrote:
Fri May 01, 2020 7:57 pm
However, I'm still interested in the reason for the big difference between v2.2 and v2.3.
This was conditional compilation in v2.2:
[LuxCore][1.577] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=1.000000e-04f -D PARAM_RAY_EPSILON_MAX=1.000000e-01f -D PARAM_LIGHT_WORLD_RADIUS_SCALE=1.050000e+00f -D PARAM_ACCEL_MBVH -D PARAM_FILM_RADIANCE_GROUP_0 -D PARAM_FILM_RADIANCE_GROUP_COUNT=1 -D PARAM_FILM_CHANNELS_HAS_ALBEDO -D PARAM_FILM_CHANNELS_HAS_AVG_SHADING_NORMAL -D PARAM_FILM_CHANNELS_HAS_NOISE -D PARAM_ENABLE_TEX_CONST_FLOAT -D PARAM_ENABLE_TEX_CONST_FLOAT3 -D PARAM_ENABLE_TEX_IMAGEMAP -D PARAM_ENABLE_TEX_SCALE -D PARAM_ENABLE_TEX_MIX -D PARAM_ENABLE_TEX_SUBTRACT -D PARAM_ENABLE_TEX_BAND -D PARAM_ENABLE_TEX_NORMALMAP -D PARAM_ENABLE_TEX_FRESNELCOLOR -D PARAM_ENABLE_TEX_FRESNELCONST -D PARAM_ENABLE_MAT_MATTE -D PARAM_ENABLE_MAT_VELVET -D PARAM_ENABLE_MAT_ARCHGLASS -D PARAM_ENABLE_MAT_MIX -D PARAM_ENABLE_MAT_MATTETRANSLUCENT -D PARAM_ENABLE_MAT_GLOSSY2 -D PARAM_ENABLE_MAT_GLOSSY2_INDEX -D PARAM_ENABLE_MAT_METAL2 -D PARAM_ENABLE_MAT_METAL2_ANISOTROPIC -D PARAM_ENABLE_MAT_GLOSSYCOATING -D PARAM_ENABLE_MAT_GLOSSYCOATING_INDEX -D PARAM_HAS_PASSTHROUGH -D PARAM_CAMERA_TYPE=0 -D PARAM_HAS_SKYLIGHT2 -D PARAM_HAS_SUNLIGHT -D PARAM_HAS_ENVLIGHTS -D PARAM_HAS_IMAGEMAPS -D PARAM_IMAGEMAPS_PAGE_0 -D PARAM_IMAGEMAPS_COUNT=1 -D PARAM_HAS_IMAGEMAPS_BYTE_FORMAT -D PARAM_HAS_IMAGEMAPS_3xCHANNELS -D PARAM_HAS_IMAGEMAPS_WRAP_REPEAT -D PARAM_HAS_BUMPMAPS -D PARAM_HAS_VOLUMES -D SCENE_DEFAULT_VOLUME_INDEX=4294967295 -D PARAM_MAX_PATH_DEPTH=13 -D PARAM_MAX_PATH_DEPTH_DIFFUSE=7 -D PARAM_MAX_PATH_DEPTH_GLOSSY=7 -D PARAM_MAX_PATH_DEPTH_SPECULAR=12 -D PARAM_RR_DEPTH=3 -D PARAM_RR_CAP=5.000000e-01f -D PARAM_SQRT_VARIANCE_CLAMP_MAX_VALUE=1.000000e+01f -D PARAM_IMAGE_FILTER_TYPE=5 -D PARAM_IMAGE_FILTER_WIDTH_X=1.500000e+00f -D PARAM_IMAGE_FILTER_WIDTH_Y=1.500000e+00f -D PARAM_IMAGE_FILTER_PIXEL_WIDTH_X=1 -D PARAM_IMAGE_FILTER_PIXEL_WIDTH_Y=1 -D PARAM_SAMPLER_TYPE=2 -D PARAM_SAMPLER_SOBOL_STARTOFFSET=32 -D LUXCORE_NVIDIA_OPENCL
[LuxCore][1.577] [PathOCLBaseRenderThread::0] Compiling kernels
It was removed in v2.3 (except for materials/textures):
[LuxCore][1.335] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=1.000000e-04f -D PARAM_RAY_EPSILON_MAX=1.000000e-01f -D PARAM_ENABLE_TEX_CONST_FLOAT -D PARAM_ENABLE_TEX_CONST_FLOAT3 -D PARAM_ENABLE_TEX_IMAGEMAP -D PARAM_ENABLE_TEX_SCALE -D PARAM_ENABLE_TEX_MIX -D PARAM_ENABLE_TEX_SUBTRACT -D PARAM_ENABLE_TEX_BAND -D PARAM_ENABLE_TEX_NORMALMAP -D PARAM_ENABLE_TEX_FRESNELCOLOR -D PARAM_ENABLE_TEX_FRESNELCONST -D PARAM_ENABLE_MAT_MATTE -D PARAM_ENABLE_MAT_VELVET -D PARAM_ENABLE_MAT_ARCHGLASS -D PARAM_ENABLE_MAT_MIX -D PARAM_ENABLE_MAT_MATTETRANSLUCENT -D PARAM_ENABLE_MAT_GLOSSY2 -D PARAM_ENABLE_MAT_GLOSSY2_INDEX -D PARAM_ENABLE_MAT_METAL2 -D PARAM_ENABLE_MAT_METAL2_ANISOTROPIC -D PARAM_ENABLE_MAT_GLOSSYCOATING -D PARAM_ENABLE_MAT_GLOSSYCOATING_INDEX -D LUXCORE_NVIDIA_OPENCL
[LuxCore][1.335] [PathOCLBaseRenderThread::0] Compiling kernels
Then the materials/textures interpreter was introduced in v2.4:
[LuxRays][1.338] [PathOCL kernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=0.0001f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXCORE_NVIDIA_OPENCL -D LUXRAYS_OPENCL_DEVICE -cl-fast-relaxed-math -cl-mad-enable
[LuxRays][1.338] [PathOCL kernel] Compiling kernels
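
To make the difference between the three logs concrete, here is a minimal Python sketch of why scene-dependent `-D` symbols force a recompilation while a fixed option string does not. The symbol names are taken from the v2.2 log above; the functions, the option cache and the way materials map to symbols are hypothetical and only illustrate the idea, they are not LuxCore code.

```python
# Hypothetical sketch, not LuxCore code: scene-dependent "-D" symbols vs. fixed options.
BASE_SYMBOLS = ["LUXRAYS_OPENCL_KERNEL", "SLG_OPENCL_KERNEL", "RENDER_ENGINE_PATHOCL"]


def conditional_options(materials, has_sunlight):
    """v2.2-style: the option string depends on what the scene actually uses."""
    symbols = list(BASE_SYMBOLS)
    symbols += [f"PARAM_ENABLE_MAT_{name.upper()}" for name in sorted(materials)]
    if has_sunlight:
        symbols.append("PARAM_HAS_SUNLIGHT")
    return " ".join(f"-D {s}" for s in symbols)


def generic_options():
    """v2.4-style: the same option string for every scene."""
    return " ".join(f"-D {s}" for s in BASE_SYMBOLS)


def needs_recompile(cache, options):
    """Simplifying assumption: a compiled kernel is reusable only for identical options."""
    if options in cache:
        return False
    cache.add(options)
    return True


conditional_cache, generic_cache = set(), set()
for scene_materials in ({"matte", "glossy2"}, {"matte", "velvet"}):
    print(sorted(scene_materials),
          "conditional recompiles:",
          needs_recompile(conditional_cache, conditional_options(scene_materials, True)),
          "| generic recompiles:",
          needs_recompile(generic_cache, generic_options()))
```

With conditional compilation every new combination of features produces a new kernel; with fixed options only the very first scene pays the compilation cost.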

acasta69
Developer
Posts: 344
Joined: Tue Jan 09, 2018 3:45 pm

Re: v2.3 Vs v2.4 performance

Post by acasta69 » Sat May 02, 2020 8:59 am

Repeated tests without PhotonGI and denoiser, same versions as my previous posts, + latest version (with Cache Friendly Samplers):

LuxCore2.3 OpenCL: 7906 rays/sec
LuxCore2.4alpha0 OpenCL, build 20200430.12: 5325 rays/sec
LuxCore2.4alpha0 CUDA, build 20200430.12: 5079 rays/sec
LuxCore2.4alpha0 OpenCL, Cache Friendly Samplers: 5489 rays/sec
LuxCore2.4alpha0 CUDA, Cache Friendly Samplers: 5276 rays/sec

Eliminating PhotonGI and denoising reduced the measured difference between OpenCL and CUDA, although OpenCL still has an advantage (+4% instead of +11%).

The render.cfg was set up to always use the OpenCL device for the image pipeline. Forcing it to use CUDA causes a performance decrease, from 5276 to about 5050 samples/sec, and the LuxCoreUI GUI also becomes much less responsive (UI loop time ~1400 ms instead of 185 ms).
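
For reference, the percentages can be recomputed from the figures in this post. A short Python sketch (only the numbers come from the post; the helper name is made up):

```python
def percent_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Figures from the post (Cache Friendly Samplers build).
opencl = 5489
cuda = 5276
cuda_image_pipeline = 5050  # "about 5050" with the image pipeline forced to CUDA

print(f"OpenCL vs CUDA: {percent_change(opencl, cuda):+.1f}%")
print(f"CUDA image pipeline vs OpenCL image pipeline: "
      f"{percent_change(cuda_image_pipeline, cuda):+.1f}%")
print(f"UI loop time: {1400 / 185:.1f}x longer")
```

This gives roughly +4.0% for OpenCL over CUDA, about -4.3% when the image pipeline is forced to CUDA, and a UI loop that is about 7.6 times slower.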

Windows 10 64 bits, i7-4770 3.4 GHz, RAM 16 GB, GTX 970 4GB v445.87

FarbigeWelt
Donor
Posts: 946
Joined: Sun Jul 01, 2018 12:07 pm
Location: Switzerland

V2.4 performance, background kernel hard compilation possible?

Post by FarbigeWelt » Sat May 02, 2020 5:53 pm

Dade wrote:
Fri May 01, 2020 10:15 pm

This was conditional compilation in v2.2:
[LuxCore][1.577] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D
...
[LuxCore][1.577] [PathOCLBaseRenderThread::0] Compiling kernels
It was removed in v2.3 (except for materials/textures):
[LuxCore][1.335] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D
...
[LuxCore][1.335] [PathOCLBaseRenderThread::0] Compiling kernels
Then the materials/textures interpreter was introduced in v2.4:
[LuxRays][1.338] [PathOCL kernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=0.0001f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXCORE_NVIDIA_OPENCL -D LUXRAYS_OPENCL_DEVICE -cl-fast-relaxed-math -cl-mad-enable
[LuxRays][1.338] [PathOCL kernel] Compiling kernels
:arrow: An improvement in LCR v2.4 performance is probably possible.
In my opinion you did a very good job lately on the pre-compilation of the OpenCL kernels.
Rendering previews in the viewport never made more sense than now.
Also, waiting for the first dozen samples of a final render was never less tedious than now with LuxCoreRender 2.4 alpha.
Thank you very much for these improvements :!: :D

What do you think of compiling OpenCL kernels on the CPU in the background while rendering on the GPU :?:
Especially the slower OpenCL parts, which have been interpreted since v2.4 and are therefore slower than in previous LCR versions, which compiled the OpenCL kernels fully.
Is it possible to switch from interpreted to compiled kernels between samples while rendering :?:
Light and Word designing Creator - www.farbigewelt.ch - aka quantenkristall || #luxcorerender
160.8 | 42.8 (10.7) Gfp | Windows 10 Pro, intel i7 4770K@3.5, 32 GB
2 AMD Radeon RX 5700 XT, 8 GB || Gfp = SFFT Gflops

neo2068
Developer
Posts: 217
Joined: Tue Dec 05, 2017 6:06 pm
Location: Germany

Re: V2.4 performance, background kernel hard compilation possible?

Post by neo2068 » Sat May 02, 2020 6:55 pm

FarbigeWelt wrote:
Sat May 02, 2020 5:53 pm
What do you think of compiling OpenCL kernels on the CPU in the background while rendering on the GPU :?:
Especially the slower OpenCL parts, which have been interpreted since v2.4 and are therefore slower than in previous LCR versions, which compiled the OpenCL kernels fully.
Is it possible to switch from interpreted to compiled kernels between samples while rendering :?:
Actually, something like that is done in virtual machines. For fast startup and good runtime performance, VMs combine interpretation with binary translation of frequently used code blocks. Is such a framework possible with OpenCL or CUDA?
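
As a rough illustration of the idea (start with the generic kernel, build the specialized one in the background, swap between passes), here is a CPU-only Python sketch. It is not OpenCL or CUDA code and does not use any LuxCore API; the two kernels are hypothetical stand-ins that just return their name and a value.

```python
from concurrent.futures import ThreadPoolExecutor


def generic_kernel(x):
    """Stand-in for the feature-complete kernel that is available immediately."""
    return ("generic", x * x)


def build_specialized():
    """Stand-in for a slow scene-specific compilation; returns the faster kernel."""
    def specialized_kernel(x):
        return ("specialized", x * x)
    return specialized_kernel


def render(passes, pending, after_pass=None):
    """Run `passes` passes, switching kernels between passes once `pending` is done."""
    kernel = generic_kernel
    used = []
    for i in range(passes):
        # Only swap at a pass boundary, never in the middle of a pass.
        if pending is not None and pending.done():
            kernel = pending.result()
            pending = None
        name, _ = kernel(i)
        used.append(name)
        if after_pass is not None:
            after_pass(i)
    return used


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(build_specialized)
        # Which pass sees the swap depends on how long the build takes.
        print(render(5, future))
```

The sketch leaves out the hard parts of doing this on a GPU, such as keeping sampler and film state valid across the switch.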
i7 5820K, 32 GB RAM, NVIDIA Geforce GTX 1080 + GTX 780, Windows 10 64bit, Blender 2.83.3

Dade
Developer
Posts: 4509
Joined: Mon Dec 04, 2017 8:36 pm
Location: Italy

Re: V2.4 performance, background kernel hard compilation possible?

Post by Dade » Sun May 03, 2020 10:55 am

neo2068 wrote:
Sat May 02, 2020 6:55 pm
FarbigeWelt wrote:
Sat May 02, 2020 5:53 pm
What do you think of compiling OpenCL kernels on the CPU in the background while rendering on the GPU :?:
Especially the slower OpenCL parts, which have been interpreted since v2.4 and are therefore slower than in previous LCR versions, which compiled the OpenCL kernels fully.
Is it possible to switch from interpreted to compiled kernels between samples while rendering :?:
Actually, something like that is done in virtual machines. For fast startup and good runtime performance, VMs combine interpretation with binary translation of frequently used code blocks. Is such a framework possible with OpenCL or CUDA?
It is possible but quite complicated. It is easier to just have a flag that selects either the generic, feature-complete kernel or the kernel generated and compiled on the fly. In the first case there is no kernel recompilation; in the second you pay the cost of a kernel recompilation but get faster rendering.

The generic kernel could be used for viewport and test renderings, while the recompiled kernel could be used for final renderings.

Anyway, in both cases, I have to (optionally) re-enable conditional compilation ... for God's sake ....

About materials/textures: I'm not going to re-introduce dynamically generated code for recursive materials/textures, but I could have a fast path for normal materials/textures and use the interpreter only for recursive ones.
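
A toy Python sketch of that split, under heavy assumptions: `Matte` and `Mix` are hypothetical stand-ins that reduce a material to a single grey value, and the op names are invented. Plain materials are evaluated directly (fast path), while recursive ones are flattened on the host into a postfix op list and evaluated with an explicit stack, the way an interpreter on a device without recursion could do it.

```python
from dataclasses import dataclass


@dataclass
class Matte:
    """Plain (non-recursive) material, reduced here to one grey albedo value."""
    albedo: float


@dataclass
class Mix:
    """Recursive material: blends two other materials by `amount`."""
    a: object
    b: object
    amount: float


def compile_ops(mat, ops=None):
    """Host side: flatten a material tree into postfix ops for the interpreter."""
    ops = [] if ops is None else ops
    if isinstance(mat, Matte):
        ops.append(("PUSH", mat.albedo))
    else:
        compile_ops(mat.a, ops)
        compile_ops(mat.b, ops)
        ops.append(("MIX", mat.amount))
    return ops


def interpret(ops):
    """Generic path: evaluate the op list with an explicit stack, no recursion."""
    stack = []
    for op, value in ops:
        if op == "PUSH":
            stack.append(value)
        else:  # "MIX": blend the two most recent results
            b = stack.pop()
            a = stack.pop()
            stack.append(a * (1.0 - value) + b * value)
    return stack.pop()


def evaluate(mat):
    """Fast path for plain materials, interpreter only for recursive ones."""
    if isinstance(mat, Matte):
        return mat.albedo
    return interpret(compile_ops(mat))


nested = Mix(Matte(0.0), Mix(Matte(1.0), Matte(0.0), 0.5), 0.5)
print(evaluate(Matte(0.5)), evaluate(nested))
```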

There is an important "wrong" factor in all the tests in this thread: compilation time is not part of the measured results. If you are doing a 5-minute rendering, 2 minutes of compilation time will kill the performance if it is accounted for; if you are doing a 1-hour rendering, it doesn't matter.

Short version: no solution is optimal in all cases, you may need both.
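
The trade-off can be put into numbers. A minimal Python sketch, assuming a 2-minute compilation (the example above) and a 1.85x faster compiled kernel (the Food scene's 2.3 OCL / 2.4 OCL ratio from earlier in the thread, used here purely for illustration):

```python
def total_minutes(generic_render_minutes, speedup=1.0, compile_minutes=0.0):
    """Wall-clock time with a kernel that is `speedup` times faster but must be compiled."""
    return compile_minutes + generic_render_minutes / speedup


def effective_speedup(generic_render_minutes, speedup, compile_minutes):
    """Speedup over the generic kernel once compilation time is accounted for."""
    return generic_render_minutes / total_minutes(
        generic_render_minutes, speedup, compile_minutes)


def break_even_minutes(speedup, compile_minutes):
    """Generic render length above which compiling pays off (requires speedup > 1).

    From t > c + t / s it follows that t > c * s / (s - 1).
    """
    return compile_minutes * speedup / (speedup - 1.0)


SPEEDUP = 1.85         # assumed, see the lead-in
COMPILE_MINUTES = 2.0  # assumed, see the lead-in

for minutes in (5, 60):
    print(f"{minutes} min render: effective speedup "
          f"{effective_speedup(minutes, SPEEDUP, COMPILE_MINUTES):.2f}x")
print(f"break-even at {break_even_minutes(SPEEDUP, COMPILE_MINUTES):.2f} min")
```

Under these assumptions the 5-minute render keeps almost none of the 1.85x gain, while the 1-hour render keeps most of it.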

P.S. If it isn't clear: v2.3 removed conditional compilation, and v2.4 removed dynamically generated code for materials/textures. They are two somewhat different topics, even if the end result of both is to remove kernel recompilation.
