v2.3 Vs v2.4 performance

Post by **epilectrolytics** » Fri May 01, 2020 6:08 pm

More testing with different scenes, all rendered without CPU, PGI or denoiser:

Ratios of samples per second:

| Ratio | Food scene | Wallpaper scene | Benchmark scene |
|---|---|---|---|
| 2.4 CUDA / 2.4 OCL | 1.26 | 1.25 | 1.23 |
| 2.3 OCL / 2.4 OCL | 1.85 | 1.02 | 1.32 |
| 2.2 OCL / 2.4 OCL | 2.22 | 1.71 | 2.16 |
| 2.2 OCL / 2.3 OCL | 1.20 | 1.67 | 1.63 |
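The rows above are not independent: the 2.2/2.4 ratio should equal the product of the 2.2/2.3 and 2.3/2.4 ratios. A minimal Python sketch of that sanity check on the posted numbers follows; the dictionary layout and the 0.03 rounding tolerance are assumptions made for illustration, not part of the original post.

```python
# Ratios of samples per second, copied from the figures posted above.
ratios = {
    "Food":      {"2.3/2.4": 1.85, "2.2/2.4": 2.22, "2.2/2.3": 1.20},
    "Wallpaper": {"2.3/2.4": 1.02, "2.2/2.4": 1.71, "2.2/2.3": 1.67},
    "Benchmark": {"2.3/2.4": 1.32, "2.2/2.4": 2.16, "2.2/2.3": 1.63},
}

def chained(scene):
    """2.2/2.4 reconstructed as (2.2/2.3) * (2.3/2.4)."""
    r = ratios[scene]
    return r["2.2/2.3"] * r["2.3/2.4"]

def is_consistent(scene, tolerance=0.03):
    """True if the chained ratio matches the posted one within rounding."""
    return abs(chained(scene) - ratios[scene]["2.2/2.4"]) <= tolerance

for scene in ratios:
    print(f"{scene}: chained={chained(scene):.2f} "
          f"posted={ratios[scene]['2.2/2.4']:.2f} "
          f"consistent={is_consistent(scene)}")
```

All three scenes pass this check, so the posted ratios agree with each other up to rounding.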

In a light scene with simple materials, like the Wallpaper scene, there is nearly no speed difference between v2.3 OCL and v2.4 OCL (ratio 1.02).
In a heavy scene with more complex materials, like the Food scene, there is a huge difference (ratio 1.85).

v2.2 is always faster than v2.3 and way faster than v2.4.

Post by **Sharlybg** » Fri May 01, 2020 6:39 pm

My test, OCL vs CUDA, on an RTX 2060 Super:

CUDA: 1 min 46 s

OCL: 1 min 49 s

Post by **epilectrolytics** » Fri May 01, 2020 7:27 pm

@Sharlybg:

A very striking comment on the current effects of Corona on the economy:
DHL is a German parcel service that has been slowed down heavily (like a Porsche hitting the brakes) because, with most shops closed, everyone orders their stuff via the internet and they do not have enough employees...

__________________________________________________________________

OMG I forgot to compare against v2.1!!

It renders even faster than v2.2

But only a little bit

Rendered with no PGI (didn't exist in v2.1).
Now I dare not try v2.0 ...

Post by **B.Y.O.B.** » Fri May 01, 2020 7:57 pm

epilectrolytics wrote: ↑Fri May 01, 2020 7:27 pm
> Now I dare not try v2.0 ...

Why not jump straight back to the first release of SLG and be amazed.
Features cost performance, some less, some more, but almost none are free.
However, I'm still interested in the reason for the big difference between v2.2 and v2.3.

Post by **Sharlybg** » Fri May 01, 2020 9:18 pm

> Why not jump straight back to the first release of SLG and be amazed.

Yes, I remember this guy. SLG + HD 7970 GHz, a king combo. By the way, it would be great to run this kind of test with every major release, so we can monitor performance and maybe avoid losing what was gained with so much work.

Post by **Dade** » Fri May 01, 2020 10:15 pm

B.Y.O.B. wrote: ↑Fri May 01, 2020 7:57 pm
> However, I'm still interested in the reason for the big difference between v2.2 and v2.3.

This was conditional compilation in v2.2:

[LuxCore][1.577] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=1.000000e-04f -D PARAM_RAY_EPSILON_MAX=1.000000e-01f -D PARAM_LIGHT_WORLD_RADIUS_SCALE=1.050000e+00f -D PARAM_ACCEL_MBVH -D PARAM_FILM_RADIANCE_GROUP_0 -D PARAM_FILM_RADIANCE_GROUP_COUNT=1 -D PARAM_FILM_CHANNELS_HAS_ALBEDO -D PARAM_FILM_CHANNELS_HAS_AVG_SHADING_NORMAL -D PARAM_FILM_CHANNELS_HAS_NOISE -D PARAM_ENABLE_TEX_CONST_FLOAT -D PARAM_ENABLE_TEX_CONST_FLOAT3 -D PARAM_ENABLE_TEX_IMAGEMAP -D PARAM_ENABLE_TEX_SCALE -D PARAM_ENABLE_TEX_MIX -D PARAM_ENABLE_TEX_SUBTRACT -D PARAM_ENABLE_TEX_BAND -D PARAM_ENABLE_TEX_NORMALMAP -D PARAM_ENABLE_TEX_FRESNELCOLOR -D PARAM_ENABLE_TEX_FRESNELCONST -D PARAM_ENABLE_MAT_MATTE -D PARAM_ENABLE_MAT_VELVET -D PARAM_ENABLE_MAT_ARCHGLASS -D PARAM_ENABLE_MAT_MIX -D PARAM_ENABLE_MAT_MATTETRANSLUCENT -D PARAM_ENABLE_MAT_GLOSSY2 -D PARAM_ENABLE_MAT_GLOSSY2_INDEX -D PARAM_ENABLE_MAT_METAL2 -D PARAM_ENABLE_MAT_METAL2_ANISOTROPIC -D PARAM_ENABLE_MAT_GLOSSYCOATING -D PARAM_ENABLE_MAT_GLOSSYCOATING_INDEX -D PARAM_HAS_PASSTHROUGH -D PARAM_CAMERA_TYPE=0 -D PARAM_HAS_SKYLIGHT2 -D PARAM_HAS_SUNLIGHT -D PARAM_HAS_ENVLIGHTS -D PARAM_HAS_IMAGEMAPS -D PARAM_IMAGEMAPS_PAGE_0 -D PARAM_IMAGEMAPS_COUNT=1 -D PARAM_HAS_IMAGEMAPS_BYTE_FORMAT -D PARAM_HAS_IMAGEMAPS_3xCHANNELS -D PARAM_HAS_IMAGEMAPS_WRAP_REPEAT -D PARAM_HAS_BUMPMAPS -D PARAM_HAS_VOLUMES -D SCENE_DEFAULT_VOLUME_INDEX=4294967295 -D PARAM_MAX_PATH_DEPTH=13 -D PARAM_MAX_PATH_DEPTH_DIFFUSE=7 -D PARAM_MAX_PATH_DEPTH_GLOSSY=7 -D PARAM_MAX_PATH_DEPTH_SPECULAR=12 -D PARAM_RR_DEPTH=3 -D PARAM_RR_CAP=5.000000e-01f -D PARAM_SQRT_VARIANCE_CLAMP_MAX_VALUE=1.000000e+01f -D PARAM_IMAGE_FILTER_TYPE=5 -D PARAM_IMAGE_FILTER_WIDTH_X=1.500000e+00f -D PARAM_IMAGE_FILTER_WIDTH_Y=1.500000e+00f -D PARAM_IMAGE_FILTER_PIXEL_WIDTH_X=1 -D PARAM_IMAGE_FILTER_PIXEL_WIDTH_Y=1 -D PARAM_SAMPLER_TYPE=2 -D PARAM_SAMPLER_SOBOL_STARTOFFSET=32 -D LUXCORE_NVIDIA_OPENCL
[LuxCore][1.577] [PathOCLBaseRenderThread::0] Compiling kernels

It was removed in v2.3 (except for materials/textures):

[LuxCore][1.335] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=1.000000e-04f -D PARAM_RAY_EPSILON_MAX=1.000000e-01f -D PARAM_ENABLE_TEX_CONST_FLOAT -D PARAM_ENABLE_TEX_CONST_FLOAT3 -D PARAM_ENABLE_TEX_IMAGEMAP -D PARAM_ENABLE_TEX_SCALE -D PARAM_ENABLE_TEX_MIX -D PARAM_ENABLE_TEX_SUBTRACT -D PARAM_ENABLE_TEX_BAND -D PARAM_ENABLE_TEX_NORMALMAP -D PARAM_ENABLE_TEX_FRESNELCOLOR -D PARAM_ENABLE_TEX_FRESNELCONST -D PARAM_ENABLE_MAT_MATTE -D PARAM_ENABLE_MAT_VELVET -D PARAM_ENABLE_MAT_ARCHGLASS -D PARAM_ENABLE_MAT_MIX -D PARAM_ENABLE_MAT_MATTETRANSLUCENT -D PARAM_ENABLE_MAT_GLOSSY2 -D PARAM_ENABLE_MAT_GLOSSY2_INDEX -D PARAM_ENABLE_MAT_METAL2 -D PARAM_ENABLE_MAT_METAL2_ANISOTROPIC -D PARAM_ENABLE_MAT_GLOSSYCOATING -D PARAM_ENABLE_MAT_GLOSSYCOATING_INDEX -D LUXCORE_NVIDIA_OPENCL
[LuxCore][1.335] [PathOCLBaseRenderThread::0] Compiling kernels

Then the materials/textures interpreter was introduced in v2.4:

[LuxRays][1.338] [PathOCL kernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=0.0001f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXCORE_NVIDIA_OPENCL -D LUXRAYS_OPENCL_DEVICE -cl-fast-relaxed-math -cl-mad-enable
[LuxRays][1.338] [PathOCL kernel] Compiling kernels
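The three logs are easier to compare once the `-D` symbols are extracted. The following Python sketch does that on shortened excerpts of the lines quoted above. Treating every `PARAM_*` symbol other than the two ray-epsilon values (which appear in all three logs) as dependent on the scene or render settings is an assumption based on the symbol names, not something stated in the logs.

```python
def defined_symbols(log_line):
    """Return the names passed via -D in a compile log line."""
    tokens = log_line.split()
    names = []
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-D":
            # Drop the "=value" part, keep only the symbol name.
            names.append(value.split("=", 1)[0])
    return names

def scene_dependent(names):
    """PARAM_* symbols, minus the ray-epsilon pair present in every log
    (assumption: these are the ones that vary per scene or settings)."""
    always = {"PARAM_RAY_EPSILON_MIN", "PARAM_RAY_EPSILON_MAX"}
    return [n for n in names if n.startswith("PARAM_") and n not in always]

# Shortened excerpts of the three log lines quoted above.
v22 = ("-D SLG_OPENCL_KERNEL -D PARAM_RAY_EPSILON_MIN=1.000000e-04f "
       "-D PARAM_ACCEL_MBVH -D PARAM_ENABLE_TEX_MIX "
       "-D PARAM_ENABLE_MAT_MATTE -D PARAM_HAS_SUNLIGHT")
v23 = ("-D SLG_OPENCL_KERNEL -D PARAM_RAY_EPSILON_MIN=1.000000e-04f "
       "-D PARAM_ENABLE_TEX_MIX -D PARAM_ENABLE_MAT_MATTE")
v24 = ("-D SLG_OPENCL_KERNEL -D PARAM_RAY_EPSILON_MIN=0.0001f "
       "-D LUXRAYS_OPENCL_DEVICE -cl-fast-relaxed-math")

for version, line in (("v2.2", v22), ("v2.3", v23), ("v2.4", v24)):
    print(version, scene_dependent(defined_symbols(line)))
```

On these excerpts the list shrinks from version to version: v2.2 carries accelerator, light, texture and material symbols, v2.3 only the texture and material ones, and v2.4 none.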

Post by **acasta69** » Sat May 02, 2020 8:59 am

Repeated tests without PhotonGI and denoiser, same versions as my previous posts, + latest version (with Cache Friendly Samplers):

LuxCore2.3 OpenCL: 7906 rays/sec
LuxCore2.4alpha0 OpenCL, build 20200430.12: 5325 rays/sec
LuxCore2.4alpha0 CUDA, build 20200430.12: 5079 rays/sec
LuxCore2.4alpha0 OpenCL, Cache Friendly Samplers: 5489 rays/sec
LuxCore2.4alpha0 CUDA, Cache Friendly Samplers: 5276/sec

Eliminating PhotonGI and denoising reduced the difference between OpenCL and CUDA in the stats, although OpenCL still has an advantage (+4% instead of +11%).
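For reference, the percentages can be recomputed from the figures listed above. This is a small Python sketch; the function names are made up for illustration and the numbers are used with the units as posted.

```python
# Figures from the list above (units as posted).
results = {
    "build 20200430.12":       {"OpenCL": 5325, "CUDA": 5079},
    "Cache Friendly Samplers": {"OpenCL": 5489, "CUDA": 5276},
}
V23_OPENCL = 7906

def opencl_advantage_percent(opencl, cuda):
    """How much faster OpenCL is than CUDA, in percent."""
    return (opencl / cuda - 1.0) * 100.0

def slowdown_vs_v23_percent(value, v23=V23_OPENCL):
    """How much slower a 2.4alpha0 result is than the 2.3 OpenCL result."""
    return (1.0 - value / v23) * 100.0

for label, r in results.items():
    adv = opencl_advantage_percent(r["OpenCL"], r["CUDA"])
    slow = slowdown_vs_v23_percent(r["OpenCL"])
    print(f"{label}: OpenCL is {adv:.1f}% faster than CUDA, "
          f"{slow:.1f}% slower than 2.3 OpenCL")
```

Both builds land between 4% and 5% in favour of OpenCL, and both are roughly 30% behind the 2.3 OpenCL figure.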

The render.cfg was set up to always use the OpenCL device for the image pipeline. Forcing it to use CUDA causes a performance decrease, from 5276 to about 5050 samples/sec, and the LuxCoreUI GUI also becomes much less responsive (UI loop time ~1400 ms instead of 185 ms).

Post by **FarbigeWelt** » Sat May 02, 2020 5:53 pm

Dade wrote: ↑Fri May 01, 2020 10:15 pm
> This was conditional compilation in v2.2:
>
> [LuxCore][1.577] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D
> ...
> [LuxCore][1.577] [PathOCLBaseRenderThread::0] Compiling kernels
>
> It was removed in v2.3 (except for materials/textures):
>
> [LuxCore][1.335] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D
> ...
> [LuxCore][1.335] [PathOCLBaseRenderThread::0] Compiling kernels
>
> Then the materials/textures interpreter was introduced in v2.4:
>
> [LuxRays][1.338] [PathOCL kernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=0.0001f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXCORE_NVIDIA_OPENCL -D LUXRAYS_OPENCL_DEVICE -cl-fast-relaxed-math -cl-mad-enable
> [LuxRays][1.338] [PathOCL kernel] Compiling kernels

A performance improvement for LCR v2.4 is probably possible.
In my opinion, you did a very good job lately with the pre-compilation of the OpenCL kernel.
Rendering previews in the viewport has never made more sense than now.
Also, waiting for the first dozen samples of a final render has never been less tedious than now with LuxCoreRender 2.4 alpha.
Thank you very much for these improvements.

What do you think of compiling OpenCL kernels on the CPU in the background while rendering on the GPU?

In particular, compiling the slower OpenCL parts: they are slower because they have been interpreted since v2.4, which means slower than in previous LCR versions, which compiled the OpenCL kernels fully.
Is switching from interpreted to compiled kernels possible between samples while rendering?

Post by **neo2068** » Sat May 02, 2020 6:55 pm

FarbigeWelt wrote: ↑Sat May 02, 2020 5:53 pm
> What do you think of compiling OpenCL kernels on the CPU in the background while rendering on the GPU?
> In particular, compiling the slower OpenCL parts: they are slower because they have been interpreted since v2.4, which means slower than in previous LCR versions, which compiled the OpenCL kernels fully.
> Is switching from interpreted to compiled kernels possible between samples while rendering?

Actually, something like that is done in virtual machines. For fast startup and good runtime performance, VMs use a combination of interpretation and binary translation of the frequently used code blocks. Is such a framework possible with OpenCL or CUDA?
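As a toy illustration of the VM technique mentioned here (interpret first, translate the hot code later), the following CPU-side Python sketch interprets a small expression tree and switches to a compiled function after a few calls. The class name, the tree encoding and the threshold are all invented for this example; it says nothing about whether or how this could be done inside OpenCL or CUDA.

```python
class TieredExpr:
    """Evaluate an expression tree by interpretation first, then switch to a
    compiled Python function once it has been called often enough."""

    def __init__(self, tree, hot_threshold=3):
        self.tree = tree
        self.hot_threshold = hot_threshold
        self.calls = 0
        self.compiled = None

    def _interpret(self, node, x):
        """Slow path: walk the tree on every call."""
        kind = node[0]
        if kind == "x":
            return x
        if kind == "const":
            return node[1]
        left = self._interpret(node[1], x)
        right = self._interpret(node[2], x)
        return left + right if kind == "add" else left * right

    def _to_source(self, node):
        """Translate the tree into a Python expression string."""
        kind = node[0]
        if kind == "x":
            return "x"
        if kind == "const":
            return repr(node[1])
        op = "+" if kind == "add" else "*"
        return f"({self._to_source(node[1])} {op} {self._to_source(node[2])})"

    def __call__(self, x):
        if self.compiled is not None:
            return self.compiled(x)            # fast path
        self.calls += 1
        if self.calls >= self.hot_threshold:
            # One-off translation cost, paid only for "hot" expressions.
            self.compiled = eval("lambda x: " + self._to_source(self.tree))
        return self._interpret(self.tree, x)

# (x * 2.0) + 1.0
expr = TieredExpr(("add", ("mul", ("x",), ("const", 2.0)), ("const", 1.0)))
values = [expr(v) for v in (1.0, 2.0, 3.0, 4.0)]
print(values, "compiled:", expr.compiled is not None)
```

The first three calls are interpreted and the fourth runs the translated function, with identical results either way.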

Post by **Dade** » Sun May 03, 2020 10:55 am

neo2068 wrote: ↑Sat May 02, 2020 6:55 pm
> FarbigeWelt wrote: ↑Sat May 02, 2020 5:53 pm
> > What do you think of compiling OpenCL kernels on the CPU in the background while rendering on the GPU?
> > In particular, compiling the slower OpenCL parts: they are slower because they have been interpreted since v2.4, which means slower than in previous LCR versions, which compiled the OpenCL kernels fully.
> > Is switching from interpreted to compiled kernels possible between samples while rendering?
>
> Actually, something like that is done in virtual machines. For fast startup and good runtime performance, VMs use a combination of interpretation and binary translation of the frequently used code blocks. Is such a framework possible with OpenCL or CUDA?

It is possible but quite complicated. It is easier to just have a flag to choose between the generic, feature-complete kernel and the kernel generated and compiled on the fly. In the first case there is no kernel recompilation; in the second you pay the cost of a kernel recompilation but get faster rendering.

The generic kernel could be used for viewport and test renderings, while the recompiled kernel could be used for final rendering.
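The trade-off behind such a flag can be sketched in a few lines of Python. Everything here is hypothetical: the `KernelCache` class, the `specialized` flag and the way options are built are illustrations only, with symbol names borrowed from the logs earlier in the thread. The point is just that a generic kernel has one option set for every scene, while a specialized one needs a new build whenever the scene's feature set changes.

```python
BASE_OPTIONS = ("-D LUXRAYS_OPENCL_KERNEL", "-D SLG_OPENCL_KERNEL")

class KernelCache:
    """Hypothetical cache: one compiled kernel per distinct option set."""

    def __init__(self):
        self.compiled = {}
        self.compilations = 0

    def get(self, scene_features, specialized):
        if specialized:
            # Conditional compilation: options depend on what the scene uses.
            extra = tuple(f"-D PARAM_{name}" for name in sorted(scene_features))
        else:
            # Generic kernel: the same options for every scene.
            extra = ()
        key = BASE_OPTIONS + extra
        if key not in self.compiled:
            self.compilations += 1          # stands in for a slow compile
            self.compiled[key] = f"kernel#{self.compilations}"
        return self.compiled[key]

def compilations_for(scenes, specialized):
    """Count how many builds a sequence of scenes would trigger."""
    cache = KernelCache()
    for features in scenes:
        cache.get(features, specialized)
    return cache.compilations

scenes = [
    {"ENABLE_MAT_MATTE"},
    {"ENABLE_MAT_MATTE", "ENABLE_MAT_GLOSSY2"},
    {"ENABLE_MAT_MATTE", "HAS_SUNLIGHT"},
]
print("generic (viewport/test renders):", compilations_for(scenes, False))
print("specialized (final renders):    ", compilations_for(scenes, True))
```

With these three scene edits the generic mode builds once and the specialized mode builds three times, which is why the generic kernel suits interactive work.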

Anyway, in both cases, I have to (optionally) re-enable conditional compilation ... for God's sake ....

About materials/textures: I'm not going to re-introduce dynamically generated code for recursive materials/textures, but I could have a fast path for normal materials/textures and use the interpreter only for recursive materials/textures.
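A rough idea of what "fast path plus interpreter" could look like, as a Python sketch. This is not LuxCore code: the texture encoding, the op names and the reading of "recursive" as "refers to other textures" are assumptions made for the example. Leaf textures are evaluated directly, while textures that reference others are flattened into a postfix op list and run by a small stack machine, which needs a loop but no recursion at evaluation time.

```python
def is_recursive(tex):
    """A texture is 'recursive' here if it refers to other textures."""
    return tex[0] in ("scale", "mix")

def compile_ops(textures, name, ops=None):
    """Flatten a texture tree into a postfix op list (in a renderer this
    would be done once on the host, not per evaluation)."""
    if ops is None:
        ops = []
    tex = textures[name]
    if tex[0] == "const":
        ops.append(("push", tex[1]))
    elif tex[0] == "scale":
        compile_ops(textures, tex[1], ops)
        compile_ops(textures, tex[2], ops)
        ops.append(("mul",))
    else:  # "mix": (1 - amount) * a + amount * b
        compile_ops(textures, tex[1], ops)
        compile_ops(textures, tex[2], ops)
        ops.append(("mix", tex[3]))
    return ops

def run_ops(ops):
    """Stack-based interpreter: a plain loop over the op list."""
    stack = []
    for op in ops:
        if op[0] == "push":
            stack.append(op[1])
        elif op[0] == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:  # "mix"
            b, a = stack.pop(), stack.pop()
            stack.append((1.0 - op[1]) * a + op[1] * b)
    return stack.pop()

def evaluate(textures, name):
    tex = textures[name]
    if not is_recursive(tex):
        return tex[1]                            # fast path: direct evaluation
    return run_ops(compile_ops(textures, name))  # interpreter path

textures = {
    "grey": ("const", 0.5),
    "white": ("const", 1.0),
    "dim": ("scale", "grey", "grey"),
    "blend": ("mix", "dim", "white", 0.5),
}
for name in textures:
    print(name, evaluate(textures, name))
```

In this sketch only "dim" and "blend" pay the interpreter overhead; "grey" and "white" never touch the op list.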

There is an important "wrong" factor in all the tests in this thread: compilation time is not part of the measured results. If you are doing a 5-minute rendering, a 2-minute compilation time will kill performance once it is accounted for; if you are doing a 1-hour rendering, it doesn't matter.
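The effect of counting compilation time can be put into numbers. In this Python sketch the 5-minute, 1-hour and 2-minute figures come from the paragraph above, the render time is assumed to exclude compilation, and the 1.85 speedup is borrowed from the Food scene ratio earlier in the thread purely as an example value.

```python
def effective_speed(render_seconds, compile_seconds, samples_per_second):
    """Samples per second of wall-clock time, compilation included."""
    total_samples = render_seconds * samples_per_second
    return total_samples / (render_seconds + compile_seconds)

def break_even_render_seconds(compile_seconds, speedup):
    """Render time on the slower (generic) kernel above which paying for
    compilation wins. `speedup` = compiled-kernel speed / generic speed.

    Generic takes T seconds; compiled takes compile + T / speedup.
    They are equal when T = compile * speedup / (speedup - 1).
    """
    return compile_seconds * speedup / (speedup - 1.0)

COMPILE = 2 * 60  # seconds

# Fraction of the measured speed that survives once compilation is counted.
for label, render in (("5 min render", 5 * 60), ("1 hour render", 60 * 60)):
    kept = effective_speed(render, COMPILE, samples_per_second=1.0)
    print(f"{label}: {kept:.0%} of the measured speed")

# Example speedup of 1.85 (an illustrative value, see lead-in).
t = break_even_render_seconds(COMPILE, speedup=1.85)
print(f"break-even at about {t:.0f} s of generic-kernel rendering")
```

The short render loses a large share of its measured speed to compilation, while the hour-long one is barely affected, which matches the "no solution is optimal in all cases" conclusion.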

Short version: no solution is optimal in all cases, you may need both.

P.S. In case it isn't clear: conditional compilation was removed after v2.2 (in v2.3), and dynamically generated code for materials/textures was removed after v2.3 (in v2.4). These are two somewhat different topics, even if the end result in both cases is to remove kernel recompilation.

LuxCoreRender Forums
