v2.3 Vs v2.4 performance
-
- Donor
- Posts: 790
- Joined: Thu Oct 04, 2018 6:06 am
Re: v2.3 Vs v2.4 performance
More testing with different scenes, all rendered without CPU, PGI or denoiser.
Ratios of samples per second:
                      Food scene   Wallpaper scene   Benchmark scene
2.4 CUDA / 2.4 OCL    1.26         1.25              1.23
2.3 OCL / 2.4 OCL     1.85         1.02              1.32
2.2 OCL / 2.4 OCL     2.22         1.71              2.16
2.2 OCL / 2.3 OCL     1.20         1.67              1.63
In a light scene with simple materials, like the Wallpaper scene, there is nearly no speed difference between v2.3 OCL and v2.4 OCL (ratio 1.02).
In a heavy scene with more complex materials, like the Food scene, there is a huge difference (ratio 1.85).
v2.2 is always faster than v2.3 and way faster than v2.4.
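A quick sanity check on the table above: if the measurements are consistent, the 2.2/2.4 ratio should be close to the product of the 2.2/2.3 and 2.3/2.4 ratios. The sketch below only re-uses the posted numbers; the dictionary layout and the function name are made up for illustration.

```python
# Speed ratios (samples per second) copied from the table in the post.
ratios = {
    "Food":      {"2.3/2.4": 1.85, "2.2/2.4": 2.22, "2.2/2.3": 1.20},
    "Wallpaper": {"2.3/2.4": 1.02, "2.2/2.4": 1.71, "2.2/2.3": 1.67},
    "Benchmark": {"2.3/2.4": 1.32, "2.2/2.4": 2.16, "2.2/2.3": 1.63},
}


def chained_error(r):
    """Relative gap between the posted 2.2/2.4 ratio and the chained product."""
    chained = r["2.2/2.3"] * r["2.3/2.4"]
    return abs(chained - r["2.2/2.4"]) / r["2.2/2.4"]


for scene, r in ratios.items():
    chained = r["2.2/2.3"] * r["2.3/2.4"]
    print(f"{scene}: chained={chained:.2f} posted={r['2.2/2.4']:.2f} "
          f"gap={chained_error(r):.1%}")
```

For all three scenes the gap stays below 1%, so the posted ratios agree with each other within rounding.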
Re: v2.3 Vs v2.4 performance
My test, OCL vs CUDA, on an RTX 2060 Super:
CUDA: 1 min 46 s
OCL: 1 min 49 s
-
- Donor
- Posts: 790
- Joined: Thu Oct 04, 2018 6:06 am
Re: v2.3 Vs v2.4 performance
@Sharlybg:
A very striking comment on the current effects of Corona on the economy:
DHL, a German parcel service, is slowed down heavily (like a Porsche hitting the brakes) because, with most shops closed, everyone orders their stuff via the internet and DHL does not have enough employees...
__________________________________________________________________
OMG I forgot to compare against v2.1!!
It renders even faster than v2.2, but only a little bit.
Rendered with no PGI (it didn't exist in v2.1).
Now I dare not try v2.0...
Re: v2.3 Vs v2.4 performance
Why not jump straight back to the first release of SLG and be amazed.
Features cost performance, some less, some more, but almost none are free.
However, I'm still interested in the reason for the big difference between v2.2 and v2.3.
Re: v2.3 Vs v2.4 performance
"Why not jump straight back to the first release of SLG and be amazed."
Yes, I remember this guy. SLG + HD 7970 GHz, a king combo. Btw, it would be great to run this kind of test with every major release, so we can monitor performance and maybe avoid losing what was gained with so much work.
Re: v2.3 Vs v2.4 performance
This was conditional compilation in v2.2:
[LuxCore][1.577] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=1.000000e-04f -D PARAM_RAY_EPSILON_MAX=1.000000e-01f -D PARAM_LIGHT_WORLD_RADIUS_SCALE=1.050000e+00f -D PARAM_ACCEL_MBVH -D PARAM_FILM_RADIANCE_GROUP_0 -D PARAM_FILM_RADIANCE_GROUP_COUNT=1 -D PARAM_FILM_CHANNELS_HAS_ALBEDO -D PARAM_FILM_CHANNELS_HAS_AVG_SHADING_NORMAL -D PARAM_FILM_CHANNELS_HAS_NOISE -D PARAM_ENABLE_TEX_CONST_FLOAT -D PARAM_ENABLE_TEX_CONST_FLOAT3 -D PARAM_ENABLE_TEX_IMAGEMAP -D PARAM_ENABLE_TEX_SCALE -D PARAM_ENABLE_TEX_MIX -D PARAM_ENABLE_TEX_SUBTRACT -D PARAM_ENABLE_TEX_BAND -D PARAM_ENABLE_TEX_NORMALMAP -D PARAM_ENABLE_TEX_FRESNELCOLOR -D PARAM_ENABLE_TEX_FRESNELCONST -D PARAM_ENABLE_MAT_MATTE -D PARAM_ENABLE_MAT_VELVET -D PARAM_ENABLE_MAT_ARCHGLASS -D PARAM_ENABLE_MAT_MIX -D PARAM_ENABLE_MAT_MATTETRANSLUCENT -D PARAM_ENABLE_MAT_GLOSSY2 -D PARAM_ENABLE_MAT_GLOSSY2_INDEX -D PARAM_ENABLE_MAT_METAL2 -D PARAM_ENABLE_MAT_METAL2_ANISOTROPIC -D PARAM_ENABLE_MAT_GLOSSYCOATING -D PARAM_ENABLE_MAT_GLOSSYCOATING_INDEX -D PARAM_HAS_PASSTHROUGH -D PARAM_CAMERA_TYPE=0 -D PARAM_HAS_SKYLIGHT2 -D PARAM_HAS_SUNLIGHT -D PARAM_HAS_ENVLIGHTS -D PARAM_HAS_IMAGEMAPS -D PARAM_IMAGEMAPS_PAGE_0 -D PARAM_IMAGEMAPS_COUNT=1 -D PARAM_HAS_IMAGEMAPS_BYTE_FORMAT -D PARAM_HAS_IMAGEMAPS_3xCHANNELS -D PARAM_HAS_IMAGEMAPS_WRAP_REPEAT -D PARAM_HAS_BUMPMAPS -D PARAM_HAS_VOLUMES -D SCENE_DEFAULT_VOLUME_INDEX=4294967295 -D PARAM_MAX_PATH_DEPTH=13 -D PARAM_MAX_PATH_DEPTH_DIFFUSE=7 -D PARAM_MAX_PATH_DEPTH_GLOSSY=7 -D PARAM_MAX_PATH_DEPTH_SPECULAR=12 -D PARAM_RR_DEPTH=3 -D PARAM_RR_CAP=5.000000e-01f -D PARAM_SQRT_VARIANCE_CLAMP_MAX_VALUE=1.000000e+01f -D PARAM_IMAGE_FILTER_TYPE=5 -D PARAM_IMAGE_FILTER_WIDTH_X=1.500000e+00f -D PARAM_IMAGE_FILTER_WIDTH_Y=1.500000e+00f -D PARAM_IMAGE_FILTER_PIXEL_WIDTH_X=1 -D PARAM_IMAGE_FILTER_PIXEL_WIDTH_Y=1 -D PARAM_SAMPLER_TYPE=2 -D PARAM_SAMPLER_SOBOL_STARTOFFSET=32 -D LUXCORE_NVIDIA_OPENCL
[LuxCore][1.577] [PathOCLBaseRenderThread::0] Compiling kernels
It was removed in v2.3 (except for materials/textures):
[LuxCore][1.335] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=1.000000e-04f -D PARAM_RAY_EPSILON_MAX=1.000000e-01f -D PARAM_ENABLE_TEX_CONST_FLOAT -D PARAM_ENABLE_TEX_CONST_FLOAT3 -D PARAM_ENABLE_TEX_IMAGEMAP -D PARAM_ENABLE_TEX_SCALE -D PARAM_ENABLE_TEX_MIX -D PARAM_ENABLE_TEX_SUBTRACT -D PARAM_ENABLE_TEX_BAND -D PARAM_ENABLE_TEX_NORMALMAP -D PARAM_ENABLE_TEX_FRESNELCOLOR -D PARAM_ENABLE_TEX_FRESNELCONST -D PARAM_ENABLE_MAT_MATTE -D PARAM_ENABLE_MAT_VELVET -D PARAM_ENABLE_MAT_ARCHGLASS -D PARAM_ENABLE_MAT_MIX -D PARAM_ENABLE_MAT_MATTETRANSLUCENT -D PARAM_ENABLE_MAT_GLOSSY2 -D PARAM_ENABLE_MAT_GLOSSY2_INDEX -D PARAM_ENABLE_MAT_METAL2 -D PARAM_ENABLE_MAT_METAL2_ANISOTROPIC -D PARAM_ENABLE_MAT_GLOSSYCOATING -D PARAM_ENABLE_MAT_GLOSSYCOATING_INDEX -D LUXCORE_NVIDIA_OPENCL
[LuxCore][1.335] [PathOCLBaseRenderThread::0] Compiling kernels
Then the materials/textures interpreter was introduced in v2.4:
[LuxRays][1.338] [PathOCL kernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=0.0001f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXCORE_NVIDIA_OPENCL -D LUXRAYS_OPENCL_DEVICE -cl-fast-relaxed-math -cl-mad-enable
[LuxRays][1.338] [PathOCL kernel] Compiling kernels
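To make the logs above a bit more concrete, here is a rough sketch of why per-scene `-D` symbols force a kernel recompilation while a fixed option string does not. This is not LuxCore code: `build_defines` and `needs_recompile` are hypothetical helpers, and only the symbol names are taken from the logs.

```python
# Hypothetical sketch: per-scene "-D" symbols tie a compiled kernel to a scene.
def build_defines(materials, textures):
    """Return a compiler option string with one -D symbol per used feature."""
    symbols = ["LUXRAYS_OPENCL_KERNEL", "SLG_OPENCL_KERNEL"]
    symbols += [f"PARAM_ENABLE_TEX_{t.upper()}" for t in sorted(textures)]
    symbols += [f"PARAM_ENABLE_MAT_{m.upper()}" for m in sorted(materials)]
    return " ".join(f"-D {s}" for s in symbols)


def needs_recompile(old_opts, new_opts):
    """A kernel built with old_opts can only be reused if the options match."""
    return old_opts != new_opts


# Conditional compilation: adding a velvet material changes the option string.
before = build_defines({"matte", "glossy2"}, {"imagemap"})
after = build_defines({"matte", "glossy2", "velvet"}, {"imagemap"})

# Generic kernel: no feature-specific symbols, so the options never change.
generic = build_defines(set(), set())

print(needs_recompile(before, after))     # scene edit invalidates the kernel
print(needs_recompile(generic, generic))  # generic kernel stays valid
```

The trade-off is the one discussed in this thread: the specialized kernel contains only the code the scene needs, the generic one contains everything but can be compiled once and reused.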
Re: v2.3 Vs v2.4 performance
Repeated tests without PhotonGI and denoiser, same versions as in my previous posts, plus the latest version (with Cache Friendly Samplers):
LuxCore2.3 OpenCL: 7906 rays/sec
LuxCore2.4alpha0 OpenCL, build 20200430.12: 5325 rays/sec
LuxCore2.4alpha0 CUDA, build 20200430.12: 5079 rays/sec
LuxCore2.4alpha0 OpenCL, Cache Friendly Samplers: 5489 rays/sec
LuxCore2.4alpha0 CUDA, Cache Friendly Samplers: 5276 rays/sec
Eliminating PhotonGI and denoising reduced the difference between OpenCL and CUDA in the stats, although OpenCL still has an advantage (+4% instead of +11%).
The render.cfg was set up to always use the OpenCL device for the imagepipeline. Forcing it to use CUDA causes a performance decrease, from 5276 to about 5050 samples/sec, and the LuxCoreUI GUI is also much less responsive (UI loop time ~1400 ms instead of 185 ms).
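For anyone who wants to redo the percentages from the numbers above, a small sketch follows. The rates are the posted ones; the dictionary keys and the `advantage` helper are just names picked for this example.

```python
# Rates as posted above (per second).
rates = {
    "2.3 OpenCL": 7906,
    "2.4alpha0 OpenCL": 5325,
    "2.4alpha0 CUDA": 5079,
    "2.4alpha0 OpenCL CFS": 5489,  # CFS = Cache Friendly Samplers
    "2.4alpha0 CUDA CFS": 5276,
}


def advantage(a, b):
    """Percentage by which rate a exceeds rate b."""
    return (a / b - 1.0) * 100.0


ocl_vs_cuda = advantage(rates["2.4alpha0 OpenCL"], rates["2.4alpha0 CUDA"])
ocl_vs_cuda_cfs = advantage(rates["2.4alpha0 OpenCL CFS"], rates["2.4alpha0 CUDA CFS"])
v23_vs_v24 = advantage(rates["2.3 OpenCL"], rates["2.4alpha0 OpenCL"])

print(f"OpenCL over CUDA (build 20200430.12): {ocl_vs_cuda:.1f}%")
print(f"OpenCL over CUDA (Cache Friendly Samplers): {ocl_vs_cuda_cfs:.1f}%")
print(f"v2.3 OpenCL over v2.4alpha0 OpenCL: {v23_vs_v24:.1f}%")
```

The Cache Friendly Samplers pair gives the +4% mentioned above, and v2.3 OpenCL comes out a bit under 50% ahead of v2.4alpha0 OpenCL on this scene.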
- FarbigeWelt
- Donor
- Posts: 1046
- Joined: Sun Jul 01, 2018 12:07 pm
- Location: Switzerland
- Contact:
V2.4 performance, background kernel hard compilation possible?
Dade wrote: ↑Fri May 01, 2020 10:15 pm
This was conditional compilation in v2.2:
[LuxCore][1.577] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D
...
[LuxCore][1.577] [PathOCLBaseRenderThread::0] Compiling kernels
It was removed in v2.3 (but materials/textures):
[LuxCore][1.335] [PathOCLBaseRenderThread::0] Defined symbols: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D
...
[LuxCore][1.335] [PathOCLBaseRenderThread::0] Compiling kernels
Than materials/textures interpreter was introduced in v2.4:
[LuxRays][1.338] [PathOCL kernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=0.0001f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXCORE_NVIDIA_OPENCL -D LUXRAYS_OPENCL_DEVICE -cl-fast-relaxed-math -cl-mad-enable
[LuxRays][1.338] [PathOCL kernel] Compiling kernels

There is probably an improvement of LCR V2.4 performance possible.
In my opinion you have done a very good job lately on the pre-compilation of the OpenCL kernels.
Rendering previews in the viewport never made more sense than now.
Also, waiting for the first dozen samples of a final render was never less tedious than now with LuxCoreRender 2.4 alpha.
Thank you very much for these improvements.
What do you think of compiling OpenCL kernels on the CPU in the background while rendering on the GPU?
In particular, compiling the slower OpenCL parts: they are slower because they are interpreted since v2.4, which means slower than in previous LCR versions, which compiled the OpenCL kernels fully.
Is switching from interpreted to compiled kernels possible between samples while rendering?
Light and Word designing Creator - www.farbigewelt.ch - aka quantenkristall || #luxcorerender
MacBook Air with M1
Re: V2.4 performance, background kernel hard compilation possible?
FarbigeWelt wrote: ↑Sat May 02, 2020 5:53 pm
What do you think of compiling openCL kernels on CPU in the background while rendering on GPU
Compiling especially the slower openCL parts, slower because interpreted since v2.4, what means slower than previous LCR versions which compiled openCL kernels fully.
Is switching from interpreted to compiled kernels possible between samples while rendering

Actually, something like that is done in virtual machines. For fast startup and good runtime performance, VMs use a combination of interpretation and binary translation of the often used code blocks. Is such a framework possible with OpenCL or CUDA?
Re: V2.4 performance, background kernel hard compilation possible?
neo2068 wrote: ↑Sat May 02, 2020 6:55 pm
FarbigeWelt wrote: ↑Sat May 02, 2020 5:53 pm
What do you think of compiling openCL kernels on CPU in the background while rendering on GPU
Compiling especially the slower openCL parts, slower because interpreted since v2.4, what means slower than previous LCR versions which compiled openCL kernels fully.
Is switching from interpreted to compiled kernels possible between samples while rendering

Actually, something like that is done in virtual machines. For fast startup and good runtime performance VMs use a combination of interpretation and binary translation of the often used code blocks. Is such a framework possible with openCL or cuda?

It is possible but quite complicated. It is easier to just have a flag to choose between the generic, feature-complete kernel and the on-the-fly generated and compiled kernel. In the first case there is no kernel re-compilation; in the second you pay the cost of a kernel recompilation but you get faster rendering.
The generic kernel could be used for viewport rendering and test renderings, while the recompiled kernel could be used for final rendering.
Anyway, in both cases, I have to (optionally) re-enable conditional compilation ... for God's sake ....
About materials/textures: I'm not going to re-introduce dynamically generated code for recursive materials/textures, but I could have a fast path for normal materials/textures and use the interpreter only for recursive materials/textures.
There is an important "wrong" factor in all the tests in this thread: compilation time is not part of the measured results. If you are doing a 5-minute rendering, 2 minutes of compilation time will kill the performance once it is accounted for; if you are doing a 1-hour rendering, it doesn't matter.
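As a very rough picture of the "fast path plus interpreter" idea, here is a toy sketch in Python. It is not how LuxCore evaluates textures: the tuple layout, the op list and all function names are invented for this example, and only the texture kinds (const, scale, mix) echo the `PARAM_ENABLE_TEX_*` names in the logs earlier in the thread.

```python
# Toy textures: ("const", value) is a flat leaf; ("scale", a, b) and
# ("mix", a, b, amount) reference other textures, i.e. they are recursive.
def is_recursive(tex):
    return tex[0] != "const"


def eval_fast(tex):
    """Fast path: a flat texture is evaluated directly, with no op stream."""
    return tex[1]


def compile_ops(tex, ops=None):
    """Flatten a nested texture into a postfix op list (done once per scene)."""
    ops = [] if ops is None else ops
    if tex[0] == "const":
        ops.append(("push", tex[1]))
    else:
        for child in tex[1:3]:
            compile_ops(child, ops)
        ops.append((tex[0], tex[3] if tex[0] == "mix" else None))
    return ops


def eval_interpreted(ops):
    """Interpreter: walk the op list with an explicit stack, no recursion."""
    stack = []
    for op, arg in ops:
        if op == "push":
            stack.append(arg)
        elif op == "scale":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "mix":
            b, a = stack.pop(), stack.pop()
            stack.append(a * (1.0 - arg) + b * arg)
    return stack.pop()


def evaluate(tex):
    """Use the interpreter only where nesting makes it necessary."""
    if is_recursive(tex):
        return eval_interpreted(compile_ops(tex))
    return eval_fast(tex)


nested = ("mix", ("const", 0.0), ("scale", ("const", 2.0), ("const", 4.0)), 0.25)
print(evaluate(("const", 0.5)), evaluate(nested))
```

The point of the split is that flat textures skip the op list and the stack entirely, so only scenes that really nest textures pay for the interpreter.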
Short version: no solution is optimal in all cases, you may need both.
P.S. In case it isn't clear: v2.2 removed conditional compilation, and v2.3 removed dynamically generated code for materials/textures. These are two somewhat different topics, even if the end result of both is to remove kernel re-compilation.
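The compilation-time argument above can be put into numbers with a small sketch. The 2 minutes of compilation and the 5-minute and 1-hour renders come from the post; the 1.5x speedup of the specialized kernel is purely an assumption for illustration, and both function names are made up.

```python
def total_time(render_min, compile_min=0.0, speedup=1.0):
    """Wall-clock minutes: compilation plus rendering sped up by `speedup`."""
    return compile_min + render_min / speedup


def break_even(compile_min, speedup):
    """Generic-kernel render time above which the specialized kernel wins."""
    return compile_min * speedup / (speedup - 1.0)


COMPILE_MIN = 2.0  # from the post
SPEEDUP = 1.5      # assumed, not measured

for render_min in (5.0, 60.0):
    generic = total_time(render_min)
    specialized = total_time(render_min, COMPILE_MIN, SPEEDUP)
    print(f"{render_min:.0f} min render: generic={generic:.1f} min, "
          f"specialized={specialized:.1f} min")

print(f"break-even at {break_even(COMPILE_MIN, SPEEDUP):.1f} min")
```

Under these assumptions the specialized kernel loses on the 5-minute render and wins clearly on the 1-hour one, with the break-even at 6 minutes of generic-kernel render time. A different speedup or compile time moves the break-even, which is why having both options makes sense.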