Re: PATHGPU with OpenCL and CUDA support

Posted: Wed Apr 22, 2020 3:10 pm
by Dade
I have merged the cuda_rendering branch into master: CUDA support is officially on.

Posted: Wed Apr 22, 2020 3:16 pm
by Dade
Here is some interesting new CUDA profiling data. This is a rendering of a scene with ~14,000,000 triangles (a hair scene with high tessellation, built for this test):

Screenshot from 2020-04-22 16-42-53.png

87.8% of the time is spent running the ray/triangle intersection kernel: RTX could potentially slash this time.

This is the LuxCore2.1Benchmark scene (~1,400,000 triangles):

Screenshot from 2020-04-22 16-46-10.png

Only 17.5% of the time is spent running the ray/triangle intersection kernel: RTX can offer very little help here.

Short version: the importance of RTX will scale up with scene geometry complexity and down with shading complexity. Most modern scenes usually fall somewhere between a 40%/60% and a 60%/40% intersection/shading split.
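
As a rough way to read these percentages, Amdahl's law bounds the overall gain when only the intersection fraction p of the frame time is accelerated by a factor s: speedup = 1 / ((1 - p) + p / s). The sketch below plugs in the two measured fractions. The 10x factor for RTX intersection is a hypothetical value chosen only for illustration, not a measurement.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the run time is sped up by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)


# Intersection-kernel fractions reported in the two profiles above.
scenes = {
    "hair scene (~14M triangles)": 0.878,
    "LuxCore2.1Benchmark (~1.4M triangles)": 0.175,
}

HYPOTHETICAL_RTX_FACTOR = 10.0  # assumption for illustration, not a measured RTX number

for name, p in scenes.items():
    typical = amdahl_speedup(p, HYPOTHETICAL_RTX_FACTOR)
    bound = amdahl_speedup(p, float("inf"))  # infinitely fast intersection
    print(f"{name}: {typical:.2f}x with a 10x intersector, at most {bound:.2f}x")
```

Even with an infinitely fast intersector, the benchmark scene tops out around 1.21x, while the hair scene could approach 8.2x, which is the asymmetry described above.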

Posted: Wed Apr 22, 2020 3:39 pm
by Sharlybg
Dade wrote: Wed Apr 22, 2020 3:16 pm Short version: the importance of RTX will scale up with scene geometry complexity and down with shading complexity. Most modern scenes usually fall somewhere between a 40%/60% and a 60%/40% intersection/shading split.
Maybe the next GPU iterations will include specific hardware acceleration for shading complexity.
Dade wrote: Wed Apr 22, 2020 3:10 pm I have merged the cuda_rendering branch into master: CUDA support is officially on.
Is anything missing, or is there anything we shouldn't expect from this first CUDA support? You are so fast it is barely believable.

Posted: Wed Apr 22, 2020 3:42 pm
by lacilaci
Dade wrote: Wed Apr 22, 2020 3:16 pm Short version: the importance of RTX will scale up with scene geometry complexity and down with shading complexity. Most modern scenes usually fall somewhere between a 40%/60% and a 60%/40% intersection/shading split.
From personal experience with my archviz/productviz work, Cycles on RTX was always at least 2x faster than on CUDA, and from what I gathered about Octane, it should deliver 2-3x the performance of CUDA as well.

Sure, a scene with 500 polygons won't benefit, but we don't do such things in 2020, do we?

Posted: Wed Apr 22, 2020 3:53 pm
by Sharlybg
Sure, RTX + out of core on heavy projects with lots of subdivision + displacement + heavy vegetation + Megascans will rock. ;)

Posted: Wed Apr 22, 2020 6:41 pm
by Dade
Sharlybg wrote: Wed Apr 22, 2020 3:39 pm Is anything missing, or is there anything we shouldn't expect from this first CUDA support? You are so fast it is barely believable.
As far as I know, everything I have tested works (i.e. anything not yet tested may not work).

Posted: Thu Apr 23, 2020 3:09 pm
by Dade
I .. couldn't ... resist ...

Very first out of core rendering: a 10+GB scene rendered on an 8GB card (and those 8GB are also used by the OS, applications, etc.; it is not a dedicated compute-only GPU):

outofcore.jpg

The scene contains more than 8GB of textures, artificially upscaled to use more RAM:

Code:

[LuxRays][26.806] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMap descriptions buffer size: 784bytes (OUT OF CORE)
[LuxRays][26.809] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 4181238Kbytes (OUT OF CORE)
[LuxRays][36.726] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 3806677Kbytes (OUT OF CORE)
[LuxRays][46.090] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 551053Kbytes (OUT OF CORE)

Out of core rendering is going to be a delicate beast (notice how I'm using tile rendering to get good sample locality), but it still looks like black magic.
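
The four log lines are enough to check the "more than 8GB" figure. A minimal Python sketch that parses them and sums the reported buffer sizes, assuming "Kbytes" means 1024 bytes (the log itself does not say which convention LuxRays uses):

```python
import re

# The four lines copied verbatim from the LuxRays log above.
LOG = """\
[LuxRays][26.806] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMap descriptions buffer size: 784bytes (OUT OF CORE)
[LuxRays][26.809] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 4181238Kbytes (OUT OF CORE)
[LuxRays][36.726] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 3806677Kbytes (OUT OF CORE)
[LuxRays][46.090] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 551053Kbytes (OUT OF CORE)
"""

# Matches "buffer size: <number>Kbytes" or "buffer size: <number>bytes".
SIZE_RE = re.compile(r"buffer size: (\d+)(Kbytes|bytes)")


def total_bytes(log: str) -> int:
    """Sum every reported buffer size, converting Kbytes to bytes (assumed 1 Kbyte = 1024 bytes)."""
    total = 0
    for match in SIZE_RE.finditer(log):
        value, unit = int(match.group(1)), match.group(2)
        total += value * 1024 if unit == "Kbytes" else value
    return total


print(f"Total: {total_bytes(LOG) / 2**30:.2f} GiB")
```

It reports about 8.14 GiB, nearly all of it in the three ImageMaps buffers, which alone exceeds what an 8GB card can hold once the OS and other applications take their share.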

Posted: Thu Apr 23, 2020 3:16 pm
by lacilaci
Dade wrote: Thu Apr 23, 2020 3:09 pm Very first out of core rendering: a 10+GB scene rendered on an 8GB card (and those 8GB are also used by the OS, applications, etc.; it is not a dedicated compute-only GPU).
Pretty cool. What's the performance hit though?

Posted: Thu Apr 23, 2020 3:17 pm
by Sharlybg
Dade wrote: Thu Apr 23, 2020 3:09 pm I .. couldn't ... resist ...

Very first out of core rendering: a 10+GB scene rendered on an 8GB card (and those 8GB are also used by the OS, applications, etc.; it is not a dedicated compute-only GPU):
Uh, what :shock:

You're already doing out of core and it works? Only for NVIDIA, I guess :mrgreen:

Posted: Thu Apr 23, 2020 3:44 pm
by Dade
lacilaci wrote: Thu Apr 23, 2020 3:16 pm Pretty cool. What's the performance hit though?
Quite big: about 50% slower in this first test, but it all depends on how much "locality" you have (i.e. GPU RAM is used like a cache, so it is all about the cache hit rate). I'm thinking about a special SOBOL option, dedicated to out of core rendering, to improve sample "locality".
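
The idea that GPU RAM acts as a cache, so the hit rate decides the cost, can be illustrated with a toy model. The sketch below is a hypothetical simulation, not LuxCore's actual out of core mechanism: it assumes each pixel reads exactly one texture page at (x // PAGE, y // PAGE), that the GPU can hold only CAPACITY pages, and that the least recently used page is evicted. It compares plain scanline order with tile order (the tile rendering mentioned earlier in the thread).

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size page cache with least-recently-used eviction (toy model of GPU RAM)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0  # each miss stands for one page upload from host memory

    def access(self, page) -> None:
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict the least recently used page


def scanline_order(w: int, h: int):
    for y in range(h):
        for x in range(w):
            yield x, y


def tile_order(w: int, h: int, tile: int):
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    yield x, y


def simulate(order, page_size: int, capacity: int) -> LRUCache:
    cache = LRUCache(capacity)
    for x, y in order:
        # Hypothetical screen-aligned texture: pixel (x, y) reads one page.
        cache.access((x // page_size, y // page_size))
    return cache


W = H = 256   # hypothetical image size
PAGE = 16     # hypothetical page edge, in pixels
CAPACITY = 8  # hypothetical number of pages that fit in GPU RAM

for name, order in [("scanline", scanline_order(W, H)), ("tiles", tile_order(W, H, PAGE))]:
    c = simulate(order, PAGE, CAPACITY)
    print(f"{name}: {c.misses} page uploads, hit rate {c.hits / (c.hits + c.misses):.4f}")
```

Both orders perform the same 65,536 lookups, but with these assumed parameters scanline order re-uploads every page of a row each time (4,096 uploads), while 16x16 tiles upload each page only once (256 uploads). A sampler option aimed at better "locality" would presumably target the same kind of effect.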

At the end of the day, as usual, there is no free lunch: you have to give something (rendering time) to get something (the ability to render bigger scenes).

P.S. Indeed, it is CUDA-only stuff.