PATHGPU with OpenCL and CUDA support
Re: PATHGPU with OpenCL and CUDA support
Some interesting new CUDA profiling data. This is a rendering of a scene with ~14,000,000 triangles (a hair scene with high tessellation for the test):
87.8% of the time is spent running the ray/triangle intersection kernel: RTX can potentially destroy this time.
This is the LuxCore2.1Benchmark scene (~1,400,000 triangles):
Only 17.5% of the time is spent running the ray/triangle intersection kernel: RTX can offer very little help here.
Short version: the RTX importance will scale up with the scene geometry complexity and down with the shading complexity. Most modern scenes are usually in a 40%/60% or 60%/40% ratio range.
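As a rough way to read the two profiles, Amdahl's law bounds the overall speedup when only the ray/triangle intersection kernel gets faster. The following is a minimal sketch, assuming the profiled fractions are representative of the whole render; the function name `overall_speedup` and the 10x kernel speedup are hypothetical and chosen only for illustration, not measured RTX figures:

```python
import math

def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: overall speedup when only a fraction of the time gets faster."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)

# Fractions of time spent in the ray/triangle intersection kernel,
# taken from the two profiles quoted in the post.
profiles = {
    "hair scene (~14M triangles)": 0.878,
    "LuxCore2.1Benchmark (~1.4M triangles)": 0.175,
}

for name, fraction in profiles.items():
    # 10x is a hypothetical kernel speedup, not a measured RTX number.
    hypothetical = overall_speedup(fraction, 10.0)
    # Upper bound: the intersection kernel takes no time at all.
    upper_bound = overall_speedup(fraction, math.inf)
    print(f"{name}: {hypothetical:.2f}x at 10x, at most {upper_bound:.2f}x")
```

Under these assumptions the hair scene tops out at roughly 8.2x overall and the benchmark scene at roughly 1.2x, which is the same point in numbers: the more the time is dominated by geometry, the more RTX can help.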
Re: PATHGPU with OpenCL and CUDA support
Maybe the next GPU iterations will include specific hardware acceleration for shading complexity.
"I have merged the cuda_rendering branch with the master: the CUDA support is officially on."
Is anything missing, or is there anything we shouldn't expect from this first CUDA support? You are so fast it is barely believable.
Re: PATHGPU with OpenCL and CUDA support
Dade wrote: ↑Wed Apr 22, 2020 3:16 pm Short version: the RTX importance will scale up with the scene geometry complexity and down with the shading complexity. Most modern scenes are usually in a 40%/60% or 60%/40% ratio range.
From personal experience in my archviz/productviz work, Cycles on RTX was always at least 2x faster than on CUDA, and from what I gathered about Octane, it should be 2-3x the performance of CUDA as well.
Sure, a scene with 500 polygons won't benefit, but we don't do such things in 2020, do we?
Re: PATHGPU with OpenCL and CUDA support
Sure, RTX + out of core will rock on heavy projects, like those with many subdivisions + displacement + heavy vegetation + megascans.
Re: PATHGPU with OpenCL and CUDA support
As far as I know, everything (tested) works (i.e. anything not yet tested might not work).
Re: PATHGPU with OpenCL and CUDA support
I .. couldn't ... resist ...
Very first out of core rendering: a 10+GB scene rendered on an 8GB card (the 8GB is also used by the OS, applications, etc.; it is not a dedicated compute-only GPU).
Out of core rendering is going to be a delicate beast (notice how I'm using tile rendering to have good sample locality) but it still looks like black magic.
The scene has more than 8GB of artificially upscaled textures to use more RAM:
[LuxRays][26.806] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMap descriptions buffer size: 784bytes (OUT OF CORE)
[LuxRays][26.809] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 4181238Kbytes (OUT OF CORE)
[LuxRays][36.726] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 3806677Kbytes (OUT OF CORE)
[LuxRays][46.090] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 551053Kbytes (OUT OF CORE)
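As a quick sanity check on the log above, the three out-of-core ImageMaps buffers can be summed to confirm they add up to more than 8GB. This is a small sketch, assuming that "Kbytes" in the log means 1024 bytes; the helper name `total_out_of_core_kbytes` is hypothetical and the regular expression is written only against the lines shown here:

```python
import re

# Log lines copied from the post above.
log_lines = [
    "[LuxRays][26.806] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMap descriptions buffer size: 784bytes (OUT OF CORE)",
    "[LuxRays][26.809] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 4181238Kbytes (OUT OF CORE)",
    "[LuxRays][36.726] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 3806677Kbytes (OUT OF CORE)",
    "[LuxRays][46.090] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 551053Kbytes (OUT OF CORE)",
]

# Matches only the "ImageMaps buffer size" lines; the small descriptions buffer is skipped.
PATTERN = re.compile(r"ImageMaps buffer size: (\d+)Kbytes \(OUT OF CORE\)")

def total_out_of_core_kbytes(lines):
    """Sum the sizes (in Kbytes) of all out-of-core ImageMaps buffers."""
    total = 0
    for line in lines:
        match = PATTERN.search(line)
        if match:
            total += int(match.group(1))
    return total

total_kb = total_out_of_core_kbytes(log_lines)
# Assumption: 1 Kbyte = 1024 bytes, so 1024**2 Kbytes = 1 GiB.
print(f"{total_kb} Kbytes = {total_kb / 1024**2:.2f} GiB")
```

With that assumption the texture buffers alone come to a little over 8 GiB, which is already more than the whole card.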
Re: PATHGPU with OpenCL and CUDA support
Dade wrote: ↑Thu Apr 23, 2020 3:09 pm I .. couldn't ... resist ...
Pretty cool. What's the performance hit though?
Re: PATHGPU with OpenCL and CUDA support
Euh, what?
You're doing out of core right now and it works? Only for Nvidia, I guess.
Re: PATHGPU with OpenCL and CUDA support
Quite big: about 50% slower in this first test, but it is all about how much "locality" you have (i.e. GPU RAM is used like a cache, so it is all about the cache hit rate). I'm thinking of a special SOBOL option, dedicated to out of core rendering, to improve the "locality" of the samples.
At the end of the day, as usual, there are no free lunches: you have to give something (rendering time) to get something (bigger scene rendering).
P.S. Indeed, it is CUDA only stuff.
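To illustrate why locality matters when GPU RAM acts as a cache, here is a toy simulation comparing a tile-ordered access pattern with a scattered one. It is only a sketch under loud assumptions: the LRU eviction policy, the page counts, and the names `hit_rate`, `NUM_PAGES` and `CAPACITY` are all hypothetical and say nothing about how LuxCore actually manages out-of-core memory.

```python
import random
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Fraction of accesses served from an LRU cache holding `capacity` pages."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True             # load the page into the cache
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(accesses)

NUM_PAGES = 100   # hypothetical number of texture pages in host memory
CAPACITY = 20     # hypothetical number of pages that fit in GPU RAM

rng = random.Random(0)

# Tile-ordered: each of 10 tiles only touches its own group of 10 pages.
tiled = []
for tile in range(10):
    pages = range(tile * 10, tile * 10 + 10)
    tiled += [rng.choice(pages) for _ in range(1000)]

# Scattered: every sample may touch any page of the scene.
scattered = [rng.randrange(NUM_PAGES) for _ in range(10000)]

print(f"tiled hit rate:     {hit_rate(tiled, CAPACITY):.2f}")
print(f"scattered hit rate: {hit_rate(scattered, CAPACITY):.2f}")
```

In this toy model the tiled order hits the cache far more often than the scattered one, because the working set of one tile fits in the cache while the whole scene does not.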