Re: PATHGPU with OpenCL and CUDA support
Posted: Wed Apr 22, 2020 3:10 pm
I have merged the cuda_rendering branch with the master: the CUDA support is officially on.
Maybe the next GPU iterations will include specific hardware acceleration for shading complexity.
Quote: Short version: the RTX importance will scale up with the scene geometry complexity and down with the shading complexity. Most modern scenes are usually in a 40%/60% or 60%/40% ratio range.
Anything missing, or anything we shouldn't expect from this first CUDA support? You are so fast it is barely believable.
Quote: I have merged the cuda_rendering branch with the master: the CUDA support is officially on.
From personal experience in my archviz/productviz work, Cycles on RTX was always at least 2x faster than on CUDA, and from what I gathered about Octane, it should be 2-3x the performance of CUDA as well.
Dade wrote: Wed Apr 22, 2020 3:16 pm Some interesting new CUDA profiling data. This is a rendering of a scene with ~14,000,000 triangles (a hair scene with high tessellation for the test):
Screenshot from 2020-04-22 16-42-53.png
87.8% of the time is spent running the ray/triangle intersection kernel: RTX can potentially destroy this time.
This is LuxCore2.1Benchmark scene (~1,400,000 triangles):
Screenshot from 2020-04-22 16-46-10.png
Only 17.5% of the time is spent running the ray/triangle intersection kernel: RTX can offer very little help here.
Short version: the RTX importance will scale up with the scene geometry complexity and down with the shading complexity. Most modern scenes are usually in a 40%/60% or 60%/40% ratio range.
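The two profiles above can be turned into a rough upper bound with Amdahl's law: if only the ray/triangle intersection kernel gets faster, the rest of the frame time stays the same. This is a minimal sketch, not LuxCore code; the function name and the 2x/10x intersection speedups are hypothetical values picked for illustration, and only the 87.8% and 17.5% fractions come from the post. Even with infinitely fast intersections, the bound is about 8.2x for the hair scene and about 1.2x for the benchmark scene.

```python
def overall_speedup(intersect_fraction, intersect_speedup):
    """Amdahl's law: whole-render speedup when only the intersection
    kernel is accelerated by a factor of intersect_speedup."""
    remaining = (1.0 - intersect_fraction) + intersect_fraction / intersect_speedup
    return 1.0 / remaining


# Fractions of time spent in the intersection kernel, taken from the profiles.
scenes = [
    ("hair scene (~14M triangles)", 0.878),
    ("LuxCore2.1Benchmark (~1.4M triangles)", 0.175),
]

# Hypothetical intersection speedups; float("inf") gives the theoretical ceiling.
for name, fraction in scenes:
    for factor in (2.0, 10.0, float("inf")):
        print(f"{name}: intersection x{factor} -> overall x{overall_speedup(fraction, factor):.2f}")
```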
As far as I know, everything (tested) works (i.e. anything not yet tested may not work).
Sharlybg wrote: Wed Apr 22, 2020 3:39 pm Anything missing, or anything we shouldn't expect from this first CUDA support? You are so fast it is barely believable.
Pretty cool. What's the performance hit though?
Dade wrote: Thu Apr 23, 2020 3:09 pm I .. couldn't ... resist ...
Very first out of core rendering, 10+GB scene rendered on an 8GB card (8GB used by the OS, applications, etc. too, not a dedicated compute-only GPU):
outofcore.jpg
The scene has more than 8GB of artificially upscaled textures to use more RAM:
[LuxRays][26.806] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMap descriptions buffer size: 784bytes (OUT OF CORE)
[LuxRays][26.809] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 4181238Kbytes (OUT OF CORE)
[LuxRays][36.726] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 3806677Kbytes (OUT OF CORE)
[LuxRays][46.090] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 551053Kbytes (OUT OF CORE)
Out of core rendering is going to be a delicate beast (notice how I'm using tile rendering to have good sample locality) but still looks like black magic.
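As a quick sanity check on the "more than 8GB" figure, the ImageMaps buffer sizes in the log can be added up. This is only a sketch under two assumptions that the post does not confirm: that the three ImageMaps buffers are separate allocations whose sizes add up, and that "Kbytes" means 1024 bytes. The regular expression and variable names are illustrative.

```python
import re

# Log lines copied from the post.
log = """\
[LuxRays][26.806] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMap descriptions buffer size: 784bytes (OUT OF CORE)
[LuxRays][26.809] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 4181238Kbytes (OUT OF CORE)
[LuxRays][36.726] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 3806677Kbytes (OUT OF CORE)
[LuxRays][46.090] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 551053Kbytes (OUT OF CORE)
"""

# Match only the "ImageMaps buffer size" lines; the descriptions buffer is skipped.
sizes_kbytes = [int(s) for s in re.findall(r"ImageMaps buffer size: (\d+)Kbytes", log)]

# Assumption: the buffers are distinct, so their sizes can be summed.
total_kbytes = sum(sizes_kbytes)

# Assumption: 1 Kbyte = 1024 bytes, so 1 GiB = 1024 * 1024 Kbytes.
total_gib = total_kbytes / (1024 * 1024)

print(f"{len(sizes_kbytes)} ImageMaps buffers, {total_kbytes} Kbytes total, about {total_gib:.2f} GiB")
```

Under those assumptions the texture data alone exceeds the 8GB of the card, which is consistent with the description in the post.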
Euh what
Quote: I .. couldn't ... resist ...
Very first out of core rendering, 10+GB scene rendered on an 8GB card (8GB used by the OS, applications, etc. too, not a dedicated compute-only GPU):
Quite big: about 50% slower in this first test, but it is all about how much "locality" you have (i.e. GPU RAM is used like a cache, so it is all about the cache hit rate). I'm thinking of some special SOBOL option, dedicated to out of core rendering, to improve the samples' "locality".
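To illustrate why locality matters when GPU RAM acts as a cache, here is a toy model, not how LuxCore actually manages out of core memory. It assumes an LRU cache of texture "pages" and compares samples scattered over the whole image with samples grouped tile by tile. All names and sizes (NUM_PAGES, CAPACITY, TILE_PAGES, SAMPLES) are hypothetical.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Replay a sequence of page accesses against an LRU cache
    and return the fraction of accesses that were hits."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(accesses)


NUM_PAGES = 1000    # texture pages in the scene (hypothetical)
CAPACITY = 100      # pages that fit in GPU RAM (hypothetical)
TILE_PAGES = 50     # pages touched by one image tile (hypothetical)
SAMPLES = 100_000

rng = random.Random(0)

# Scattered sampling: every sample may touch any page of the scene.
scattered = [rng.randrange(NUM_PAGES) for _ in range(SAMPLES)]

# Tiled sampling: samples are grouped by tile, each tile touching a small page window.
num_tiles = NUM_PAGES // TILE_PAGES
tiled = []
for tile in range(num_tiles):
    base = tile * TILE_PAGES
    tiled.extend(base + rng.randrange(TILE_PAGES) for _ in range(SAMPLES // num_tiles))

print(f"scattered hit rate: {lru_hit_rate(scattered, CAPACITY):.3f}")
print(f"tiled hit rate:     {lru_hit_rate(tiled, CAPACITY):.3f}")
```

In this model the tiled order misses only the first time each page is touched, while the scattered order keeps evicting pages it will need again, which is the kind of effect a locality-aware sampler would aim to avoid.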