Out of Core rendering for CUDA devices

Posted: Fri Apr 24, 2020 3:30 pm
by Dade

Introduction

Out of core (OoC) rendering is the ability to render, on GPUs, scenes that require more memory than is physically installed on the device. The feature requires hardware support, so it is available only on fairly recent GPUs (the last 2-3 NVIDIA GPU generations, AMD Vega GPUs, etc.).

AMD GPUs

This feature is already available for AMD GPUs through a driver option (i.e. AMD HBCC).

NVIDIA GPUs

This feature is now available for all NVIDIA GPUs with the required hardware, thanks to the new LuxCoreRender support for CUDA and OoC rendering.

Other GPU vendors

Support for other GPU vendors could potentially be added by using OpenCL v2.x SVM.

GPU's RAM as a cache

The idea is to use the GPU RAM as a cache for the CPU RAM, where the scene is actually stored.
NOTE: as a side effect, this requires more CPU RAM than a normal render, because some scene data is no longer stored only in GPU RAM.

You can enable OoC rendering by just flipping a flag:

Code:

opencl.outofcore.enable = 1

Some buffers will then be allocated and marked as "OUT OF CORE":

Code:

[LuxCore][4.235] Starting 1 OpenCL render threads
[LuxRays][4.246] [Device GeForce RTX 2070 SUPER CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 18225Kbytes (OUT OF CORE)
[LuxRays][4.246] [Device GeForce RTX 2070 SUPER CUDAIntersect] NOISE buffer size: 4556Kbytes (OUT OF CORE)
[LuxRays][4.246] [Device GeForce RTX 2070 SUPER CUDAIntersect] Denoiser samples count buffer size: 4556Kbytes (OUT OF CORE)
[LuxRays][4.246] [Device GeForce RTX 2070 SUPER CUDAIntersect] Denoiser squared weight buffer size: 4556Kbytes (OUT OF CORE)
[LuxRays][4.246] [Device GeForce RTX 2070 SUPER CUDAIntersect] Denoiser mean image buffer size: 13668Kbytes (OUT OF CORE)
[LuxRays][4.246] [Device GeForce RTX 2070 SUPER CUDAIntersect] Denoiser covariance buffer size: 27337Kbytes (OUT OF CORE)
[LuxRays][4.246] [Device GeForce RTX 2070 SUPER CUDAIntersect] Denoiser sample histogram buffer size: 273375Kbytes (OUT OF CORE)
[LuxRays][4.253] [Device GeForce RTX 2070 SUPER CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 18225Kbytes (OUT OF CORE)
[LuxRays][4.253] [Device GeForce RTX 2070 SUPER CUDAIntersect] Camera buffer size: 5468bytes
[LuxRays][4.254] [Device GeForce RTX 2070 SUPER CUDAIntersect] Normals buffer size: 40236Kbytes (OUT OF CORE)
[LuxRays][4.359] [Device GeForce RTX 2070 SUPER CUDAIntersect] UVs buffer size: 26410Kbytes (OUT OF CORE)
[LuxRays][4.425] [Device GeForce RTX 2070 SUPER CUDAIntersect] Alphas buffer size: 579Kbytes (OUT OF CORE)
[LuxRays][4.428] [Device GeForce RTX 2070 SUPER CUDAIntersect] Triangle normals buffer size: 64493Kbytes (OUT OF CORE)
[LuxRays][4.585] [Device GeForce RTX 2070 SUPER CUDAIntersect] Vertices buffer size: 40236Kbytes (OUT OF CORE)
[LuxRays][4.688] [Device GeForce RTX 2070 SUPER CUDAIntersect] Triangles buffer size: 64493Kbytes (OUT OF CORE)
[LuxRays][4.848] [Device GeForce RTX 2070 SUPER CUDAIntersect] Mesh description buffer size: 24Kbytes
[LuxRays][4.848] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMap descriptions buffer size: 1120bytes
[LuxRays][4.848] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 339671Kbytes (OUT OF CORE)
[LuxRays][5.687] [Device GeForce RTX 2070 SUPER CUDAIntersect] Textures buffer size: 121Kbytes
[LuxRays][5.689] [Device GeForce RTX 2070 SUPER CUDAIntersect] Texture evaluation ops buffer size: 15Kbytes
[LuxRays][5.689] [Device GeForce RTX 2070 SUPER CUDAIntersect] Texture evaluation stacks buffer size: 131072Kbytes
[LuxRays][5.689] [Device GeForce RTX 2070 SUPER CUDAIntersect] Materials buffer size: 13Kbytes
[LuxRays][5.689] [Device GeForce RTX 2070 SUPER CUDAIntersect] Material evaluation ops buffer size: 6900bytes
[LuxRays][5.689] [Device GeForce RTX 2070 SUPER CUDAIntersect] Material evaluation stacks buffer size: 352256Kbytes
[LuxRays][5.690] [Device GeForce RTX 2070 SUPER CUDAIntersect] Scene objects buffer size: 1944bytes (OUT OF CORE)
[LuxRays][5.690] [Device GeForce RTX 2070 SUPER CUDAIntersect] Lights buffer size: 1328bytes
[LuxRays][5.690] [Device GeForce RTX 2070 SUPER CUDAIntersect] Light offsets (Part I) buffer size: 324bytes
[LuxRays][5.690] [Device GeForce RTX 2070 SUPER CUDAIntersect] Light offsets (Part II) buffer size: 16bytes
[LuxRays][5.690] [Device GeForce RTX 2070 SUPER CUDAIntersect] LightsDistribution buffer size: 40bytes
[LuxRays][5.690] [Device GeForce RTX 2070 SUPER CUDAIntersect] InfiniteLightSourcesDistribution buffer size: 40bytes
[LuxRays][5.691] [Device GeForce RTX 2070 SUPER CUDAIntersect] Ray buffer size: 49152Kbytes
[LuxRays][5.691] [Device GeForce RTX 2070 SUPER CUDAIntersect] RayHit buffer size: 20480Kbytes
[LuxRays][5.691] [Device GeForce RTX 2070 SUPER CUDAIntersect] GPUTaskConfiguration buffer size: 272bytes
[LuxRays][5.691] [Device GeForce RTX 2070 SUPER CUDAIntersect] GPUTask buffer size: 679936Kbytes
[LuxRays][5.691] [Device GeForce RTX 2070 SUPER CUDAIntersect] GPUTaskDirectLight buffer size: 61440Kbytes
[LuxRays][5.691] [Device GeForce RTX 2070 SUPER CUDAIntersect] GPUTaskState buffer size: 401408Kbytes
[LuxRays][5.692] [Device GeForce RTX 2070 SUPER CUDAIntersect] GPUTask Stats buffer size: 4096Kbytes
[LuxRays][5.692] [Device GeForce RTX 2070 SUPER CUDAIntersect] SamplerSharedData buffer size: 4570Kbytes
[LuxCore][5.693] [PathOCLBaseRenderThread::0] Size of a Sample: 40bytes
[LuxRays][5.693] [Device GeForce RTX 2070 SUPER CUDAIntersect] Sample buffer size: 40960Kbytes
[LuxCore][5.693] [PathOCLBaseRenderThread::0] Size of a SampleData: 8bytes
[LuxRays][5.693] [Device GeForce RTX 2070 SUPER CUDAIntersect] SampleData buffer size: 8192Kbytes
[LuxCore][5.693] [PathOCLBaseRenderThread::0] Size of a SampleResult: 304bytes
[LuxRays][5.693] [Device GeForce RTX 2070 SUPER CUDAIntersect] Sample buffer size: 311296Kbytes
[LuxRays][5.693] [Device GeForce RTX 2070 SUPER CUDAIntersect] PathInfo buffer size: 110592Kbytes
[LuxRays][5.694] [Device GeForce RTX 2070 SUPER CUDAIntersect] DirectLightVolumeInfo buffer size: 45056Kbytes
[LuxRays][5.694] [Device GeForce RTX 2070 SUPER CUDAIntersect] Pixel Filter Distribution buffer size: 33Kbytes
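
To get a quick feel for how much of a scene ends up out of core, a log like the one above can be summed with a few lines of Python. This is only a rough sketch: the function name is made up for illustration, the regular expression is written against the line format shown in this post, and it assumes that "Kbytes" means 1024 bytes.

```python
import re

# Matches the tail of lines such as:
#   "... Normals buffer size: 40236Kbytes (OUT OF CORE)"
#   "... Camera buffer size: 5468bytes"
BUFFER_RE = re.compile(
    r"buffer size: (?P<size>\d+)(?P<unit>Kbytes|bytes)(?P<ooc> \(OUT OF CORE\))?\s*$"
)


def summarize_buffers(log_text):
    """Return (out_of_core_bytes, in_core_bytes) summed over all buffer lines."""
    ooc_total = 0
    in_core_total = 0
    for line in log_text.splitlines():
        match = BUFFER_RE.search(line)
        if match is None:
            continue  # not a "buffer size" line
        size = int(match.group("size"))
        if match.group("unit") == "Kbytes":
            size *= 1024  # assumption: 1 Kbyte = 1024 bytes
        if match.group("ooc"):
            ooc_total += size
        else:
            in_core_total += size
    return ooc_total, in_core_total


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = "\n".join([
        "[LuxRays][4.253] [Device GeForce RTX 2070 SUPER CUDAIntersect] Camera buffer size: 5468bytes",
        "[LuxRays][4.254] [Device GeForce RTX 2070 SUPER CUDAIntersect] Normals buffer size: 40236Kbytes (OUT OF CORE)",
        "[LuxRays][4.848] [Device GeForce RTX 2070 SUPER CUDAIntersect] ImageMaps buffer size: 339671Kbytes (OUT OF CORE)",
    ])
    ooc, in_core = summarize_buffers(sample)
    print("out of core: %d bytes, in core: %d bytes" % (ooc, in_core))
```

Feeding the whole log to summarize_buffers() gives the two totals for the complete scene.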

Locality

Like any cache, it works only if most samples reuse the same data (see the new Random/Sobol sampler options below). Otherwise, rendering can become extremely slow, with an endless transfer of data between CPU and GPU over the (slow) PCIe bus.
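
The effect of locality can be illustrated with a toy simulation. The sketch below is not how the driver or LuxCore actually manages memory: it assumes a simple LRU cache of fixed-size "pages", and all names and numbers are hypothetical. It only compares the hit rate of samples that stay inside one tile of data at a time against samples scattered over the whole scene.

```python
import random
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Fraction of accesses served from an LRU cache holding `capacity` pages."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for page in accesses:
        total += 1
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            cache[page] = True  # "transfer" the page into the cache
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / total if total else 0.0


def tiled_accesses(num_pages, tile_pages, samples_per_tile, rng):
    """Samples touch only the pages of one tile before moving to the next."""
    for start in range(0, num_pages, tile_pages):
        width = min(tile_pages, num_pages - start)
        for _ in range(samples_per_tile):
            yield start + rng.randrange(width)


def scattered_accesses(num_pages, count, rng):
    """Every sample may touch any page of the scene."""
    for _ in range(count):
        yield rng.randrange(num_pages)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sizes: 1000 pages of scene data, room for 100 in the cache.
    coherent = hit_rate(tiled_accesses(1000, 50, 500, rng), capacity=100)
    scattered = hit_rate(scattered_accesses(1000, 10000, rng), capacity=100)
    print("tiled hit rate:     %.2f" % coherent)
    print("scattered hit rate: %.2f" % scattered)
```

In this model every miss stands for a transfer over the PCIe bus, so the lower the hit rate, the more time is spent moving data instead of rendering.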

New Random/Sobol sampler options for "Locality"

TILEPATH is an obvious option for improving render "Locality". TODO: I'm working on some new Random/Sobol sampler options for improving "Locality".

List of OoC buffers

This is the complete list of buffers that are currently allocated as Out of Core:

- image map pixels;
- vertex positions;
- vertex normals;
- vertex UVs;
- vertex colors;
- vertex alphas;
- vertex AOVs;
- triangle normals;
- triangle vertex indices;
- scene object descriptions;
- (M)BVH data;
- PhotonGI indirect cache;
- PhotonGI caustic cache;
- Film frame buffers and AOVs.

The list can be extended, but the idea is to keep small and frequently used data always in GPU RAM and use what is left as a cache.
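
As a back-of-the-envelope check of that idea, the sketch below estimates how much GPU RAM is left to act as a cache and which fraction of the out of core data could be resident at once. The function name and the sizes in the example are hypothetical, and the model ignores driver overhead and anything else using the GPU.

```python
GIB = 1024 ** 3


def cache_coverage(gpu_ram_bytes, resident_bytes, out_of_core_bytes):
    """Return (cache_bytes, fraction of OoC data that fits in the cache).

    resident_bytes: buffers that always stay in GPU RAM.
    out_of_core_bytes: buffers stored in CPU RAM and cached on the GPU.
    """
    cache_bytes = max(gpu_ram_bytes - resident_bytes, 0)
    if out_of_core_bytes == 0:
        return cache_bytes, 1.0  # nothing to cache, everything fits
    return cache_bytes, min(cache_bytes / out_of_core_bytes, 1.0)


if __name__ == "__main__":
    # Hypothetical scene: 8 GiB GPU, 2 GiB always resident, 12 GiB out of core.
    cache, fraction = cache_coverage(8 * GIB, 2 * GIB, 12 * GIB)
    print("cache: %.1f GiB, covers %.0f%% of OoC data" % (cache / GIB, fraction * 100))
```

The smaller that fraction, the more the render depends on good locality to stay fast.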

Re: Out of Core rendering for CUDA devices

Posted: Mon May 18, 2020 6:04 pm
by B.Y.O.B.
This is now supported in the Blender addon; the option can be found in the new "sampling" panel in the render properties.