Loading kernels tooo slow

Dade
Developer
Posts: 5423
Joined: Mon Dec 04, 2017 8:36 pm
Location: Italy

Post by Dade » Thu Jul 22, 2021 9:58 am

B.Y.O.B. wrote:
Thu Jul 22, 2021 9:42 am
It might be the default epsilon settings that are different between luxcoreui and BlendLuxCore.
luxcoreui uses the default LuxCore values. Does BlendLuxCore use them too?

B.Y.O.B.
Developer
Posts: 4114
Joined: Mon Dec 04, 2017 10:08 pm
Location: Germany

Post by B.Y.O.B. » Thu Jul 22, 2021 3:40 pm

I think so, but there might be subtle differences with the floats being stored in Blender properties and passed through Python, etc.
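A minimal sketch of the kind of subtle difference meant here, under the assumption that a float ends up in 32-bit storage somewhere on the way (for example in a Blender property) and is then read back in Python as a 64-bit float. The two values are the scene.epsilon.min and scene.epsilon.max defaults shown in the luxcoreui log; the helper name through_float32 is made up for this illustration. Whether such a difference actually changes the kernel compiler options, and so the cache entry, is not confirmed in this thread.

```python
import struct


def through_float32(value):
    """Round-trip a Python float (64-bit) through 32-bit float storage."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Default epsilon values as printed in the luxcoreui log.
defaults = {
    "scene.epsilon.min": 1e-05,
    "scene.epsilon.max": 0.1,
}

for name, value in defaults.items():
    stored = through_float32(value)
    # repr() shows every digit Python needs to reproduce the value exactly.
    print(name, repr(value), "->", repr(stored), "unchanged:", stored == value)
```

If the two sides compare or format the value with different precision, the resulting strings may not match even though both started from the same default.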

TAO
Developer
Posts: 735
Joined: Sun Mar 24, 2019 4:49 pm
Location: France

Post by TAO » Thu Jul 22, 2021 4:59 pm

B.Y.O.B. wrote:
Thu Jul 22, 2021 3:40 pm
I think so, but there might be subtle differences with the floats being stored in Blender properties and passed through Python, etc.
The same applies to 3ds Max.

kintuX
Posts: 684
Joined: Wed Jan 10, 2018 2:37 am

Post by kintuX » Fri Jul 23, 2021 10:01 pm

Dade wrote:
Thu Jul 22, 2021 9:02 am
kintuX wrote:
Sun Jul 18, 2021 12:28 pm
Ran it and it does PathOCL, so when I start with Blender it does it all over again for CUDA.
Uh? No, it uses the default devices: if you have CUDA, they are all CUDA devices (otherwise all OpenCL devices).
It runs PathOCL anyway. And then with Blender it does it again for CUDA.
So...
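For anyone who wants the numbers without scrolling, here is a rough sketch that pulls the "Kernels compilation time" entries out of a console log like the one in this post. The three sample lines are copied from that log; the function name kernel_compile_times and the regular expression are only an illustration, not part of LuxCore.

```python
import re

# Matches lines such as:
# [LuxCore][195.516] [PathOCLBaseRenderThread::0] Kernels compilation time: 180547ms
COMPILE_RE = re.compile(
    r"\[(?P<source>[^\]]+)\]\s+Kernels compilation time:\s+(?P<ms>\d+)ms"
)


def kernel_compile_times(log_text):
    """Return (source, seconds) pairs for each 'Kernels compilation time' line."""
    return [
        (match.group("source"), int(match.group("ms")) / 1000.0)
        for match in COMPILE_RE.finditer(log_text)
    ]


sample = """\
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Kernels compilation time: 180547ms
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Kernels compilation time: 186702ms
[LuxCore][383.328] [MergeSampleBuffersOCL] Kernels compilation time: 265ms
"""

times = kernel_compile_times(sample)
for source, seconds in times:
    print(f"{source}: {seconds:.1f} s")
```

To use it on a full log, read the saved console output into a string and pass it to kernel_compile_times.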

Here's what luxcoreui console says:

>luxcoreui.exe
LuxCoreUI v2.6alpha0 (LuxCore demo: http://www.luxcorerender.org)
====================================================================
KernelCache FillProgressHandler Step: 0/3
Step: 0/3
Creating kernel cache entry with configuration properties:
renderengine.type = "PATHOCL"
sampler.type = "SOBOL"
scene.epsilon.min = 1e-05
scene.epsilon.max = 0.1

And scene properties:

[SDL][13.906] Define ImageMap: NamedObject
[SDL][13.906] Define ImageMap: image.png
[SDL][13.906] Camera type: perspective
[SDL][13.906] Camera position: Point[1, 6, 3]
[SDL][13.906] Camera target: Point[0, 0, 0.5]
[SDL][13.906] Camera clipping plane disabled
[SDL][13.906] Light definition: infinite_light
[SDL][13.922] Material definition: matte_mat
[SDL][13.922] Scene objects count: 1
[SDL][13.922] Texture definition: constfloat3_tex
[SDL][13.922] Material definition: constfloat3_tmat
[SDL][13.922] Scene objects count: 1
[LuxCore][13.922] Configuration:
[LuxCore][13.922]   renderengine.type = "PATHOCL"
[LuxCore][13.922]   sampler.type = "SOBOL"
[LuxCore][13.922]   scene.epsilon.min = 1e-05
[LuxCore][13.922]   scene.epsilon.max = 0.1
[LuxCore][13.922]   film.outputs.1.type = "RGB_IMAGEPIPELINE"
[LuxCore][13.922]   film.outputs.1.filename = "image.png"
[LuxCore][13.937] File Name Resolver Configuration:
[LuxCore][13.937] Film resolution: 640x640
[SDL][13.937] Film output definition: RGB_IMAGEPIPELINE [image.png]
[SDL][13.937] Film output definition: RGB_IMAGEPIPELINE [image.png]
[LuxRays][13.937] OpenCL support: enabled
[LuxRays][14.031] OpenCL Platform 0: NVIDIA CUDA
[LuxRays][14.031] CUDA support: enabled
[LuxRays][14.031] CUDA support: available
[LuxRays][14.031] CUDA driver version: 11.20
[LuxRays][14.031] CUDA device count: 2
[LuxRays][14.031] Optix support: available
[LuxRays][14.031] Device 0 name: Native
[LuxRays][14.031] Device 0 type: NATIVE_THREAD
[LuxRays][14.031] Device 0 compute units: 1
[LuxRays][14.031] Device 0 preferred float vector width: 4
[LuxRays][14.031] Device 0 max allocable memory: 17592186044415MBytes
[LuxRays][14.031] Device 0 max allocable memory block size: 17592186044415MBytes
[LuxRays][14.031] Device 0 has out of core memory support: 0
[LuxRays][14.031] Device 1 name: GeForce GTX 1070
[LuxRays][14.031] Device 1 type: OPENCL_GPU
[LuxRays][14.031] Device 1 compute units: 15
[LuxRays][14.031] Device 1 preferred float vector width: 1
[LuxRays][14.031] Device 1 max allocable memory: 8192MBytes
[LuxRays][14.031] Device 1 max allocable memory block size: 2048MBytes
[LuxRays][14.031] Device 1 has out of core memory support: 0
[LuxRays][14.031] Device 2 name: Quadro M5000
[LuxRays][14.031] Device 2 type: OPENCL_GPU
[LuxRays][14.031] Device 2 compute units: 16
[LuxRays][14.031] Device 2 preferred float vector width: 1
[LuxRays][14.031] Device 2 max allocable memory: 8192MBytes
[LuxRays][14.047] Device 2 max allocable memory block size: 2048MBytes
[LuxRays][14.047] Device 2 has out of core memory support: 0
[LuxRays][14.047] Device 3 name: GeForce GTX 1070
[LuxRays][14.047] Device 3 type: CUDA_GPU
[LuxRays][14.047] Device 3 compute units: 128
[LuxRays][14.047] Device 3 preferred float vector width: 1
[LuxRays][14.047] Device 3 max allocable memory: 8192MBytes
[LuxRays][14.047] Device 3 max allocable memory block size: 17592186044415MBytes
[LuxRays][14.047] Device 3 has out of core memory support: 1
[LuxRays][14.047] Device 3 CUDA compute capability: 6.1
[LuxRays][14.047] Device 4 name: Quadro M5000
[LuxRays][14.047] Device 4 type: CUDA_GPU
[LuxRays][14.047] Device 4 compute units: 128
[LuxRays][14.047] Device 4 preferred float vector width: 1
[LuxRays][14.047] Device 4 max allocable memory: 8192MBytes
[LuxRays][14.047] Device 4 max allocable memory block size: 17592186044415MBytes
[LuxRays][14.047] Device 4 has out of core memory support: 1
[LuxRays][14.047] Device 4 CUDA compute capability: 5.2
[LuxRays][14.047] Creating 34 intersection device(s)
[LuxRays][14.047] Allocating intersection device 0: GeForce GTX 1070 (Type = CUDA_GPU)
[LuxRays][14.219] Allocating intersection device 1: Quadro M5000 (Type = CUDA_GPU)
[LuxRays][14.328] Allocating intersection device 2: Native (Type = NATIVE_THREAD)
[LuxRays][14.328] Allocating intersection device 3: Native (Type = NATIVE_THREAD)
[LuxRays][14.328] Allocating intersection device 4: Native (Type = NATIVE_THREAD)
[LuxRays][14.328] Allocating intersection device 5: Native (Type = NATIVE_THREAD)
[LuxRays][14.328] Allocating intersection device 6: Native (Type = NATIVE_THREAD)
[LuxRays][14.328] Allocating intersection device 7: Native (Type = NATIVE_THREAD)
[LuxRays][14.328] Allocating intersection device 8: Native (Type = NATIVE_THREAD)
[LuxRays][14.328] Allocating intersection device 9: Native (Type = NATIVE_THREAD)
[LuxRays][14.328] Allocating intersection device 10: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 11: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 12: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 13: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 14: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 15: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 16: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 17: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 18: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 19: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 20: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 21: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 22: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 23: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 24: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 25: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 26: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 27: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 28: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 29: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 30: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 31: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 32: Native (Type = NATIVE_THREAD)
[LuxRays][14.344] Allocating intersection device 33: Native (Type = NATIVE_THREAD)
[LuxCore][14.344] CUDA devices used:
[LuxCore][14.344] [GeForce GTX 1070 CUDAIntersect (Optix enabled: 0)]
[LuxCore][14.344] [Quadro M5000 CUDAIntersect (Optix enabled: 0)]
[LuxCore][14.359] OpenCL devices used:
[LuxCore][14.359] Native devices used: 32
[LuxCore][14.359] Configuring 2 OpenCL render threads
[LuxCore][14.359] Configuring 32 native render threads
[LuxRays][14.359] Preprocessing DataSet
[LuxRays][14.359] Total vertex count: 48
[LuxRays][14.359] Total triangle count: 24
[LuxRays][14.359] Preprocessing DataSet done
[LuxRays][14.359] Adding DataSet accelerator: BVH
[LuxRays][14.359] Total vertex count: 48
[LuxRays][14.359] Total triangle count: 24
[LuxRays][14.359] BVH Dataset preprocessing time: 0ms
[LuxRays][14.359] BVH builder: EMBREE_BINNED_SAH
[LuxRays][14.359] BVH build hierarchy time: 0ms
[LuxRays][14.359] BVH total build time: 0ms
[LuxRays][14.359] Total BVH memory usage: 1Kbytes
[LuxRays][14.359] Adding DataSet accelerator: EMBREE
[LuxRays][14.359] Total vertex count: 48
[LuxRays][14.359] Total triangle count: 24
[LuxRays][14.375] EmbreeAccel build time: 0ms
[LuxRays][14.375] [Device GeForce GTX 1070 CUDAIntersect] BVH mesh vertices buffer size: 576bytes
[LuxRays][14.375] [Device GeForce GTX 1070 CUDAIntersect] BVH nodes buffer size: 1248bytes
[LuxRays][14.375] [BVHKernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D PARAM_RAY_EPSILON_MIN=1e-05f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][14.375] [BVHKernel] Compiling kernels
[LuxRays][14.672] [BVHKernel] CUDA program compilation warnings:
bvh.cl(144): warning #191-D: type qualifier is meaningless on cast type


[LuxRays][14.672] [BVHKernel] Program not cached
[LuxRays][14.672] [Device Quadro M5000 CUDAIntersect] BVH mesh vertices buffer size: 576bytes
[LuxRays][14.672] [Device Quadro M5000 CUDAIntersect] BVH nodes buffer size: 1248bytes
[LuxRays][14.672] [BVHKernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D PARAM_RAY_EPSILON_MIN=1e-05f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][14.672] [BVHKernel] Compiling kernels
[LuxRays][14.844] [BVHKernel] CUDA program compilation warnings:
bvh.cl(144): warning #191-D: type qualifier is meaningless on cast type


[LuxRays][14.844] [BVHKernel] Program not cached
[LuxCore][14.859] [PathOCLRenderEngine] OpenCL task count: 262144
[LuxCore][14.875] [PathOCLBaseRenderEngine] OpenCL max. page memory size: 18014398509481983Kbytes
[LuxCore][14.875] Compile Geometry
[LuxCore][14.875] Scene geometry compilation time: 0ms
[LuxCore][14.875] Compile 2 Textures
[LuxCore][14.875] Texture evaluation ops count: 6
[LuxCore][14.875] Texture evaluation max. stack size: 3
[LuxCore][14.875] Textures compilation time: 0ms
[LuxCore][14.875] Compile 2 Materials
[LuxCore][14.875] Material evaluation ops count: 14
[LuxCore][14.875] Material evaluation max. stack size: 8
[LuxCore][14.875] Material compilation time: 0ms
[LuxCore][14.875] Compile Lights
[LuxCore][14.875] Lights compilation time: 0ms
[LuxCore][14.875] Compile ImageMaps
[LuxCore][14.875] Image maps page(s) count: 1
[LuxCore][14.875]  RGB channel page 0 size: 3264Kbytes
[LuxCore][14.875] Image maps compilation time: 0ms
[LuxCore][14.875] Always enabled OpenCL code:
[LuxCore][14.875] Compile Geometry
[LuxCore][14.891] Scene geometry compilation time: 0ms
[LuxCore][14.891] Compile 2 Textures
[LuxCore][14.891] Texture evaluation ops count: 6
[LuxCore][14.891] Texture evaluation max. stack size: 3
[LuxCore][14.891] Textures compilation time: 0ms
[LuxCore][14.891] Compile 2 Materials
[LuxCore][14.891] Material evaluation ops count: 14
[LuxCore][14.891] Material evaluation max. stack size: 8
[LuxCore][14.891] Material compilation time: 0ms
[LuxCore][14.891] Compile Lights
[LuxCore][14.891] Lights compilation time: 0ms
[LuxCore][14.891] Compile ImageMaps
[LuxCore][14.891] Image maps page(s) count: 1
[LuxCore][14.891]  RGB channel page 0 size: 3264Kbytes
[LuxCore][14.891] Image maps compilation time: 0ms
[LuxCore][14.891] Starting 2 OpenCL render threads
[LuxRays][14.906] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 6400Kbytes
[LuxRays][14.906] [Device GeForce GTX 1070 CUDAIntersect] NOISE buffer size: 1600Kbytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 6400Kbytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Camera buffer size: 5492bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Triangle normals buffer size: 288bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Vertices buffer size: 576bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Triangles buffer size: 288bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Mesh description buffer size: 624bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] ImageMap descriptions buffer size: 80bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] ImageMaps buffer size: 3264Kbytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Textures buffer size: 656bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Texture evaluation ops buffer size: 48bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Texture evaluation stacks buffer size: 3072Kbytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Materials buffer size: 456bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Material evaluation ops buffer size: 168bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Material evaluation stacks buffer size: 8192Kbytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Scene objects buffer size: 48bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Lights buffer size: 344bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Env. light indices buffer size: 4bytes
[LuxRays][14.922] [Device GeForce GTX 1070 CUDAIntersect] Light offsets (Part I) buffer size: 8bytes
[LuxRays][14.937] [Device GeForce GTX 1070 CUDAIntersect] Env. light distributions buffer size: 516Kbytes
[LuxRays][14.937] [Device GeForce GTX 1070 CUDAIntersect] LightsDistribution buffer size: 16bytes
[LuxRays][14.937] [Device GeForce GTX 1070 CUDAIntersect] InfiniteLightSourcesDistribution buffer size: 16bytes
[LuxRays][14.937] [Device GeForce GTX 1070 CUDAIntersect] Ray buffer size: 12288Kbytes
[LuxRays][14.937] [Device GeForce GTX 1070 CUDAIntersect] RayHit buffer size: 5120Kbytes
[LuxRays][14.937] [Device GeForce GTX 1070 CUDAIntersect] GPUTaskConfiguration buffer size: 336bytes
[LuxRays][14.937] [Device GeForce GTX 1070 CUDAIntersect] GPUTask buffer size: 169984Kbytes
[LuxRays][14.937] [Device GeForce GTX 1070 CUDAIntersect] GPUTaskDirectLight buffer size: 15360Kbytes
[LuxRays][14.953] [Device GeForce GTX 1070 CUDAIntersect] GPUTaskState buffer size: 100352Kbytes
[LuxRays][14.953] [Device GeForce GTX 1070 CUDAIntersect] GPUTask Stats buffer size: 1024Kbytes
[LuxRays][14.953] [Device GeForce GTX 1070 CUDAIntersect] SamplerSharedData buffer size: 1608Kbytes
[LuxCore][14.953] [PathOCLBaseRenderThread::0] Size of a Sample: 40bytes
[LuxRays][14.953] [Device GeForce GTX 1070 CUDAIntersect] Sample buffer size: 10240Kbytes
[LuxCore][14.953] [PathOCLBaseRenderThread::0] Size of a SampleData: 8bytes
[LuxRays][14.953] [Device GeForce GTX 1070 CUDAIntersect] SampleData buffer size: 2048Kbytes
[LuxCore][14.953] [PathOCLBaseRenderThread::0] Size of a SampleResult: 428bytes
[LuxRays][14.953] [Device GeForce GTX 1070 CUDAIntersect] SampleResult buffer size: 109568Kbytes
[LuxRays][14.953] [Device GeForce GTX 1070 CUDAIntersect] PathInfo buffer size: 27648Kbytes
[LuxRays][14.969] [Device GeForce GTX 1070 CUDAIntersect] DirectLightVolumeInfo buffer size: 11264Kbytes
[LuxRays][14.969] [Device GeForce GTX 1070 CUDAIntersect] Pixel Filter Distribution buffer size: 33Kbytes
[LuxCore][14.969] [PathOCLBaseRenderThread::0] Compiling kernels
[LuxRays][14.969] [PathOCL kernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=1e-05f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][14.969] [PathOCL kernel] Compiling kernels
[LuxRays][195.500] [PathOCL kernel] Program not cached
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling Film_Clear Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling InitSeed Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling Init Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_RT_NEXT_VERTEX Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_HIT_NOTHING Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_HIT_OBJECT Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_RT_DL Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_DL_ILLUMINATE Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_DL_SAMPLE_BSDF Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_GENERATE_NEXT_VERTEX_RAY Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_SPLAT_SAMPLE Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_NEXT_SAMPLE Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_GENERATE_CAMERA_RAY Kernel
[LuxCore][195.516] [PathOCLBaseRenderThread::0] AdvancePaths_MK_* workgroup size: 32
[LuxCore][195.516] [PathOCLBaseRenderThread::0] Kernels compilation time: 180547ms
[LuxRays][195.531] [Device Quadro M5000 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 6400Kbytes
[LuxRays][195.531] [Device Quadro M5000 CUDAIntersect] NOISE buffer size: 1600Kbytes
[LuxRays][195.547] [Device Quadro M5000 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 6400Kbytes
[LuxRays][195.547] [Device Quadro M5000 CUDAIntersect] Camera buffer size: 5492bytes
[LuxRays][195.547] [Device Quadro M5000 CUDAIntersect] Triangle normals buffer size: 288bytes
[LuxRays][195.547] [Device Quadro M5000 CUDAIntersect] Vertices buffer size: 576bytes
[LuxRays][195.547] [Device Quadro M5000 CUDAIntersect] Triangles buffer size: 288bytes
[LuxRays][195.547] [Device Quadro M5000 CUDAIntersect] Mesh description buffer size: 624bytes
[LuxRays][195.547] [Device Quadro M5000 CUDAIntersect] ImageMap descriptions buffer size: 80bytes
[LuxRays][195.547] [Device Quadro M5000 CUDAIntersect] ImageMaps buffer size: 3264Kbytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] Textures buffer size: 656bytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] Texture evaluation ops buffer size: 48bytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] Texture evaluation stacks buffer size: 3072Kbytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] Materials buffer size: 456bytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] Material evaluation ops buffer size: 168bytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] Material evaluation stacks buffer size: 8192Kbytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] Scene objects buffer size: 48bytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] Lights buffer size: 344bytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] Env. light indices buffer size: 4bytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] Light offsets (Part I) buffer size: 8bytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] Env. light distributions buffer size: 516Kbytes
[LuxRays][195.562] [Device Quadro M5000 CUDAIntersect] LightsDistribution buffer size: 16bytes
[LuxRays][195.578] [Device Quadro M5000 CUDAIntersect] InfiniteLightSourcesDistribution buffer size: 16bytes
[LuxRays][195.578] [Device Quadro M5000 CUDAIntersect] Ray buffer size: 12288Kbytes
[LuxRays][195.578] [Device Quadro M5000 CUDAIntersect] RayHit buffer size: 5120Kbytes
[LuxRays][195.578] [Device Quadro M5000 CUDAIntersect] GPUTaskConfiguration buffer size: 336bytes
[LuxRays][195.578] [Device Quadro M5000 CUDAIntersect] GPUTask buffer size: 169984Kbytes
[LuxRays][195.594] [Device Quadro M5000 CUDAIntersect] GPUTaskDirectLight buffer size: 15360Kbytes
[LuxRays][195.594] [Device Quadro M5000 CUDAIntersect] GPUTaskState buffer size: 100352Kbytes
[LuxRays][195.609] [Device Quadro M5000 CUDAIntersect] GPUTask Stats buffer size: 1024Kbytes
[LuxRays][195.609] [Device Quadro M5000 CUDAIntersect] SamplerSharedData buffer size: 1608Kbytes
[LuxCore][195.609] [PathOCLBaseRenderThread::1] Size of a Sample: 40bytes
[LuxRays][195.609] [Device Quadro M5000 CUDAIntersect] Sample buffer size: 10240Kbytes
[LuxCore][195.609] [PathOCLBaseRenderThread::1] Size of a SampleData: 8bytes
[LuxRays][195.609] [Device Quadro M5000 CUDAIntersect] SampleData buffer size: 2048Kbytes
[LuxCore][195.609] [PathOCLBaseRenderThread::1] Size of a SampleResult: 428bytes
[LuxRays][195.609] [Device Quadro M5000 CUDAIntersect] SampleResult buffer size: 109568Kbytes
[LuxRays][195.625] [Device Quadro M5000 CUDAIntersect] PathInfo buffer size: 27648Kbytes
[LuxRays][195.625] [Device Quadro M5000 CUDAIntersect] DirectLightVolumeInfo buffer size: 11264Kbytes
[LuxRays][195.625] [Device Quadro M5000 CUDAIntersect] Pixel Filter Distribution buffer size: 33Kbytes
[LuxCore][195.641] [PathOCLBaseRenderThread::1] Compiling kernels
[LuxRays][195.641] [PathOCL kernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_PATHOCL -D PARAM_RAY_EPSILON_MIN=1e-05f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][195.641] [PathOCL kernel] Compiling kernels
[LuxRays][382.328] [PathOCL kernel] Program not cached
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling Film_Clear Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling InitSeed Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling Init Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling AdvancePaths_MK_RT_NEXT_VERTEX Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling AdvancePaths_MK_HIT_NOTHING Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling AdvancePaths_MK_HIT_OBJECT Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling AdvancePaths_MK_RT_DL Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling AdvancePaths_MK_DL_ILLUMINATE Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling AdvancePaths_MK_DL_SAMPLE_BSDF Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling AdvancePaths_MK_GENERATE_NEXT_VERTEX_RAY Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling AdvancePaths_MK_SPLAT_SAMPLE Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling AdvancePaths_MK_NEXT_SAMPLE Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Compiling AdvancePaths_MK_GENERATE_CAMERA_RAY Kernel
[LuxCore][382.328] [PathOCLBaseRenderThread::1] AdvancePaths_MK_* workgroup size: 32
[LuxCore][382.328] [PathOCLBaseRenderThread::1] Kernels compilation time: 186702ms
[LuxCore][382.328] Starting 32 native render threads
[LuxCore][382.359] Film hardware image pipeline
[LuxCore][383.047] Film hardware device used: GeForce GTX 1070 CUDAIntersect (Type: CUDA_GPU)
[LuxRays][383.047] [Device GeForce GTX 1070 CUDAIntersect] IMAGEPIPELINE buffer size: 4800Kbytes
[LuxRays][383.062] [Device GeForce GTX 1070 CUDAIntersect] Merge buffer size: 6400Kbytes
[LuxRays][383.062] [MergeSampleBuffersOCL] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][383.062] [MergeSampleBuffersOCL] Compiling kernels
[LuxRays][383.328] [MergeSampleBuffersOCL] Program not cached
[LuxCore][383.328] [MergeSampleBuffersOCL] Compiling Film_MergeBufferInitialize Kernel
[LuxCore][383.328] [MergeSampleBuffersOCL] Compiling Film_MergeRADIANCE_PER_PIXEL_NORMALIZED Kernel
[LuxCore][383.328] [MergeSampleBuffersOCL] Compiling Film_MergeRADIANCE_PER_SCREEN_NORMALIZED Kernel
[LuxCore][383.328] [MergeSampleBuffersOCL] Compiling Film_MergeBufferFinalize Kernel
[LuxCore][383.328] [MergeSampleBuffersOCL] Kernels compilation time: 265ms
[LuxRays][383.437] [Device GeForce GTX 1070 CUDAIntersect] Accumulation buffer size: 37Kbytes
[LuxRays][383.437] [AutoLinearToneMap] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][383.437] [AutoLinearToneMap] Compiling kernels
[LuxRays][383.719] [AutoLinearToneMap] Program not cached
[LuxCore][383.719] [AutoLinearToneMap] Compiling OpRGBValuesReduce Kernel
[LuxCore][383.719] [AutoLinearToneMap] Compiling OpRGBValueAccumulate Kernel
[LuxCore][383.719] [AutoLinearToneMap] Compiling AutoLinearToneMap_Apply Kernel
[LuxCore][383.719] [AutoLinearToneMap] Kernels compilation time: 281ms
[LuxRays][383.719] [Device GeForce GTX 1070 CUDAIntersect] Gamma table buffer size: 64Kbytes
[LuxRays][383.719] [GammaCorrectionPlugin] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][383.719] [GammaCorrectionPlugin] Compiling kernels
[LuxRays][383.969] [GammaCorrectionPlugin] Program not cached
[LuxCore][383.969] [GammaCorrectionPlugin] Compiling GammaCorrectionPlugin_Apply Kernel
[LuxCore][383.969] [GammaCorrectionPlugin] Kernels compilation time: 250ms
[LuxCore][384.141] Noise estimation: first pass
[LuxCore][384.469] Noise estimation: Error mean = 2.39255e-06
[LuxCore][384.687] [GeForce GTX 1070 CUDAIntersect] Memory used for hardware image pipeline: 11200Kbytes
Done.
====================================================================
KernelCache FillProgressHandler Step: 1/3
Step: 1/3
Creating kernel cache entry with configuration properties:
renderengine.type = "TILEPATHOCL"
sampler.type = "TILEPATHSAMPLER"
scene.epsilon.min = 1e-05
scene.epsilon.max = 0.1

And scene properties:

[SDL][384.781] Define ImageMap: NamedObject
[SDL][384.781] Define ImageMap: image.png
[SDL][384.781] Camera type: perspective
[SDL][384.781] Camera position: Point[1, 6, 3]
[SDL][384.781] Camera target: Point[0, 0, 0.5]
[SDL][384.781] Camera clipping plane disabled
[SDL][384.781] Light definition: infinite_light
[SDL][384.781] Material definition: matte_mat
[SDL][384.781] Scene objects count: 1
[SDL][384.797] Texture definition: constfloat3_tex
[SDL][384.797] Material definition: constfloat3_tmat
[SDL][384.797] Scene objects count: 1
[LuxCore][384.797] Configuration:
[LuxCore][384.797]   renderengine.type = "TILEPATHOCL"
[LuxCore][384.797]   sampler.type = "TILEPATHSAMPLER"
[LuxCore][384.797]   scene.epsilon.min = 1e-05
[LuxCore][384.797]   scene.epsilon.max = 0.1
[LuxCore][384.797]   film.outputs.1.type = "RGB_IMAGEPIPELINE"
[LuxCore][384.797]   film.outputs.1.filename = "image.png"
[LuxCore][384.797] File Name Resolver Configuration:
[LuxCore][384.797] Film resolution: 640x640
[SDL][384.797] Film output definition: RGB_IMAGEPIPELINE [image.png]
[SDL][384.797] Film output definition: RGB_IMAGEPIPELINE [image.png]
[LuxRays][384.797] OpenCL support: enabled
[LuxRays][384.797] OpenCL Platform 0: NVIDIA CUDA
[LuxRays][384.797] CUDA support: enabled
[LuxRays][384.797] CUDA support: available
[LuxRays][384.797] CUDA driver version: 11.20
[LuxRays][384.812] CUDA device count: 2
[LuxRays][384.812] Optix support: available
[LuxRays][384.812] Device 0 name: Native
[LuxRays][384.812] Device 0 type: NATIVE_THREAD
[LuxRays][384.812] Device 0 compute units: 1
[LuxRays][384.812] Device 0 preferred float vector width: 4
[LuxRays][384.812] Device 0 max allocable memory: 17592186044415MBytes
[LuxRays][384.812] Device 0 max allocable memory block size: 17592186044415MBytes
[LuxRays][384.812] Device 0 has out of core memory support: 0
[LuxRays][384.812] Device 1 name: GeForce GTX 1070
[LuxRays][384.812] Device 1 type: OPENCL_GPU
[LuxRays][384.812] Device 1 compute units: 15
[LuxRays][384.812] Device 1 preferred float vector width: 1
[LuxRays][384.812] Device 1 max allocable memory: 8192MBytes
[LuxRays][384.812] Device 1 max allocable memory block size: 2048MBytes
[LuxRays][384.812] Device 1 has out of core memory support: 0
[LuxRays][384.812] Device 2 name: Quadro M5000
[LuxRays][384.812] Device 2 type: OPENCL_GPU
[LuxRays][384.812] Device 2 compute units: 16
[LuxRays][384.812] Device 2 preferred float vector width: 1
[LuxRays][384.812] Device 2 max allocable memory: 8192MBytes
[LuxRays][384.812] Device 2 max allocable memory block size: 2048MBytes
[LuxRays][384.812] Device 2 has out of core memory support: 0
[LuxRays][384.812] Device 3 name: GeForce GTX 1070
[LuxRays][384.812] Device 3 type: CUDA_GPU
[LuxRays][384.812] Device 3 compute units: 128
[LuxRays][384.812] Device 3 preferred float vector width: 1
[LuxRays][384.812] Device 3 max allocable memory: 8192MBytes
[LuxRays][384.812] Device 3 max allocable memory block size: 17592186044415MBytes
[LuxRays][384.812] Device 3 has out of core memory support: 1
[LuxRays][384.812] Device 3 CUDA compute capability: 6.1
[LuxRays][384.828] Device 4 name: Quadro M5000
[LuxRays][384.828] Device 4 type: CUDA_GPU
[LuxRays][384.828] Device 4 compute units: 128
[LuxRays][384.828] Device 4 preferred float vector width: 1
[LuxRays][384.828] Device 4 max allocable memory: 8192MBytes
[LuxRays][384.828] Device 4 max allocable memory block size: 17592186044415MBytes
[LuxRays][384.828] Device 4 has out of core memory support: 1
[LuxRays][384.828] Device 4 CUDA compute capability: 5.2
[LuxRays][384.828] Creating 34 intersection device(s)
[LuxRays][384.828] Allocating intersection device 0: GeForce GTX 1070 (Type = CUDA_GPU)
[LuxRays][384.922] Allocating intersection device 1: Quadro M5000 (Type = CUDA_GPU)
[LuxRays][385.016] Allocating intersection device 2: Native (Type = NATIVE_THREAD)
[LuxRays][385.016] Allocating intersection device 3: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 4: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 5: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 6: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 7: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 8: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 9: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 10: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 11: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 12: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 13: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 14: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 15: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 16: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 17: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 18: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 19: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 20: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 21: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 22: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 23: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 24: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 25: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 26: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 27: Native (Type = NATIVE_THREAD)
[LuxRays][385.031] Allocating intersection device 28: Native (Type = NATIVE_THREAD)
[LuxRays][385.047] Allocating intersection device 29: Native (Type = NATIVE_THREAD)
[LuxRays][385.047] Allocating intersection device 30: Native (Type = NATIVE_THREAD)
[LuxRays][385.047] Allocating intersection device 31: Native (Type = NATIVE_THREAD)
[LuxRays][385.047] Allocating intersection device 32: Native (Type = NATIVE_THREAD)
[LuxRays][385.047] Allocating intersection device 33: Native (Type = NATIVE_THREAD)
[LuxCore][385.047] CUDA devices used:
[LuxCore][385.047] [GeForce GTX 1070 CUDAIntersect (Optix enabled: 0)]
[LuxCore][385.047] [Quadro M5000 CUDAIntersect (Optix enabled: 0)]
[LuxCore][385.047] OpenCL devices used:
[LuxCore][385.047] Native devices used: 32
[LuxCore][385.047] Configuring 2 OpenCL render threads
[LuxCore][385.047] Configuring 32 native render threads
[LuxRays][385.047] Preprocessing DataSet
[LuxRays][385.047] Total vertex count: 48
[LuxRays][385.047] Total triangle count: 24
[LuxRays][385.047] Preprocessing DataSet done
[LuxRays][385.047] Adding DataSet accelerator: BVH
[LuxRays][385.047] Total vertex count: 48
[LuxRays][385.047] Total triangle count: 24
[LuxRays][385.047] BVH Dataset preprocessing time: 0ms
[LuxRays][385.047] BVH builder: EMBREE_BINNED_SAH
[LuxRays][385.062] BVH build hierarchy time: 15ms
[LuxRays][385.062] BVH total build time: 15ms
[LuxRays][385.062] Total BVH memory usage: 1Kbytes
[LuxRays][385.062] Adding DataSet accelerator: EMBREE
[LuxRays][385.062] Total vertex count: 48
[LuxRays][385.062] Total triangle count: 24
[LuxRays][385.062] EmbreeAccel build time: 0ms
[LuxRays][385.062] [Device GeForce GTX 1070 CUDAIntersect] BVH mesh vertices buffer size: 576bytes
[LuxRays][385.062] [Device GeForce GTX 1070 CUDAIntersect] BVH nodes buffer size: 1248bytes
[LuxRays][385.062] [BVHKernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D PARAM_RAY_EPSILON_MIN=1e-05f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][385.062] [BVHKernel] Compiling kernels
[LuxRays][385.062] [BVHKernel] Program cached
[LuxRays][385.062] [Device Quadro M5000 CUDAIntersect] BVH mesh vertices buffer size: 576bytes
[LuxRays][385.062] [Device Quadro M5000 CUDAIntersect] BVH nodes buffer size: 1248bytes
[LuxRays][385.062] [BVHKernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D PARAM_RAY_EPSILON_MIN=1e-05f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][385.062] [BVHKernel] Compiling kernels
[LuxRays][385.078] [BVHKernel] Program cached
[LuxCore][385.109] Tiles initialization time: 0.03 secs
[LuxCore][385.109] [PathOCLBaseRenderEngine] OpenCL max. page memory size: 18014398509481983Kbytes
[LuxCore][385.109] Compile Geometry
[LuxCore][385.109] Scene geometry compilation time: 0ms
[LuxCore][385.109] Compile 2 Textures
[LuxCore][385.109] Texture evaluation ops count: 6
[LuxCore][385.109] Texture evaluation max. stack size: 3
[LuxCore][385.109] Textures compilation time: 0ms
[LuxCore][385.109] Compile 2 Materials
[LuxCore][385.109] Material evaluation ops count: 14
[LuxCore][385.109] Material evaluation max. stack size: 8
[LuxCore][385.109] Material compilation time: 0ms
[LuxCore][385.109] Compile Lights
[LuxCore][385.109] Lights compilation time: 0ms
[LuxCore][385.109] Compile ImageMaps
[LuxCore][385.125] Image maps page(s) count: 1
[LuxCore][385.125]  RGB channel page 0 size: 3264Kbytes
[LuxCore][385.125] Image maps compilation time: 15ms
[LuxCore][385.125] Always enabled OpenCL code:
[LuxCore][385.125] Compile Geometry
[LuxCore][385.125] Scene geometry compilation time: 0ms
[LuxCore][385.125] Compile 2 Textures
[LuxCore][385.125] Texture evaluation ops count: 6
[LuxCore][385.125] Texture evaluation max. stack size: 3
[LuxCore][385.125] Textures compilation time: 0ms
[LuxCore][385.125] Compile 2 Materials
[LuxCore][385.125] Material evaluation ops count: 14
[LuxCore][385.125] Material evaluation max. stack size: 8
[LuxCore][385.125] Material compilation time: 0ms
[LuxCore][385.125] Compile Lights
[LuxCore][385.125] Lights compilation time: 0ms
[LuxCore][385.125] Compile ImageMaps
[LuxCore][385.125] Image maps page(s) count: 1
[LuxCore][385.125]  RGB channel page 0 size: 3264Kbytes
[LuxCore][385.125] Image maps compilation time: 0ms
[LuxCore][385.125] Starting 2 OpenCL render threads
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Camera buffer size: 5492bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Triangle normals buffer size: 288bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Vertices buffer size: 576bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Triangles buffer size: 288bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Mesh description buffer size: 624bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] ImageMap descriptions buffer size: 80bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] ImageMaps buffer size: 3264Kbytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Textures buffer size: 656bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Texture evaluation ops buffer size: 48bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Texture evaluation stacks buffer size: 192Kbytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Materials buffer size: 456bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Material evaluation ops buffer size: 168bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Material evaluation stacks buffer size: 512Kbytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Scene objects buffer size: 48bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Lights buffer size: 344bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Env. light indices buffer size: 4bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Light offsets (Part I) buffer size: 8bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Env. light distributions buffer size: 516Kbytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] LightsDistribution buffer size: 16bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] InfiniteLightSourcesDistribution buffer size: 16bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] Ray buffer size: 768Kbytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] RayHit buffer size: 320Kbytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] GPUTaskConfiguration buffer size: 336bytes
[LuxRays][385.141] [Device GeForce GTX 1070 CUDAIntersect] GPUTask buffer size: 10624Kbytes
[LuxRays][385.156] [Device GeForce GTX 1070 CUDAIntersect] GPUTaskDirectLight buffer size: 960Kbytes
[LuxRays][385.156] [Device GeForce GTX 1070 CUDAIntersect] GPUTaskState buffer size: 6272Kbytes
[LuxRays][385.156] [Device GeForce GTX 1070 CUDAIntersect] GPUTask Stats buffer size: 64Kbytes
[LuxRays][385.156] [Device GeForce GTX 1070 CUDAIntersect] SamplerSharedData buffer size: 8740bytes
[LuxCore][385.156] [PathOCLBaseRenderThread::0] Size of a Sample: 16bytes
[LuxRays][385.156] [Device GeForce GTX 1070 CUDAIntersect] Sample buffer size: 256Kbytes
[LuxCore][385.156] [PathOCLBaseRenderThread::0] Size of a SampleData: 8bytes
[LuxRays][385.156] [Device GeForce GTX 1070 CUDAIntersect] SampleData buffer size: 128Kbytes
[LuxCore][385.156] [PathOCLBaseRenderThread::0] Size of a SampleResult: 428bytes
[LuxRays][385.156] [Device GeForce GTX 1070 CUDAIntersect] SampleResult buffer size: 6848Kbytes
[LuxRays][385.156] [Device GeForce GTX 1070 CUDAIntersect] PathInfo buffer size: 1728Kbytes
[LuxRays][385.156] [Device GeForce GTX 1070 CUDAIntersect] DirectLightVolumeInfo buffer size: 704Kbytes
[LuxRays][385.156] [Device GeForce GTX 1070 CUDAIntersect] Pixel Filter Distribution buffer size: 33Kbytes
[LuxCore][385.156] [PathOCLBaseRenderThread::0] Compiling kernels
[LuxRays][385.156] [PathOCL kernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_TILEPATHOCL -D PARAM_RAY_EPSILON_MIN=1e-05f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][385.156] [PathOCL kernel] Compiling kernels
[LuxRays][570.578] [PathOCL kernel] Program not cached
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling Film_Clear Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling InitSeed Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling Init Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_RT_NEXT_VERTEX Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_HIT_NOTHING Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_HIT_OBJECT Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_RT_DL Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_DL_ILLUMINATE Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_DL_SAMPLE_BSDF Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_GENERATE_NEXT_VERTEX_RAY Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_SPLAT_SAMPLE Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_NEXT_SAMPLE Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Compiling AdvancePaths_MK_GENERATE_CAMERA_RAY Kernel
[LuxCore][570.594] [PathOCLBaseRenderThread::0] AdvancePaths_MK_* workgroup size: 32
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Kernels compilation time: 185438ms
[LuxRays][570.594] [Device Quadro M5000 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxRays][570.594] [Device Quadro M5000 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxRays][570.594] [Device Quadro M5000 CUDAIntersect] Camera buffer size: 5492bytes
[LuxRays][570.594] [Device Quadro M5000 CUDAIntersect] Triangle normals buffer size: 288bytes
[LuxRays][570.594] [Device Quadro M5000 CUDAIntersect] Vertices buffer size: 576bytes
[LuxRays][570.594] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxRays][570.594] [Device Quadro M5000 CUDAIntersect] Triangles buffer size: 288bytes
[LuxCore][570.594] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 2
[LuxRays][570.594] [Device Quadro M5000 CUDAIntersect] Mesh description buffer size: 624bytes
[LuxRays][570.609] [Device Quadro M5000 CUDAIntersect] ImageMap descriptions buffer size: 80bytes
[LuxRays][570.609] [Device Quadro M5000 CUDAIntersect] ImageMaps buffer size: 3264Kbytes
[LuxRays][570.609] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.609] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 3
[LuxRays][570.609] [Device Quadro M5000 CUDAIntersect] Textures buffer size: 656bytes
[LuxRays][570.609] [Device Quadro M5000 CUDAIntersect] Texture evaluation ops buffer size: 48bytes
[LuxRays][570.609] [Device Quadro M5000 CUDAIntersect] Texture evaluation stacks buffer size: 192Kbytes
[LuxRays][570.609] [Device Quadro M5000 CUDAIntersect] Materials buffer size: 456bytes
[LuxRays][570.609] [Device Quadro M5000 CUDAIntersect] Material evaluation ops buffer size: 168bytes
[LuxRays][570.609] [Device Quadro M5000 CUDAIntersect] Material evaluation stacks buffer size: 512Kbytes
[LuxRays][570.609] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxRays][570.609] [Device Quadro M5000 CUDAIntersect] Scene objects buffer size: 48bytes
[LuxCore][570.609] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 4
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] Lights buffer size: 344bytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] Env. light indices buffer size: 4bytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] Light offsets (Part I) buffer size: 8bytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] Env. light distributions buffer size: 516Kbytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] LightsDistribution buffer size: 16bytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] InfiniteLightSourcesDistribution buffer size: 16bytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] Ray buffer size: 768Kbytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] RayHit buffer size: 320Kbytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] GPUTaskConfiguration buffer size: 336bytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] GPUTask buffer size: 10624Kbytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] GPUTaskDirectLight buffer size: 960Kbytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] GPUTaskState buffer size: 6272Kbytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] GPUTask Stats buffer size: 64Kbytes
[LuxRays][570.625] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] SamplerSharedData buffer size: 8740bytes
[LuxCore][570.625] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 5
[LuxCore][570.625] [PathOCLBaseRenderThread::1] Size of a Sample: 16bytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] Sample buffer size: 256Kbytes
[LuxCore][570.625] [PathOCLBaseRenderThread::1] Size of a SampleData: 8bytes
[LuxRays][570.625] [Device Quadro M5000 CUDAIntersect] SampleData buffer size: 128Kbytes
[LuxCore][570.625] [PathOCLBaseRenderThread::1] Size of a SampleResult: 428bytes
[LuxRays][570.641] [Device Quadro M5000 CUDAIntersect] SampleResult buffer size: 6848Kbytes
[LuxRays][570.641] [Device Quadro M5000 CUDAIntersect] PathInfo buffer size: 1728Kbytes
[LuxRays][570.641] [Device Quadro M5000 CUDAIntersect] DirectLightVolumeInfo buffer size: 704Kbytes
[LuxRays][570.641] [Device Quadro M5000 CUDAIntersect] Pixel Filter Distribution buffer size: 33Kbytes
[LuxRays][570.641] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.641] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 6
[LuxCore][570.641] [PathOCLBaseRenderThread::1] Compiling kernels
[LuxRays][570.641] [PathOCL kernel] Compiler options: -D LUXRAYS_OPENCL_KERNEL -D SLG_OPENCL_KERNEL -D RENDER_ENGINE_TILEPATHOCL -D PARAM_RAY_EPSILON_MIN=1e-05f -D PARAM_RAY_EPSILON_MAX=0.1f -D LUXRAYS_CUDA_DEVICE -D LUXRAYS_OS_WINDOWS --use_fast_math
[LuxRays][570.641] [PathOCL kernel] Compiling kernels
[LuxRays][570.656] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.656] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 7
[LuxRays][570.672] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.672] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 8
[LuxRays][570.687] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.687] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 9
[LuxRays][570.703] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.703] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 10
[LuxRays][570.734] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.734] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 11
[LuxRays][570.750] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.750] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 12
[LuxRays][570.781] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.781] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 13
[LuxRays][570.812] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.812] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 14
[LuxRays][570.828] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.828] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 15
[LuxRays][570.859] [Device GeForce GTX 1070 CUDAIntersect] RADIANCE_PER_PIXEL_NORMALIZEDs[0] buffer size: 16Kbytes
[LuxCore][570.859] [TilePathOCLRenderThread::0] Increased the number of rendered tiles to: 16
[LuxCore][575.812] Rendering time: 190.70 secs
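
For anyone who wants to pin down where the time goes in a log like the one above: a minimal Python sketch, assuming the `[Module][seconds] message` line format shown here. The helper name `longest_gaps` is made up for illustration and is not part of LuxCore. It lists the largest pauses between consecutive log lines, which in this log points at the PathOCL kernel compilation (the log itself reports "Kernels compilation time: 185438ms").

```python
import re

# Matches lines such as "[LuxRays][385.156] [PathOCL kernel] Compiling kernels".
LINE_RE = re.compile(r"^\[(\w+)\]\[(\d+\.\d+)\]\s*(.*)$")


def longest_gaps(log_text, top=3):
    """Return the `top` largest (gap_seconds, message_before_gap) pairs."""
    entries = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m:
            entries.append((float(m.group(2)), m.group(3)))
    # Pair every line with the next one and measure the pause between them.
    gaps = [
        (round(t1 - t0, 3), msg0)
        for (t0, msg0), (t1, _msg1) in zip(entries, entries[1:])
    ]
    return sorted(gaps, reverse=True)[:top]


# Four lines copied from the log above.
sample = """\
[LuxCore][385.156] [PathOCLBaseRenderThread::0] Compiling kernels
[LuxRays][385.156] [PathOCL kernel] Compiling kernels
[LuxRays][570.578] [PathOCL kernel] Program not cached
[LuxCore][570.594] [PathOCLBaseRenderThread::0] Kernels compilation time: 185438ms
"""

for seconds, message in longest_gaps(sample, top=1):
    print(f"{seconds:.3f} s pause after: {message}")
```

To use it on a full console dump, pass the contents of the saved log file to `longest_gaps` instead of `sample`.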

Dade
Developer
Posts: 5423
Joined: Mon Dec 04, 2017 8:36 pm
Location: Italy

Re: Loading kernels tooo slow

Post by Dade » Sat Jul 24, 2021 12:46 am

kintuX wrote:
Fri Jul 23, 2021 10:01 pm
Dade wrote:
Thu Jul 22, 2021 9:02 am
kintuX wrote:
Sun Jul 18, 2021 12:28 pm
Ran it & does Path OCL, so when I start with Blender it does it all over again for CUDA :?
Uh ? No, it uses the default devices: if you have CUDA, they are all CUDA devices (otherwise all OpenCL devices) :?:
It runs PathOCL anyways.

PATHOCL works with OpenCL and CUDA devices. Just check the log you posted:

>luxcoreui.exe
LuxCoreUI v2.6alpha0 (LuxCore demo: http://www.luxcorerender.org)
====================================================================
[...]
[LuxCore][14.344] CUDA devices used:
[LuxCore][14.344] [GeForce GTX 1070 CUDAIntersect (Optix enabled: 0)]
[LuxCore][14.344] [Quadro M5000 CUDAIntersect (Optix enabled: 0)]
[...]
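
The same check can be scripted. Below is a rough Python sketch, assuming the log layout shown in this thread, where a "... devices used:" header is followed by one bracketed line per device. The function name `devices_used` is hypothetical and not a LuxCore API. Note that the log still says "OpenCL render threads" even when the devices listed are CUDA ones.

```python
import re

# "[LuxCore][385.047] message" -> captures "message".
LINE_RE = re.compile(r"^\[\w+\]\[\d+\.\d+\]\s*(.*)$")
HEADER_RE = re.compile(r"^(\w+) devices used:\s*(.*)$")


def devices_used(log_text):
    """Map each 'XXX devices used:' header to the entries reported for it."""
    result = {}
    current = None
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            # Not a timestamped log line (e.g. an elided "[...]" marker).
            current = None
            continue
        line = m.group(1)
        header = HEADER_RE.match(line)
        if header:
            kind, rest = header.groups()
            if rest:
                # Inline value, e.g. "Native devices used: 32" (a count).
                result[kind] = [rest]
                current = None
            else:
                result[kind] = []
                current = kind
        elif current is not None and line.startswith("[") and line.endswith("]"):
            result[current].append(line[1:-1])
        else:
            current = None
    return result


# Lines copied from the log posted earlier in the thread.
sample = """\
[LuxCore][385.047] CUDA devices used:
[LuxCore][385.047] [GeForce GTX 1070 CUDAIntersect (Optix enabled: 0)]
[LuxCore][385.047] [Quadro M5000 CUDAIntersect (Optix enabled: 0)]
[LuxCore][385.047] OpenCL devices used:
[LuxCore][385.047] Native devices used: 32
[LuxCore][385.047] Configuring 2 OpenCL render threads
"""

for kind, names in devices_used(sample).items():
    print(kind, names)
```

An empty list under "OpenCL" together with two entries under "CUDA" is what the posted log shows: PATHOCL is running, but on the CUDA devices.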

B.Y.O.B.
Developer
Posts: 4114
Joined: Mon Dec 04, 2017 10:08 pm
Location: Germany

Re: Loading kernels tooo slow

Post by B.Y.O.B. » Sat Jul 24, 2021 10:11 am

kintuX, can you tell us the reason why you don't simply do what I suggested earlier?
B.Y.O.B. wrote:
Tue Jul 06, 2021 10:18 pm
kintuX wrote:
Tue Jul 06, 2021 7:21 am
Would it then be possible to just have a command to load/compile the shaders using a single GPU, without running Blender and rendering a scene?
I don't really see an advantage that would have over opening a simple cube scene and pressing render to trigger the kernel compilation.
