Optimizing NVidia kernel compile (Linux + macOS builds in testing)
-
- Supporting Users
- Posts: 141
- Joined: Tue Jan 09, 2018 6:48 pm
Re: Optimizing NVidia kernel compile
I'm posting a test pyluxcore.so here (OS: GNU/Linux, architecture: x86_64, GLIBC: 2.19, +ssse3) in case someone would like to do a quick test.
www.jensverwiebe.de/LuxRender/Lux_dev_b ... .so.tar.xz
88235c3961d5a43d3767dd993fd315021a7ce41330fdb5a26a25afadc0bd51b0
As a test, I covered all procedurals with color/bump functions. The kernel typically compiles > 1000% faster, in some cases 10000%.
The main question is whether NVidia kernel compiles are now acceptable in most cases, but it would be interesting to see how this plays out on AMD too (to decide on the later implementation; explanation in the post before this one).
Edit: fixed a typo in a LuxCore-own texture (new hash), plus an imagetex fix.
Jens
Last edited by jensverwiebe on Mon Feb 12, 2018 4:31 pm, edited 1 time in total.
Re: Optimizing NVidia kernel compile
jensverwiebe wrote: ↑Sat Feb 10, 2018 8:02 pm
@Dade: I worked on the scene you used as a test case (proctexball-mix):

[LuxCore][4.977] [PathOCLBaseRenderThread::0] Kernels compilation time: 3681ms (was 43164 ms)

./bin/luxcoreui -D opencl.devices.select 100 -D renderengine.type PATHOCL -D sampler.type SOBOL scenes/luxball/proctexball-mix.cfg

This time I made sure some functions are not inlined.

I'm able to replicate this result. I have applied "__attribute__((noinline))" in 3 strategic places to affect all material/texture/bump mapping. It works well in all simple scenes I have tried. However, it throws a meaningless error when trying to render the LuxMark Mic and Hotel scenes (https://github.com/LuxCoreRender/LuxCor ... ter/scenes). This is clearly an NVIDIA bug, and it requires a workaround to a workaround.

The modified code is available on the "nvidia_opt_compile_time" branch; the patch is: https://github.com/LuxCoreRender/LuxCor ... 7c74fd4946
Re: Optimizing NVidia kernel compile
outdated
Last edited by jensverwiebe on Fri Mar 09, 2018 8:53 pm, edited 1 time in total.
Re: Optimizing NVidia kernel compile
jensverwiebe wrote: ↑Sun Feb 11, 2018 11:43 am
Would have to take a look at Mic and Hotel.
I only tested LuxCore repo scenes as well as a scene of my own with all Blender procedural textures applied to color and bump.
EDIT: no chance to test LuxCoreTestScenes; all .exr files but 00000.exr seem broken for me.
Does it only work from LuxMark? I dimly remember some hashing against fraud ...

You have to install Git LFS (https://git-lfs.github.com/) to be able to download large binary files from GitHub. If you open the .exr files, you will see they are text files containing the address of the file to download.

I have found a combination that still reduces the compile times but doesn't crash the NVIDIA drivers.
However, the LuxMark Mic scene is still pure kryptonite for the NVIDIA compiler; it still requires 90 seconds to compile (all other scenes I have tested are under 10 seconds).
I have merged the "nvidia_opt_compile_time" branch into the main branch.
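The "broken .exr" symptom described above is easy to check from the command line: a Git LFS pointer is plain ASCII text, while a real OpenEXR file starts with the magic bytes 0x76 0x2f 0x31 0x01. A quick sketch (the file name, oid, and size below are made up for illustration):

```shell
# Git LFS stores large binaries in the repo as small text "pointer files".
# Simulate what a not-yet-fetched .exr looks like after a plain clone:
cat > 00001.exr <<'EOF'
version https://git-lfs.github.com/spec/v1
oid sha256:0000000000000000000000000000000000000000000000000000000000000000
size 1048576
EOF

# An LFS pointer begins with the ASCII text "version"; a real EXR
# would begin with binary magic bytes instead.
if head -c 7 00001.exr | grep -q '^version'; then
    echo "LFS pointer, not image data: run 'git lfs install && git lfs pull'"
fi
```

With Git LFS installed, `git lfs pull` in the scene repository replaces such pointer files with the actual binary content.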
Re: Optimizing NVidia kernel compile
Testing this more next weekend.
You see, the digging was worth it. And thanks for the credits.
As for the Mic scene, we will see; I still have some ideas in the back of my mind.
Edit: just did a quick test and found my own implementation way faster. I did much more fine-tuning.
As is, there are still too many "hangups" in the repo implementation.
I just have to solve one cl_enqueue error in my code; it seems I overdid it for images. (solved)
Jens
Last edited by jensverwiebe on Sun Feb 11, 2018 5:57 pm, edited 9 times in total.
Re: Optimizing NVidia kernel compile
However, the LuxMark Mic scene is still pure kryptonite for the NVIDIA compiler; it still requires 90 seconds to compile (all other scenes I have tested are under 10 seconds).

I like this part: a green company versus a green toxic substance.
Re: Optimizing NVidia kernel compile
outdated
Last edited by jensverwiebe on Fri Mar 09, 2018 8:52 pm, edited 1 time in total.
Re: Optimizing NVidia kernel compile
outdated
Last edited by jensverwiebe on Fri Mar 09, 2018 8:52 pm, edited 2 times in total.
Re: Optimizing NVidia kernel compile
outdated
Last edited by jensverwiebe on Fri Mar 09, 2018 8:51 pm, edited 1 time in total.