Issues Building on MacOS Big Sur (11.4)

Post by **danielbui78** » Wed Nov 17, 2021 12:29 pm

u3dreal wrote: ↑Mon Nov 15, 2021 2:29 pm
danielbui78 wrote: ↑Mon Nov 15, 2021 12:26 pm FYI, I think Big Sur 11.6 did break some OpenCL support (at least for HD4000) with LuxCore 2.5 officially distributed binaries. LuxCore 2.6 still works.
I'm still baffled by how randomly it works or doesn't. Every update is Russian roulette.

Update: I figured out my OpenCL issues with 11.6: If something triggers a kernel recompilation, LuxCore will fail unless the kernel cache is manually deleted. Maybe this problem was present for me in 11.4 as well but I never noticed it because I never made configuration changes requiring kernel recompilation.

In other words: in order for me to get kernel recompilation to work in 11.6, I must go into the kernel cache folder for the appropriate device (~/luxcorerender.org/ocl_kernel_cache/....) and do `rm *.ocl`. NOTE: this only relates to runtime OpenCL error messages during kernel compilation. I still have device-specific system crashes due to a runaway kernel compilation process eating up 80+ GB of RAM.

Just to clarify, my previous statement was wrong: LuxCore 2.5 + Big Sur 11.6 + HD4000 is working, at least for imagepipeline OCL -- as long as you manually delete *.ocl files.
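For anyone who wants to script the workaround instead of running `rm *.ocl` by hand, here is a minimal sketch. The function name `clear_kernel_cache` and the `dry_run` flag are made up for illustration, and the cache folder has to be passed in by the caller because the exact location may differ between systems. It only removes files ending in `.ocl` and leaves everything else alone.

```python
from pathlib import Path


def clear_kernel_cache(cache_dir, dry_run=True):
    """Delete every *.ocl file under cache_dir and return the paths found.

    cache_dir is supplied by the caller; no default location is assumed.
    With dry_run=True nothing is deleted, so the list can be reviewed first.
    """
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return []
    # rglob also descends into the per-device subfolders; sorting keeps
    # the returned list deterministic.
    cached = sorted(p for p in root.rglob("*.ocl") if p.is_file())
    if not dry_run:
        for path in cached:
            path.unlink()
    return cached
```

Calling it once with the default `dry_run=True` shows which cached kernels would be removed; calling it again with `dry_run=False` actually deletes them, after which the kernels should be recompiled on the next run.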

Post by **Dade** » Wed Nov 17, 2021 12:38 pm

danielbui78 wrote: ↑Wed Nov 17, 2021 12:29 pm In other words: in order for me to get kernel recompilation to work in 11.6, I must go into the kernel cache folder for the appropriate device (~/luxcorerender.org/ocl_kernel_cache/....) and do `rm *.ocl`

It sounds like the (driver) bug is related to returning (or reading back) the binaries of compiled kernels.

Post by **danielbui78** » Wed Nov 17, 2021 1:29 pm

Dade wrote: ↑Wed Nov 17, 2021 12:38 pm
danielbui78 wrote: ↑Wed Nov 17, 2021 12:29 pm In other words: in order for me to get kernel recompilation to work in 11.6, I must go into the kernel cache folder for the appropriate device (~/luxcorerender.org/ocl_kernel_cache/....) and do `rm *.ocl`
It sounds like the (driver) bug is related to returning (or reading back) the binaries of compiled kernels.

Thanks for the tip. I'm looking at the kernel cache's filepath/hash algorithm: cached kernels are segregated into folders by vendor and device name, then a filename hash is created from the compiler options + kernel source code. However, I do not see the work-group size being included in the list of compiler options that is hashed. This is the parameter that I am changing in the cfg file: opencl.gpu.workgroup.size, and changing it is what causes the OpenCL error that requires me to delete the *.ocl files.

Does workgroup size need to be added to the compiler options hash? Also, if the device vendor changes the default workgroup size in a driver update, do we need to catch this and force kernel recompilation?
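To make the question concrete, here is a rough sketch of the cache layout as described above: one folder per vendor and device, and a filename derived from the compiler options plus the kernel source. This is not LuxCore's actual code. The function name `cache_file_for`, the use of SHA-256, and the `<hash>.ocl` naming are assumptions for illustration only.

```python
import hashlib
from pathlib import Path


def cache_file_for(cache_root, vendor, device, compiler_options, kernel_source):
    """Return a hypothetical cache path: <root>/<vendor>/<device>/<hash>.ocl."""
    digest = hashlib.sha256()
    # Only the compiler options and the kernel source feed the hash, as in
    # the description above. The real hash function may well differ.
    digest.update(compiler_options.encode("utf-8"))
    digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    digest.update(kernel_source.encode("utf-8"))
    return Path(cache_root) / vendor / device / (digest.hexdigest() + ".ocl")
```

With a scheme like this, any setting that is not part of the hashed inputs (such as a work-group size from the cfg file, or a driver update) maps to the same file name, so a previously cached binary would be picked up again unless it is deleted by hand.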

Post by **Dade** » Wed Nov 17, 2021 2:29 pm

danielbui78 wrote: ↑Wed Nov 17, 2021 1:29 pm Does workgroup size need to be added to the compiler options hash? Also, if the device vendor changes the default workgroup size in a driver update, do we need to catch this and force kernel recompilation?

As far as I know it is not a kernel compiler option so it shouldn't be included in the hash.

Group size is pretty much a hardware characteristic: it has always been 32 for NVIDIA GPUs and was 64 for old AMD GPUs, but it is now 32 for AMD too.

What value are you using? (And why are you changing it? On what GPU?)

Post by **danielbui78** » Wed Nov 17, 2021 3:05 pm

Dade wrote: ↑Wed Nov 17, 2021 2:29 pm
danielbui78 wrote: ↑Wed Nov 17, 2021 1:29 pm Does workgroup size need to be added to the compiler options hash? Also, if the device vendor changes the default workgroup size in a driver update, do we need to catch this and force kernel recompilation?
As far as I know it is not a kernel compiler option so it shouldn't be included in the hash.

Group size is pretty much a hardware characteristic: it has always been 32 for NVIDIA GPUs and was 64 for old AMD GPUs, but it is now 32 for AMD too.

What value are you using? (And why are you changing it? On what GPU?)

I would agree that MAXIMUM work-group size is a hardware characteristic, but the actual work-group size can be variable. I am using the Intel OpenCL CPU driver as well as the Intel HD4000 OpenCL driver, which is used by default on my MacBook Air for imagepipeline OpenCL acceleration. Previously, I had to manually set opencl.cpu.workgroup = 32 for the Intel OpenCL CPU driver to work. This now appears to cause an OpenCL error with Big Sur (at least for 11.6): CL_INVALID_WORK_GROUP_SIZE. After trial and error, it appears the only value that now works is opencl.cpu.workgroup = 1 (at least for 11.6).
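As background on that error: in OpenCL 1.x, clEnqueueNDRangeKernel reports CL_INVALID_WORK_GROUP_SIZE when an explicit local size is larger than the allowed maximum, or when the global size is not evenly divisible by the local size. The sketch below mirrors those two checks for a one-dimensional launch in plain Python, without calling any OpenCL API. The function name and its return strings are made up for illustration; in a real program the maximum would come from the driver (for example the per-kernel CL_KERNEL_WORK_GROUP_SIZE query), which can be lower than the device-wide limit.

```python
def check_work_group_size(global_size, local_size, max_work_group_size):
    """Return None if a 1-D launch looks valid, else a short reason string.

    max_work_group_size stands in for the limit reported by the driver;
    it is a plain parameter here so the sketch needs no OpenCL runtime.
    """
    if local_size < 1:
        return "local size must be at least 1"
    if local_size > max_work_group_size:
        return "local size exceeds the maximum work-group size"
    if global_size % local_size != 0:
        return "global size is not a multiple of the local size"
    return None
```

If the driver's limit for a kernel were 1, then 32 would be rejected and only 1 would pass, which would be consistent with the behaviour described above. Whether that is what actually happens on 11.6 is an assumption, not something confirmed here.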

Both while working out this value and when simply upgrading from Big Sur 11.4 to 11.6, I had to remove the previous *.ocl files to avoid CL_BUILD_PROGRAM_FAILURE or CL_INVALID_VALUE errors after the call to clBuildProgram() on line 373 of ocl.cpp, and its corresponding CHECK_OCL_ERROR() call on line 374.

Post by **u3dreal** » Wed Nov 17, 2021 6:27 pm

MCurto just pointed out that the CL_INVALID_VALUE comes from an old kernel cache ... after deleting it, things work now on 11.6.1.
At least for the image pipeline.

I also get the same error here:

CL_BUILD_PROGRAM_FAILURE

but deleting the cache does not help, and the same error comes back after 20 minutes.
Could setting

opencl.cpu.workgroup = 1

help here too? Or did you mean

opencl.gpu.workgroup = 1

CPU or GPU?
I have a GT750m and an Iris Pro. The Iris has not worked for a year, and the 750m stopped working some versions ago.

Interesting that you got the Intel driver to work.
Quoting Dade:

We don't touch it ... not even with a 2 meter long stick

Post by **danielbui78** » Wed Nov 17, 2021 8:28 pm

u3dreal wrote: ↑Wed Nov 17, 2021 6:27 pm Interesting you got the intel driver to work..

Sorry for the confusion: I only got the Intel OpenCL CPU driver fully working for general rendering -- not the OpenCL GPU (HD4000) driver. After I apply the `rm *.ocl` workaround, the Intel OpenCL GPU driver starts compiling the kernel for general rendering, but it then uses 40+ GB of RAM (my MacBook Air only has 8 GB of physical RAM), runs out of swap space, and effectively crashes macOS. This also happens for me with AMD RX 580 + LuxCore 2.6 (the system crashes after 80+ GB of RAM usage on my desktop with 32 GB of RAM), but AMD RX 580 + LuxCore 2.5 is able to finish compiling the kernel and works, although with artifacts.

However, HD4000 (OpenCL GPU) is working for the imagepipeline operations.

Post by **u3dreal** » Wed Nov 17, 2021 10:31 pm

Ah OK, same here: imagepipeline works for the Iris Pro and GT750m.
Strange that you have problems with the 580.
I run an RX 5700 and it works fine for now.

Thanks for clearing things up.

LuxCoreRender Forums
