Looking for help identifying a regression when using a Clang-based OpenCL platform with LuxMark/LuxCore
Posted: Wed May 18, 2022 12:08 am
Hi, I found a bug in the `-ffast-math` option of the LLVM Clang compiler that reproduces when compiling LuxMark's OpenCL code.
I'm looking for help narrowing down the reproduction case, in the hope of getting this Clang bug fixed.
Some context: the Mesa Clover radeonsi OpenCL driver for AMD GCN and later devices relies on the LLVM Clang compiler. I've noticed a regression when running the LuxMark v3.1 LuxBall benchmark: it now renders garbage with the default OpenCL compiler options. After some investigation, it appears the bug is not in Mesa but in LLVM, and it reproduces once `-cl-fast-relaxed-math` is enabled. After more investigation, the culprit appears to be `-ffast-math`, which looks to be enabled automatically when `-cl-fast-relaxed-math` is enabled: enabling `-ffast-math` without `-cl-fast-relaxed-math` reproduces the bug, and removing the dependency of `-cl-fast-relaxed-math` on `-ffast-math` works around it.
One problem I face is that reproducing the bug requires running the complete LuxMark v3.1 benchmark, which does not make it easy to identify what's wrong.
Even LuxMark 3 is hard to build today (though I wrote a script to make it easy). This makes me think it would be good if newer LuxMark versions kept the LuxBall scene, even if not as the default one, just because it's a simple scene.
So I'm looking for help narrowing down the reproduction case to help identify the bug in Clang. Here is the issue on the Mesa side but, more importantly, here is the issue on the LLVM side; this comment may be relevant.
Maybe it would be useful to identify which kernel is suffering from the bug, to begin with? I feel like I've done as much as I can and have hit my limit, and I don't know what else I can do myself to help.
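To make the "which kernel?" idea concrete, here is a rough sketch of a bisection I have in mind. It rests on assumptions I have not verified: that exactly one kernel is at fault, and that kernels can be built individually with or without `-ffast-math` (I don't know whether LuxMark makes that easy). The names `find_culprit` and `is_broken`, and the kernel names `k0`..`k6`, are made up for illustration; `is_broken` stands in for "build only this subset with `-ffast-math`, run LuxBall, and check whether the image is garbage".

```python
from typing import Callable, List, Sequence


def find_culprit(items: Sequence[str],
                 is_broken: Callable[[List[str]], bool]) -> str:
    """Binary-search for the single item that triggers the bug.

    Assumes exactly one culprit exists and that is_broken(subset)
    returns True if and only if the subset contains that culprit.
    """
    candidates = list(items)
    if not candidates or not is_broken(candidates):
        raise ValueError("bug not reproduced with all items enabled")
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        # Keep whichever half still reproduces the bug.
        if is_broken(first_half):
            candidates = first_half
        else:
            candidates = candidates[middle:]
    return candidates[0]


if __name__ == "__main__":
    # Simulated predicate: pretend the hypothetical kernel "k4" is the bad one.
    kernels = [f"k{i}" for i in range(7)]
    print(find_culprit(kernels, lambda subset: "k4" in subset))
```

The same helper could be pointed at the list of individual floating-point flags that `-ffast-math` implies instead of at kernels, assuming the driver lets them be toggled one by one. Each step still needs a full benchmark run, but the number of runs grows with the logarithm of the number of kernels rather than with their count.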
For information, this is what the bug looks like:
without `-ffast-math` (without `-cl-fast-relaxed-math`):
with `-ffast-math` (implied by the default `-cl-fast-relaxed-math`):