[[clang::optnone]] causes miscompiles #2156
Labels
question
Further progress depends on answer from issue creator.
This is mostly an Apple compiler bug, but we might want to consider avoiding features that aren't in the MSL spec and that also change lots of otherwise unused (and therefore probably completely untested) options in the compiler.
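For context, here is a minimal C++ sketch of the kind of helper the attribute ends up on. The name spvONFAdd is taken from the shader excerpt further down in this report; the body is an assumption for illustration and may not match what spirv-cross actually emits. Clang documents optnone as suppressing essentially all optimization of the annotated function, presumably so the arithmetic is not fused or reassociated with its neighbours.

```cpp
// Hypothetical helper shaped like the spvONFAdd call in the shader excerpt.
// The body is an assumption; only the attribute placement is the point here.
// On Clang, optnone keeps the optimizer away from this one function.
// Compilers that do not know the attribute are expected to ignore it,
// possibly with a warning.
template <typename T>
[[clang::optnone]] T spvONFAdd(T l, T r)
{
    return l + r;
}
```

A call such as spvONFAdd(1.5f, 2.25f) still returns 3.75f. The attribute is meant to change how the function is compiled, not what it computes, which is why a miscompile here is surprising.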
To reproduce:
make -j2 run
The test does a two-pass render: first a depth-only pass, then a color pass with a zequal depth compare that writes red to the output. It then compares the two images and counts the pixels that wrote depth but failed the zequal test. This should be zero, but on Apple GPUs, 2211 pixels fail. If you're curious, you can go into main.swift and uncomment the commented lines to take a capture (change the destination and add an output if you don't want to import this project into Xcode).

The shaders VS0 and VS1 are lightly modified from spirv-cross, with as much [[clang::optnone]] usage as possible removed while still preserving the bug.

I disassembled the compiled shaders to see why the bug happens, for anyone curious:
Shader disassembly and analysis
For anyone following along, apply this patch to the apple gpu disassembler and run
python3 compiler_explorer.py OptNoneFail/VS0.metal --no-fast-math
Full shader decompilation
GitHub seems to have a comment length limit of 65536 characters, so have a text file instead
The two outlined functions are as follows:
Looking at the area around the two function calls (since we know that's where things are breaking), you can see this:
The calling convention seems to put inputs and outputs starting at r11
Weirdly, offset 1740 seems to overwrite r11 (the first output of the first function call) without saving it first. Checking the shader, it is used:
r8.xy = spvONFAdd(r1.xy, -r8.xy);
// much later...
r0.z = dot(r8.xy, r8.xy);
r12 does get saved into r6, so if we search for the next use of that...
It seems the compiler thinks r8.x is in r5, which got zeroed at offset 174c.
I guess something about the next function's input being zero confused it?
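To make the suspected failure easier to follow, here is a toy C++ model of the sequence described above. It is only a sketch: the register file, the operand layout (r11 to r14), and the helper names are assumptions made for illustration, not the real Apple GPU ISA or the actual compiler output. It only mirrors the observations that the call results land in r11 and r12, that r12 is saved to r6, that r11 is overwritten before being saved, and that the later read of r8.x comes from a zeroed r5.

```cpp
#include <array>

// Toy register file; the indices only mirror the register names quoted above.
using Regs = std::array<float, 16>;

// Hypothetical outlined 2-component add. Assumed convention: operands in
// r11..r14, results returned in r11 (x) and r12 (y).
inline void outlinedAdd(Regs& r)
{
    r[11] = r[11] + r[13];
    r[12] = r[12] + r[14];
}

// What the shader source asks for: dot(v, v) with v = a + b.
inline float expectedDot(float ax, float ay, float bx, float by)
{
    const float x = ax + bx;
    const float y = ay + by;
    return x * x + y * y;
}

// The sequence as described in the analysis, under the assumptions above.
inline float miscompiledDot(float ax, float ay, float bx, float by)
{
    Regs r{};
    r[11] = ax; r[12] = ay; r[13] = bx; r[14] = by;
    outlinedAdd(r);   // x lands in r11, y lands in r12
    r[6] = r[12];     // y is saved to r6
    r[11] = 0.0f;     // next call's zero input overwrites r11; x was never saved
    r[5] = 0.0f;      // r5 is zeroed, yet later code reads it as x
    return r[5] * r[5] + r[6] * r[6];
}
```

With inputs (1, 2) and (3, 4), expectedDot gives 52 while miscompiledDot gives 36, because the x term is lost. That matches the shape of the bug described here, with one component of the spvONFAdd result silently replaced by zero.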