[AMDGPU] Identical LLVM IR file with different basic block ordering cause miscompilation #109391

vchuravy · 2024-09-20T08:51:48Z

Reduced from https://github.com/JuliaGPU/AMDGPU.jl/issues/672#issuecomment-2347151487

The code is a doubly nested loop, and the bug manifests as if we were skipping one of the loops.
In one version of the code, after optimization (specifically DCE), the basic blocks end up in a different order.

I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.

https://godbolt.org/z/sscdTK7d7

https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399

llc -filetype=asm broken.reorder.ll -o broken.reorder.S
llc -filetype=asm broken.ll -o broken.S

broken.ll is the original file that exhibits the miscompilation, and broken.reorder.ll is the file that emits the same code as the working MWE.
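To compare the two llc outputs without comment noise, something like the following could help. This is only a sketch, not part of the original report: the helper names `normalize_asm`, `asm_diff`, and `diff_files` are made up for illustration, and it assumes the assembly uses `;` as the comment character (which is what llc prints for AMDGPU). It will mis-handle a `;` that appears inside a string directive.

```python
import difflib
from pathlib import Path


def normalize_asm(text):
    """Drop ';' comments and blank lines, keeping only instructions, labels, and directives."""
    lines = []
    for raw in text.splitlines():
        # Assumption: ';' starts a comment, as in llc's AMDGPU output.
        code = raw.split(";", 1)[0].strip()
        if code:
            lines.append(code)
    return lines


def asm_diff(a_text, b_text, a_name="broken.S", b_name="broken.reorder.S"):
    """Return a unified diff (list of lines) of the two normalized assembly texts."""
    return list(
        difflib.unified_diff(
            normalize_asm(a_text),
            normalize_asm(b_text),
            fromfile=a_name,
            tofile=b_name,
            lineterm="",
        )
    )


def diff_files(path_a, path_b):
    """Read two assembly files from disk and return their normalized diff."""
    return asm_diff(
        Path(path_a).read_text(),
        Path(path_b).read_text(),
        a_name=str(path_a),
        b_name=str(path_b),
    )
```

After running the two llc commands above, `print("\n".join(diff_files("broken.S", "broken.reorder.S")))` would show only the instruction-level differences; an empty result means the two files match once comments are stripped.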

Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides the bug or has fixed it.


arsenm · 2024-09-20T20:00:55Z

Does this reproduce with main? I see no difference in 17.0 or later: https://godbolt.org/z/Es9jdoYsd

I'm guessing this is a gfx11-specific issue. Newer versions don't have the s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) at the end, so this might just be one of the early deallocation bugs that was fixed.

cc @jayfoad

vchuravy · 2024-09-22T06:40:32Z

The original reporter had a gfx1102 and I reproduced it on a gfx1103.

As far as I can tell, this doesn't reproduce on main. I haven't had time to bisect. This is on an LTS release for us, and I would like to backport the fix if possible.
So if you have a hint about which commit might have fixed this, that would be great!

arsenm · 2024-09-24T12:25:59Z

eb74917 reimplemented the whole thing, but I don't have all the detailed context (@jayfoad ?)

jayfoad · 2024-09-24T12:47:14Z

> eb74917 reimplemented the whole thing, but I don't have all the detailed context (@jayfoad ?)

The reason for reimplementing it was as a basis for implementing this fairly important bug fix: 4b6d41c

But from reading the description above ("the bug manifests as if we were skipping a loop") it's not clear to me how this is related to dealloc vgprs.

arsenm · 2024-09-24T12:51:33Z

> it's not clear to me how this is related to dealloc vgprs.

I just noticed that in the newer versions there is no more dealloc. The only difference between pass and fail in the LLVM 16 diff was a few register assignments and one s_delay_alu.

vchuravy added backend:AMDGPU miscompilation julialang labels Sep 20, 2024
