Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Identical LLVM IR file with different basic block ordering cause miscompilation #109391

Open
vchuravy opened this issue Sep 20, 2024 · 7 comments

Comments

@vchuravy
Copy link
Contributor

Reduced from JuliaGPU/AMDGPU.jl#672 (comment)

The code is a double-nested loops and the bug manifests as if we were skipping a loop.
In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order.

I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.

https://godbolt.org/z/sscdTK7d7

https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399

llc -filetype=asm broken.reorder.ll -o broken.reorder.S
llc -filetype=asm broken.ll -o broken.S

broken.ll is the original file that is exhibiting the miscompiation and broken.reorder.ll is the file that emits the same code as the working MWE.

Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.

@llvmbot
Copy link
Member

llvmbot commented Sep 20, 2024

@llvm/issue-subscribers-backend-amdgpu

Author: Valentin Churavy (vchuravy)

Reduced from https://github.com/JuliaGPU/AMDGPU.jl/issues/672#issuecomment-2347151487

The code is a double-nested loops and the bug manifests as if we were skipping a loop.
In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order.

I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.

https://godbolt.org/z/sscdTK7d7

https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399

llc -filetype=asm broken.reorder.ll -o broken.reorder.S
llc -filetype=asm broken.ll -o broken.S

broken.ll is the original file that is exhibiting the miscompiation and broken.reorder.ll is the file that emits the same code as the working MWE.

Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.

@llvmbot
Copy link
Member

llvmbot commented Sep 20, 2024

@llvm/issue-subscribers-julialang

Author: Valentin Churavy (vchuravy)

Reduced from https://github.com/JuliaGPU/AMDGPU.jl/issues/672#issuecomment-2347151487

The code is a double-nested loops and the bug manifests as if we were skipping a loop.
In one version of the code after optimization (specifically DCE) the basic blocks end up in a different order.

I have two small LLVM modules that are identical, except that in one I manually reorder the BBs to follow the order of the "working" version.

https://godbolt.org/z/sscdTK7d7

https://gist.github.com/vchuravy/0c60cf4b9c497f6c8050f2a1137cd399

llc -filetype=asm broken.reorder.ll -o broken.reorder.S
llc -filetype=asm broken.ll -o broken.S

broken.ll is the original file that is exhibiting the miscompiation and broken.reorder.ll is the file that emits the same code as the working MWE.

Lastly, I encountered this on LLVM 15, and it also reproduces on LLVM 16. LLVM 17 either hides or has this bug fixed.

@arsenm
Copy link
Contributor

arsenm commented Sep 20, 2024

Does this reproduce with main? I see no difference in 17.0 or later: https://godbolt.org/z/Es9jdoYsd

I'm guessing this is a gfx11 specific issue. Newer versions don't have the s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) at the end, so this might just be one of the early deallocation bugs that was fixed.

cc @jayfoad

@vchuravy
Copy link
Contributor Author

The original reporter had a gfx1102 and I reproduced it on a gfx1103.

As far as I can tell this doesn't reproduce on main. I hadn't had the time to bisect. This is on an LTS release for us, and I would like to Backports the fix if possible.
So if you have a hint which commit might have fixed this that would be great!

@arsenm
Copy link
Contributor

arsenm commented Sep 24, 2024

eb74917 reimplemented the whole thing, but I don't have all the detailed context (@jayfoad ?)

@jayfoad
Copy link
Contributor

jayfoad commented Sep 24, 2024

eb74917 reimplemented the whole thing, but I don't have all the detailed context (@jayfoad ?)

The reason for reimplementing it was as a basis for implementing this fairly important bug fix: 4b6d41c

But from reading the description above ("the bug manifests as if we were skipping a loop") it's not clear to me how this is related to dealloc vgprs.

@arsenm
Copy link
Contributor

arsenm commented Sep 24, 2024

it's not clear to me how this is related to dealloc vgprs.

I just noticed in the newer versions, there is no more dealloc. The only difference between the pass/fail in the 16 diff was a few register assignments and one s_delay_alu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants