Openmpi bugs #31
This reverts commit 69bb270.
A crash appears to occur in CI during garbage collection:

julia:3364 terminated with signal 11 at PC=7f997aaec971 SP=7f9936ccb970.
Backtrace:
/opt/hostedtoolcache/julia/1.8.5/x64/bin/../lib/julia/libjulia-internal.so.1(ijl_gc_safepoint+0x11)[0x7f997aaec971]

The faulting frame is ijl_gc_safepoint, so this looks like a crash in garbage collection code. That would be consistent with the fact that it only shows up with allocation-heavy tests, i.e. Turing. So even though the finalizer checks first before calling free, there may be something flaky there. See #30 (comment).
This might be related: JuliaPlots/Plots.jl#3583. Possibly the problem arises when JLLWrappers prepares the mpiexec environment.
Close inspection of a failing runtest.jl based on a system MPI revealed the following:
As soon as these additional environment variables are set, calling mpiexec no longer seems to work (it returns the uninformative pipeline_error). Frustratingly, it is not clear at all what performs that modification. Probably some hack in MPIPreferences, JLLWrappers, or something else?
That is very interesting! How did you figure this out? Can you highlight the extra ENV variables that appear?
The ones included are just the additional ones.
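For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of how two environment snapshots could be diffed. It assumes each snapshot was saved as plain `KEY=VALUE` lines (for example the output of `env` from a working shell and from the failing test process). The function names `parse_env_dump` and `diff_env` and the sample variables are hypothetical, chosen only for illustration; they are not taken from this thread or from any of the packages mentioned.

```python
def parse_env_dump(text):
    """Parse KEY=VALUE lines (as printed by `env`) into a dict.

    Lines without '=' are skipped. Multi-line values are not handled.
    """
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:
            env[key] = value
    return env


def diff_env(before, after):
    """Return (added, changed, removed) between two environment dicts."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return added, changed, removed


if __name__ == "__main__":
    # Hypothetical snapshots; the variable names are illustrative only.
    working = parse_env_dump("PATH=/usr/bin\nHOME=/home/ci\n")
    failing = parse_env_dump(
        "PATH=/opt/artifacts/bin:/usr/bin\n"
        "HOME=/home/ci\n"
        "EXAMPLE_LIBRARY_PATH=/opt/artifacts/lib\n"
    )
    added, changed, removed = diff_env(working, failing)
    for name in sorted(added):
        print(f"added   {name}={added[name]}")
    for name in sorted(changed):
        old, new = changed[name]
        print(f"changed {name}: {old} -> {new}")
    for name in sorted(removed):
        print(f"removed {name}")
```

Reporting added and changed variables separately makes it easier to spot a prepended search-path entry as well as entirely new variables.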
Yikes...
Those issues were later fixed in #30.
Potentially a new bug that showed up on Saturday after all tests were reintroduced. It may be the same as JuliaParallel/MPI.jl#725, but that is not confirmed yet.