Segfault fix #30
Conversation
But one of the crashes "reproduces" the segfault. So there seem to be at least two issues (not suggesting they are related).
One hypothesis: pmodels/mpich#6432 (comment)
src/mpi_utils/Entangler.jl (Outdated)
@@ -105,7 +105,7 @@ mpi_active() =
     Comm_size(COMM_WORLD) > 1
 end

-init_mpi() = Init() #threadlevel = :funneled)
+init_mpi() = Init(threadlevel = :funneled)
As a troubleshooting step, you should try MPI_THREAD_MULTIPLE. The default is thread-single, which is nearly the same as thread-funneled. But if you have a race condition in calls into MPI, then you need thread-multiple.
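For reference, a minimal sketch of requesting the thread-multiple level through MPI.jl (illustrative only, not code from this PR; the warning check is an added suggestion):

using MPI

# Request MPI_THREAD_MULTIPLE so that any Julia thread may call into MPI concurrently.
MPI.Init(threadlevel = :multiple)

# The implementation may grant a lower level; check what was actually provided.
provided = MPI.Query_thread()
provided == MPI.THREAD_MULTIPLE || @warn "thread-multiple not provided" provided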
Thank you for the suggestion! Unfortunately, the segfault still arises with threadlevel = :multiple. On the positive side, I am now able to reproduce the problem locally.
Can reproduce the crash locally now with e.g.:
I also get a segfault locally with the example above.

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
Threads: 1 on 8 virtual cores
Created branch https://github.com/Julia-Tempering/Pigeons.jl/tree/investigate-segfault-openmpi-hack to run local tests on OpenMPI. Instructions: check out that branch, then:
The problem does not arise with Open MPI, confirming this is likely MPICH-related. (With Open MPI an unrelated warning message pops up but does not cause a crash; it is documented in open-mpi/ompi#7393 and the fix discussed there removes the warning message.)
The next step on this will be to build a local MPICH with debug symbols enabled (see JuliaParallel/MPI.jl#720, which also provides a nice template for testing several MPI implementations in CI). We can follow the Yggdrasil script to help with that and also to check whether the problem also arises with ch4: https://github.com/JuliaPackaging/Yggdrasil/blob/master/M/MPICH/build_tarballs.jl#L74
Before this, we had crashes such as in 18ef655. Here is what was happening in these crashes:
1. The parent process is using a system library.
2. The parent process' tests first call local Pigeons.
3. That in turn called mpi_active(), which internally used MPI.Init() to see whether Comm_size > 1.
4. That had the side effect of changing ENV.
5. Then, when calling the Pigeons MPI tests, these ENV variables were passed to ChildProcess, causing problems.
The new approach avoids calling MPI.Init() frivolously.
Post-resolution analysis/summary.
The bugs encountered and addressed here fall into two categories:
B1. A newly released version of Turing was breaking tests (potentially just for Julia nightly?).
B2. Various subtle and non-deterministic MPI bugs, e.g. related to the interaction of the GC and MPI.
By coincidence, CI started failing with both B1 and B2 around the same time: in the first case because of a new Turing release, in the second by chance, since it occurs non-deterministically. For B1, we simply removed Turing from the test dependencies; it turns out that, after some changes in the Turing test code, all Turing integration correctness checks can be done with DynamicPPL alone.
After solving a first category-B2 bug, we added more MPI implementations to our set of tests, revealing more B2-type bugs. The full set discovered here is:
B2a. All request objects returned by MPI.isend, MPI.Isend, etc. should be captured and either MPI.free()'ed or Wait()'ed. Otherwise, we rely on Julia's GC to tell the MPI C code to free the request objects, of which there are only 2^16 in MPICH. Eventually, we hit situations where GC does not occur early enough, especially in non-allocating or toy examples. Since MPICH does not perform bounds checks, it does not provide a descriptive error and instead segfaults. Moreover, since MPI.jl's built-in test mpiexecs are compiled without debug info, it is difficult to diagnose such issues. See pmodels/mpich#6432 (comment) for more information.
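To illustrate the correct pattern (a minimal sketch using MPI.jl's standard nonblocking API, not code from this PR): capture the Request returned by the nonblocking call and complete it explicitly instead of leaving it to the GC.

using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

if rank == 0
    buf = fill(1.0, 1024)
    # Capture the request returned by the nonblocking send...
    req = MPI.Isend(buf, comm; dest = 1, tag = 0)
    # ...and complete it explicitly; dropping it and relying on the GC can
    # exhaust MPICH's fixed pool of request handles.
    MPI.Wait(req)
elseif rank == 1
    buf = Array{Float64}(undef, 1024)
    MPI.Recv!(buf, comm; source = 0, tag = 0)
end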
B2b. A distinct MPI + Julia GC-related issue popped up: according to JuliaParallel/MPI.jl#337, Julia "hijacks" the SIGSEGV signal (segmentation fault) to synchronize threads pausing for Julia's stop-the-world GC. The problem is that some MPI implementations' mpiexec parent process intercepts that SIGSEGV signal and triggers a global crash whenever Julia GC occurs in a multithreaded context. Moreover, the reported error, "segmentation fault", has nothing to do with the underlying problem (for us, the fact that B2b popped up just after solving a genuine segfault, B2a, literally minutes after, made this especially confusing!). For some MPI implementations, this undesirable interference between Julia GC and mpiexec can be worked around (JuliaParallel/MPI.jl#337), but for others no workaround is currently known (e.g. at least some Intel MPI versions, which we reported to MPI.jl in JuliaParallel/MPI.jl#725, but also at least one version of Open MPI, namely Open MPI 4.0, while Open MPI 4.1 is OK; such unresolved issues have been reported before by others, e.g. https://discourse.julialang.org/t/julia-crashes-inside-threads-with-mpi/52400/5). In summary, B2b is due to a Julia GC design decision, so the best we can do is print a nicer error message in the future; the current one is very cryptic. The good news is that for the vast majority of mainstream MPI implementations, this can be worked around by telling MPI to ignore SIGSEGV. If one is stuck with a closed-source MPI without a workaround, one can always either use a single thread or stop the GC. See #32 for a more detailed proposed enhancement regarding the error messages.
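As a sketch of the "stop GC" fallback mentioned above (illustrative only; disabling the GC trades temporary memory growth for avoiding GC-triggered SIGSEGV while threads are communicating via MPI):

# Disable Julia's GC around a multithreaded region that calls into MPI,
# then re-enable it and collect once outside the critical region.
GC.enable(false)
try
    Threads.@threads for i in 1:Threads.nthreads()
        # ... multithreaded work that calls into MPI ...
    end
finally
    GC.enable(true)
    GC.gc()
end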
B2c. To test some OpenMPI implementations, the "--oversubscribe" flag should be added to mpiexec. Moreover, the error message explaining this is in some circumstances "eaten up", leaving only a nondescript Julia pipeline_error stack trace. We now automatically add that flag in our test cases.
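For example, a test harness might pass the flag through MPI.jl's mpiexec wrapper (sketch only; the script name mpi_test.jl is hypothetical, and --oversubscribe applies to Open MPI's mpiexec):

using MPI

# Launch more ranks than physical cores; Open MPI refuses to do this
# unless --oversubscribe is passed to its mpiexec.
MPI.mpiexec() do exe
    run(`$exe --oversubscribe -n 8 $(Base.julia_cmd()) mpi_test.jl`)
end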
B2d. A subtle problem due to MPI.Init() silently changing ENV variables. In specific circumstances this causes a crash:
1. The parent process is using a system library.
2. The parent process' tests first call local Pigeons.
3. That in turn called mpi_active(), which internally used MPI.Init() to see whether Comm_size > 1.
4. That had the side effect of changing ENV.
5. Then, when calling the Pigeons MPI tests, these ENV variables were passed to ChildProcess, preventing mpiexec from starting and resulting in a cryptic pipeline_error message with no details about the chain of events 1-5.
The new approach calls MPI.Init() only when running under MPI.
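A minimal sketch of the idea (the environment-variable list and helper names are illustrative assumptions, not the actual Pigeons implementation): detect whether the process was launched by an MPI launcher without touching MPI.Init(), and only initialize MPI in that case.

using MPI

# Common MPI launchers export rank/size variables into the environment of the
# processes they spawn; checking for them does not mutate ENV, unlike calling
# MPI.Init() just to query Comm_size.
running_under_mpi() =
    any(haskey(ENV, k) for k in ("OMPI_COMM_WORLD_SIZE", "PMI_SIZE", "PMI_RANK"))

function init_mpi_if_needed()
    if running_under_mpi() && !MPI.Initialized()
        MPI.Init(threadlevel = :funneled)
    end
end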