
Segfault fix #30

Merged · 65 commits · Mar 13, 2023

Conversation

alexandrebouchard (Member) commented Mar 6, 2023

Post-resolution analysis/summary.

The bugs encountered and addressed here fall into two categories:

B1. A newly released version of Turing broke tests (potentially only on Julia nightly?).
B2. Various subtle and non-deterministic MPI bugs, e.g. related to the interaction of Julia's GC and MPI.

By coincidence, CI started failing with both B1 and B2 around the same time: in the first case because of the new Turing release, in the second by chance, since B2 occurs non-deterministically. For B1, we simply removed Turing from the test dependencies; it turns out that, after some changes to the Turing test code, all Turing integration correctness checks can be done with DynamicPPL alone.

After solving a first category-B2 bug, we added more MPI implementations to our test suite, revealing more B2-type bugs. The full set discovered here is:

B2a. All request objects returned by MPI.Isend, MPI.isend, etc. should be captured and either MPI.free()'ed or Wait'ed on. Otherwise, we rely on Julia's GC to tell the MPI C code to free the request objects, of which MPICH only has 2^16. Eventually, we hit situations where GC does not occur early enough, especially in non-allocating or toy examples. Since MPICH does not perform bounds checks, it does not provide a descriptive error and instead segfaults. Moreover, since MPI.jl's built-in test mpiexecs are compiled without debug info, it is difficult to diagnose such issues. See pmodels/mpich#6432 (comment) for more information.
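
For illustration, here is a minimal sketch of the pattern just described, assuming the keyword-argument API of a recent MPI.jl (v0.20-style); the buffer size, tag, and ranks are made up for the example:

using MPI

# run under e.g.: mpiexec -n 2 julia this_script.jl
MPI.Init()
comm = MPI.COMM_WORLD

if MPI.Comm_rank(comm) == 0
    buf = fill(1.0, 8)
    # capture the request returned by the non-blocking send ...
    req = MPI.Isend(buf, comm; dest = 1, tag = 0)
    # ... and explicitly complete it, instead of relying on Julia's GC to
    # eventually release the underlying MPI request object
    MPI.Wait(req)
elseif MPI.Comm_rank(comm) == 1
    recv = zeros(8)
    req = MPI.Irecv!(recv, comm; source = 0, tag = 0)
    MPI.Wait(req)
end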

B2b. A distinct MPI + Julia GC-related issue popped up: according to JuliaParallel/MPI.jl#337, Julia "hijacks" the SIGSEGV signal (segmentation fault) to synchronize threads pausing for Julia's stop-the-world GC. The problem is that some MPI implementations' mpiexec parent process intercepts that SIGSEGV signal and triggers a global crash whenever Julia GC occurs in a multithreaded context. Moreover, the reported error, "segmentation fault", has nothing to do with the underlying problem (for us, the fact that B2b popped up just after solving a genuine segfault, B2a---literally minutes after---made this especially confusing!). For some MPI implementations, this undesirable interference between Julia GC and mpiexec can be worked around (JuliaParallel/MPI.jl#337), but for others no workaround is currently known (e.g. at least some Intel MPI versions, which we reported to MPI.jl in JuliaParallel/MPI.jl#725, but also at least one version of OpenMPI: OpenMPI 4.0, while OpenMPI 4.1 is ok; such unresolved issues have been reported before by others, e.g. https://discourse.julialang.org/t/julia-crashes-inside-threads-with-mpi/52400/5). In summary, B2b is due to a Julia GC design decision, so the best we can do in the future is print a nicer error message; the current one is very cryptic. The good news is that for the vast majority of mainstream MPI implementations, this can be worked around by telling MPI to ignore SIGSEGV. If one is stuck with a closed-source MPI without a workaround, one can always either run single-threaded or stop GC. See #32 for a more detailed proposed enhancement regarding the error messages.
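
A hedged sketch of the workarounds mentioned above; the UCX environment variable applies only to UCX-based MPI stacks, and the exact signal list is an assumption based on MPI.jl's known-issues documentation:

# Option 1: for UCX-based MPI stacks, keep UCX from intercepting SIGSEGV,
# which Julia's multithreaded GC uses for thread synchronization
# (must be set before MPI/UCX initializes)
ENV["UCX_ERROR_SIGNALS"] = "SIGILL,SIGBUS,SIGFPE"  # note: SIGSEGV omitted

# Option 2: run single-threaded (julia -t 1), so GC never needs to signal
# other threads in the first place

# Option 3: as a last resort, disable GC around the communication-heavy code
GC.enable(false)
# ... MPI communication ...
GC.enable(true)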

B2c. To test some OpenMPI implementations, the "--oversubscribe" flag should be added to mpiexec. Moreover, the error message explaining this is in some circumstances "eaten up", leaving only a nondescript Julia pipeline_error stack trace. We now automatically add that flag in our test cases.
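
For reference, this is what adding the flag to a manual launch looks like (the script name and process count are placeholders for the example):

# OpenMPI refuses to start more ranks than available slots unless
# --oversubscribe is passed, which matters on small CI machines
run(`mpiexec --oversubscribe -n 4 julia --project test_script.jl`)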

B2d. A subtle problem due to MPI.Init() silently changing ENV variables. In specific circumstances this causes a crash: 1. the parent process is using a system library; 2. the parent process' tests first call local pigeons; 3. that in turn called mpi_active(), which internally used MPI.Init() to see if Comm_size > 1; 4. that had the side effect of changing ENV; 5. then, when calling the pigeons MPI tests, these ENV variables were passed to ChildProcess, preventing mpiexec from starting and resulting in a cryptic pipeline_error message without any details on the chain of events 1-5. The new approach calls MPI.Init() only when running in the context of MPI.
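
A hypothetical sketch of the "only initialize MPI when actually launched under MPI" idea (this is not the actual Pigeons.jl code; the environment variables checked are ones commonly set by Open MPI and MPICH/Hydra launchers):

using MPI

# detect whether this process was started by an MPI launcher
launched_by_mpi() =
    haskey(ENV, "OMPI_COMM_WORLD_SIZE") ||  # Open MPI
    haskey(ENV, "PMI_SIZE")                 # MPICH / Hydra

# initialize MPI only in that case, so a plain local run never touches ENV
function maybe_init_mpi()
    if launched_by_mpi() && !MPI.Initialized()
        MPI.Init()
    end
end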

alexandrebouchard (Member, Author) commented Mar 6, 2023

But one of the crashes "reproduces" the segfault, so there seem to be at least two issues (not suggesting they are related):

  1. segfault with current main suspects: MPI.jl, libraries MPI.jl calls, or Julia
  2. some non-determinism in the build process, main suspect: Turing.jl

alexandrebouchard (Member, Author) commented:

  • does not seem to be due to the :funneled threadlevel argument
  • does not seem to be due to a faulty tag upper bound (MPI_TAG_UB)

alexandrebouchard (Member, Author) commented:

One hypothesis: pmodels/mpich#6432 (comment)

@@ -105,7 +105,7 @@ mpi_active() =
     Comm_size(COMM_WORLD) > 1
 end

-init_mpi() = Init() #threadlevel = :funneled)
+init_mpi() = Init(threadlevel = :funneled)
A reviewer commented:
As troubleshooting, you should try MPI_THREAD_MULTIPLE. The default is thread-single, which is nearly the same as thread-funneled. But if you have a race condition in MPI, then you need thread-multiple.
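
A hedged sketch of what this suggestion amounts to in MPI.jl (assuming the threadlevel keyword and MPI.Query_thread are available, as in recent MPI.jl versions):

using MPI
MPI.Init(threadlevel = :multiple)  # request MPI_THREAD_MULTIPLE
# check which thread level the implementation actually granted
@show MPI.Query_thread()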

alexandrebouchard (Member, Author) replied:
Thank you for the suggestion! Unfortunately, the segfault still arises with threadlevel = :multiple. On the positive side, I am now able to reproduce the problem locally.

alexandrebouchard (Member, Author) commented:

Can reproduce the crash locally now with e.g.:

using Pigeons
pigeons(target = Pigeons.TestSwapper(0.5), n_rounds = 12, n_chains = 200, on = ChildProcess(n_local_mpi_processes = 4))

miguelbiron (Collaborator) commented:

I also get segfault locally with the example above.

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

alexandrebouchard (Member, Author) commented:

Created branch https://github.com/Julia-Tempering/Pigeons.jl/tree/investigate-segfault-openmpi-hack to run local tests on OpenMPI. Instructions: check out that branch, then, in a Julia session:

using MPIPreferences; MPIPreferences.use_jll_binary("OpenMPI_jll")
exit()
julia
...

alexandrebouchard (Member, Author) commented:

The problem does not arise with Open MPI, confirming this is likely MPICH-related. (With Open MPI, an unrelated warning message pops up but does not cause a crash; it is documented in open-mpi/ompi#7393 and the fix discussed there removes the warning message.)

alexandrebouchard (Member, Author) commented Mar 7, 2023

The next step on this will be to build a local MPICH with debug symbols enabled (see JuliaParallel/MPI.jl#720, which also provides a nice template for testing several MPI systems in CI).

We can follow the Yggdrasil script to help with that and also to check whether the problem also arises with ch4: https://github.com/JuliaPackaging/Yggdrasil/blob/master/M/MPICH/build_tarballs.jl#L74

@alexandrebouchard alexandrebouchard changed the title Keeping only the suspected faulty test (by commenting out rest (!)) Segfault fix Mar 8, 2023
@alexandrebouchard alexandrebouchard marked this pull request as ready for review March 9, 2023 00:39
Files with review comments:
  • .github/workflows/CI.yml
  • src/Pigeons.jl
  • src/submission/ChildProcess.jl
  • src/submission/MPI.jl
  • src/mpi_utils/Entangler.jl
  • test/runtests.jl
  • test/turing.jl
miguelbiron and others added 6 commits March 11, 2023 10:54
Before this, we had crashes such as in
18ef655

Here is what was happening in these crashes:
1. the parent process is using a system library
2. the parent process' tests first call local pigeons
3. that in turn called mpi_active(), which internally used MPI.Init()
   to see if Comm_size > 1
4. that had the side effect of changing ENV
5. then, when calling the pigeons MPI tests, these ENV variables were
   passed to ChildProcess, causing problems

The new approach avoids calling MPI.Init() frivolously.