
MPI implementations intercepting Signals is incompatible with Julia GC safepoint #725

Open
alexandrebouchard opened this issue Mar 10, 2023 · 15 comments · May be fixed by #742

@alexandrebouchard
Contributor

Thanks again for your help with #720. This issue is unrelated, except that #720 led us to create more comprehensive unit tests, which revealed this new, probably unrelated segfault.

Summary of this problem: a segfault occurs when GC is triggered in a multithreaded+MPI context.

How to reproduce: I have created a draft PR adding a GC.gc() call to one of MPI.jl's existing multithreaded tests: see PR #724

The draft PR is based on the most recent commit where all tests passed (tag 0.20.8). In the output of "test-intel-linux", the salient output is

signal (11): Segmentation fault
in expression starting at /home/runner/work/MPI.jl/MPI.jl/test/test_threads.jl:18
ijl_gc_enable at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gc.c:2955

The change we made is in the file test/test_threads.jl, where we added the following if clause:

    Threads.@threads for i = 1:N
        reqs[N+i] = MPI.Irecv!(@view(recv_arr[i:i]), comm; source=src, tag=i)
        reqs[i] = MPI.Isend(@view(send_arr[i:i]), comm; dest=dst, tag=i)
        if i == 1
            GC.gc()
        end
    end

We experience similar problems with MPICH 4.0 in our package (https://github.com/Julia-Tempering/Pigeons.jl), but not with MPICH 4.1.

Related discussions

This describes a similar issue in the context of UCX. However, this problem does not seem limited to UCX based on our investigations so far.

This describes a similar issue in the context of Open MPI. However, it seems that certain versions of MPICH and Intel MPI (which is MPICH-derived) might suffer from a similar issue?

In light of these two sources, perhaps other environment variables in the style of

ENV["UCX_ERROR_SIGNALS"] = "SIGILL,SIGBUS,SIGFPE"
could be set to address this issue? I was wondering whether anyone has a suggestion on whether that is a reasonable hypothesis. Having limited MPI experience, I am not sure what these environment variables might be.

Thank you so much for your time.

@vchuravy
Member

In a multi-threaded environment Julia uses segmentation faults on special addresses for its safepoint implementation. If the MPI implementation intercepts signals, this will cause spurious aborts.

UCX is a library that does this, so for a better experience we tell it not to. Generally, Julia will handle signals for the user.

@alexandrebouchard
Contributor Author

alexandrebouchard commented Mar 11, 2023

That's right, @vchuravy. The point we are documenting here is that this issue is not limited to UCX: it affects other MPI implementations, in particular some that are currently in MPI.jl's set of test cases (see "test-intel-linux" in #724, showing that MPI.jl with Intel MPI will currently crash when GC happens in a multithreaded context).

@vchuravy vchuravy changed the title GC in a multithreaded MPI context causing segfaults beyond UCX MPI implementations intercepting Signals is incompatible with Julia GC safepiint Mar 11, 2023
@vchuravy vchuravy changed the title MPI implementations intercepting Signals is incompatible with Julia GC safepiint MPI implementations intercepting Signals is incompatible with Julia GC safepoint Mar 11, 2023
@vchuravy
Member

If you can figure out how to tell Intel MPI not to intercept signals we can add that as a vendor specific workaround.

@alexandrebouchard
Contributor Author

We will do some research on that, thank you.

However, it seems a more principled approach would be to tell Julia to use another signal for GC coordination, since it appears that in any situation where Julia is used as a child process, GC + multithreading would trigger a crash. This leads to a kind of Whac-A-Mole situation where the issue has to be addressed in every possible parent process, some of which could potentially be closed source (like the situation here).

@alexandrebouchard
Contributor Author

Also, it looks like this issue was previously reported here: https://discourse.julialang.org/t/julia-crashes-inside-threads-with-mpi/52400/5

From a quick look there is no obvious ENV-based workaround for Intel MPI.

Add to the list of MPI systems incompatible with GC + multithreading: MPICH 4.0 (but not MPICH 4.1!).

@vchuravy
Member

However, it seems a more principled approach would be to tell Julia to use another signal for GC coordination, since it appears that in any situation where Julia is used as a child process, GC + multithreading would trigger a crash

Let's be precise here: Julia does not crash; the MPI implementation is misreporting a signal as a crash.

The Julia GC safepoint needs to be very low-overhead and is implemented as a load from an address. When GC needs to be triggered, Julia sets the safepoint to hot, i.e. it maps the page from which the load happens as inaccessible. The OS will deliver a signal to the process, and Julia inspects the faulting address to ensure that the signal was caused by the safepoint.

While there are different alternatives one could implement, this method has the lowest overhead during execution of the program
(and while I am interested in experimenting with different alternatives, I don't expect those experiments to bear fruit any time soon).

some of which could potentially be closed source

I would encourage you to file a ticket with the vendor of the software.

@vchuravy
Member

Can you see which libfabric version Intel MPI is using? There was a signal-handler-related bugfix that landed in v1.10.0rc1 (ofiwg/libfabric#5613).

@alexandrebouchard
Contributor Author

According to

key: ${{ runner.os }}-intelmpi-2019.9.304

this particular failed test is on intelmpi-2019.9.304

@vchuravy
Member

@simonbyrne the latest is 2021.8.0 maybe worth an update?

@simonbyrne
Member

Is that the same as oneAPI MPI? We already test that (thanks to @giordano)

@simonbyrne
Member

@alexandrebouchard what version of Intel MPI are you using? And what is your libfabric version?

@alexandrebouchard
Contributor Author

I am travelling this week, but let me get back to you on this soon!

@vtjnash

vtjnash commented Jun 23, 2023

Intel PSM also has the same issue, and requires the environment variable IPATH_NO_BACKTRACE to be set in order not to crash. This is undocumented; see:

https://github.com/intel/psm/blob/e5b9f1cbf432161639cb5c51d17b196c92eb4278/ipath/ipath_debug.c#L162

Similar to UCX as documented here:
https://juliaparallel.org/MPI.jl/stable/knownissues/#Multi-threading-and-signal-handling

@giordano
Member

Also, Open MPI sets the same environment variables for a similar reason: https://docs.open-mpi.org/en/main/news/news-v2.x.html

Change the behavior for handling certain signals when using PSM and PSM2 libraries. Previously, the PSM and PSM2 libraries would trap certain signals in order to generate tracebacks. The mechanism was found to cause issues with Open MPI’s own error reporting mechanism. If not already set, Open MPI now sets the IPATH_NO_BACKTRACE and HFI_NO_BACKTRACE environment variables to disable PSM/PSM2’s handling these signals.

https://github.com/open-mpi/ompi/blob/4216f3fc13079b80f64c07987935345189206064/opal/runtime/opal_init.c#L98-L115

    /* Very early in the init sequence -- before *ANY* MCA components
       are opened -- we need to disable some behavior from the PSM and
       PSM2 libraries (by default): at least some old versions of
       these libraries hijack signal handlers during their library
       constructors and then do not un-hijack them when the libraries
       are unloaded.

       It is a bit of an abstraction break that we have to put
       vendor/transport-specific code in the OPAL core, but we're
       out of options, unfortunately.

       NOTE: We only disable this behavior if the corresponding
       environment variables are not already set (i.e., if the
       user/environment has indicated a preference for this behavior,
       we won't override it). */

@giordano
Member

It doesn't look like setting IPATH_NO_BACKTRACE=1 is sufficient: #742 😞
