segmentation fault with multi-threading #219
Comments
This is probably the same as #201 and #202. The Python interpreter must only be used on thread 1. If you use Julia's multithreading, then even if it doesn't explicitly use Python, at some point the GC may free a Python object. If this GC runs on a thread other than 1, Python will crash. The solution is to guard multithreaded Julia code with calls that disable and then re-enable PythonCall's GC:
function test(x)
    partial = zeros(Threads.nthreads())
    PythonCall.GC.disable()
    try
        Threads.@threads for i in 1:Threads.nthreads()
            for j in i:Threads.nthreads():length(x)
                partial[i] += x[j]
            end
        end
    finally
        PythonCall.GC.enable()
    end
    return sum(partial)
end
This ensures that no Python objects are freed during the multithreaded portion: if Julia's GC collects any Python objects, they will be cached and freed at |
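The disable/try/finally/enable guard above can also be sketched from the Python side as a small context manager. This is only an illustration of the pattern: `gc_guard` is a hypothetical helper, not part of juliacall or PythonCall, and the two callables are stand-ins. In a real session one might pass `jl.PythonCall.GC.disable` and `jl.PythonCall.GC.enable`, assuming those are reachable from `Main` as the Julia code in this thread suggests.

```python
from contextlib import contextmanager


@contextmanager
def gc_guard(disable, enable):
    """Call disable() on entry and enable() on exit, even if the body raises.

    `disable` and `enable` are caller-supplied callables (hypothetical here).
    """
    disable()
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, mirroring the Julia `finally`.
        enable()


# Stand-in callables that only record the order of events.
events = []
with gc_guard(lambda: events.append("disable"), lambda: events.append("enable")):
    events.append("threaded work")
```

The `finally` clause is the important part: without it, an exception in the threaded section would leave the GC guard disabled for the rest of the session.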
Not sure if I did something wrong, but I still get the segfaults:
In [1]: from juliacall import Main as jl
In [2]: jl.seval("""
...: function test(x)
...:     partial = zeros(Threads.nthreads())
...:     PythonCall.GC.disable()
...:     try
...:         Threads.@threads for i in 1:Threads.nthreads()
...:             for j in i:Threads.nthreads():length(x)
...:                 partial[i] += x[j]
...:             end
...:         end
...:     finally
...:         PythonCall.GC.enable()
...:     end
...:     return sum(partial)
...: end
...: """)
Out[2]: test (generic function with 1 method)
In [3]: import numpy as np
In [4]: x = np.random.random((10_000,));
In [6]: %timeit jl.test(x)
60.4 µs ± 30.7 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit jl.test(x)
Segmentation fault (core dumped)
(the same for my package) |
In that case I don't know what's going on. I can reproduce the error but don't get a core dump. Can you poke around the core dump with gdb and see what's going on? Maybe get a backtrace? |
If I do The PythonCall GC code is itself not thread safe, but I assumed that that wasn't an issue since Julia's GC can only run from one thread at a time, so I assumed that finalizers only ran in one thread at a time. Maybe that's not the case. I think with a little more digging I can figure this out. |
I'm flummoxed! Here's a slightly smaller MWE:
In [1]: import os
In [2]: os.environ['PYTHON_JULIACALL_THREADS'] = '4'
In [3]: from juliacall import Main as jl
In [4]: jl.seval("""
...: function test()
...:     partial = zeros(Threads.nthreads())
...:     x = zeros(10_000)
...:     Threads.@threads for i in 1:Threads.nthreads()
...:         for j in i:Threads.nthreads():length(x)
...:             partial[i] += x[j]
...:         end
...:     end
...:     return sum(partial)
...: end
...: """)
Out[4]: test (generic function with 1 method)
In [5]: %timeit jl.test()
The slowest run took 10.87 times longer than the fastest. This could mean that an intermediate result is being cached.
150 µs ± 103 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %timeit jl.test()
Segmentation fault
Things tried:
So it looks like some combination of garbage collection and threading is the issue. But I've turned off the finalizers in PythonCall so I don't know what else could be the issue. |
If I am using this approach to disable/enable GC I am running out of memory at some point. Would it help if I make a minimal example that reproduces this behaviour? |
It's not surprising that disabling GC entirely means you run out of memory. You could maybe do a |
Hmm, but I am enabling GC again after the threaded call. I can't find a |
As said on Discourse: https://discourse.julialang.org/t/floop-threading-and-juliacall-produce-segmentation-fault/100717/11?u=vchuravy Julia intentionally causes segmentation faults as part of the GC safepoint mechanism. This MPI.jl issue is somewhat related. PythonCall starts Julia with signal handling turned off, which then causes benign segmentation faults to become spurious ones. |
To be clear, you're suggesting to change that option to "yes" (or just remove it)? According to git blame, it was introduced last year in bef6afa without much explanation (just "better compatibility"), so it's unclear what the motivation for that change was. |
Yes, changing it from no to yes will at least no longer cause faults due to GC (and it looks like most of the recorded issues here worked around it by running with GC off). This issue was opened shortly after that change as well. |
I see that the offending commit is part of the branch https://github.com/cjdoris/PythonCall.jl/commits/gc referenced in issues #201 and #202, which was meant to deal with other segmentation faults in the GC in multithreaded programs, but it sounds like this specific change just worsened the situation. Is that an accurate reconstruction of the events @cjdoris? |
Hangs could come about from one thread wanting to stop the world and another executing Python code? The code would need to manually transition to GC safe before calling into Python. |
@vchuravy thanks for the tip here! Just wanted to post that this works perfectly for me, after setting This is significantly better than the workaround I posted here, as we don't have to worry about huge memory allocation issues in the threaded code or temporarily calling |
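The setting named in the comment above did not survive in this copy of the thread. As a rough sketch of what such a configuration could look like from Python, the helper below assumes juliacall's `PYTHON_JULIACALL_HANDLE_SIGNALS` environment variable (an assumption, since it is not named here) alongside `PYTHON_JULIACALL_THREADS` from the MWE earlier. `configure_juliacall` is a hypothetical helper, and the variables only take effect if they are set before juliacall is first imported.

```python
import os


def configure_juliacall(threads, handle_signals=True, environ=None):
    """Write juliacall startup settings into an environment mapping.

    Hypothetical helper. PYTHON_JULIACALL_HANDLE_SIGNALS is assumed to be
    the relevant option; it is not spelled out in this thread.
    """
    env = os.environ if environ is None else environ
    env["PYTHON_JULIACALL_THREADS"] = str(threads)
    env["PYTHON_JULIACALL_HANDLE_SIGNALS"] = "yes" if handle_signals else "no"
    return env


# Demonstrated on a plain dict so that the real environment is left untouched.
settings = configure_juliacall(4, environ={})

# In a real session, call configure_juliacall(4) first and only afterwards run:
#     from juliacall import Main as jl
```

Setting the variables in the shell before starting Python should be equivalent; doing it in code mainly helps in notebooks, where the import order is easy to control.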
Thank you for looking into this. I had no idea there was such a thing as a benign segfault! Given it is so important for Julia's GC I guess we should turn Julia's signal handling back on. I'm not sure what that terrible "better compatibility" comment was about. I suspect that it's because Python also wants to handle signals, and letting them both do it interferes with each other. For example, I think allowing Julia's signal handler prevented you from doing keyboard interrupts (Ctrl-C) in Python code. |
@cjdoris in my experience, allowing Julia's signal handler allows keyboard interrupts (Ctrl-C) to work in Python code. Using branch from #333 :
|
This issue has been marked as stale because it has been open for 30 days with no activity. If the issue is still relevant then please leave a comment, or else it will be closed in 7 days. |
I think it is still relevant. The workaround of disabling GC in the internal Julia code can cause memory buildup, making threading unusable in some cases. Whether this can be solved by modifications of this package alone is another question. |
So there are a couple of interactions here:
If you want the Python signal handlers to work as well, then someone would need to implement signal chaining, which might be a bit fiddly but shouldn't be too complicated. I briefly chatted with @vtjnash about this a while ago. |
I think this was largely fixed (well, worked-around) by setting the env var
@vchuravy signal chaining would be cool - presumably this requires some level of co-operation by either the core Julia or Python implementation? |
@cjdoris could
Alternatively is there any way one could guarantee that all objects passed to a Julia call are copied and no longer accessible to Python? Sort of like how Rust guarantees thread safety. I think allowing mutation by default is going to be dangerous for my application so would like a way to force it to copy objects when passed one way or the other. (I'm not even sure
@vchuravy what is the ETA of signal chaining and who should I poke to help get this working? |
I don't know that anyone is working on signal chaining. |
The reason why
However, what would probably be a good idea is to raise a warning if juliacall is being started with multiple threads and signal handling turned off, suggesting the user may want to enable Julia signal handling. |
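The warning suggested above could look roughly like the following. This is a minimal sketch and not juliacall's actual behaviour: `check_juliacall_config` is a hypothetical function, the environment variable `PYTHON_JULIACALL_HANDLE_SIGNALS` and the defaults (one thread, signal handling off) are assumptions, and thread counts set through other routes such as `JULIA_NUM_THREADS` are ignored.

```python
import warnings


def check_juliacall_config(environ):
    """Warn when several Julia threads are requested without signal handling.

    Hypothetical check. Returns True if a warning was issued.
    """
    threads = environ.get("PYTHON_JULIACALL_THREADS", "1")       # assumed default
    signals = environ.get("PYTHON_JULIACALL_HANDLE_SIGNALS", "no")  # assumed default
    if threads != "1" and signals != "yes":
        warnings.warn(
            "Julia is being started with multiple threads but without Julia "
            "signal handling; this may cause segmentation faults. Consider "
            "enabling Julia signal handling.",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False


# Record the warning instead of printing it, to show what would be raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    risky = check_juliacall_config({"PYTHON_JULIACALL_THREADS": "4"})
```

A warning rather than an error keeps existing single-purpose scripts running while still pointing users at the likely cause of a crash.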
Maybe my solution in #201 (comment) could help you? |
As far as I can tell, CPython does not rely on any signal handling by default, except for
Alternatively, as a first measure, perhaps we could add a flag to Julia to disable just the SIGINT handler, or maybe make Python re-register its SIGINT handler after juliacall is loaded when running from the Python REPL? |
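The second idea above, re-registering Python's SIGINT handler after juliacall is loaded, might be sketched with the standard `signal` module as follows. `preserve_sigint` is a hypothetical helper; whether restoring the handler this way is enough once Julia's own handlers are installed has not been verified here. Note that `signal.signal` can only be called from the main thread.

```python
import signal
from contextlib import contextmanager


@contextmanager
def preserve_sigint():
    """Save Python's SIGINT handler on entry and reinstall it on exit."""
    saved = signal.getsignal(signal.SIGINT)
    try:
        yield
    finally:
        # getsignal() returns None if the handler was not installed from
        # Python; in that case there is nothing we can reinstall.
        if saved is not None:
            signal.signal(signal.SIGINT, saved)


before = signal.getsignal(signal.SIGINT)
with preserve_sigint():
    # Stand-in for `from juliacall import Main as jl`: simulate some other
    # code replacing the SIGINT handler while the block runs.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
restored = signal.getsignal(signal.SIGINT)
```

In an interactive session the `with` block would wrap the juliacall import itself, so that Ctrl-C keeps raising `KeyboardInterrupt` in Python code afterwards.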
I have this MWE, where I get segmentation faults (frequently, but not deterministically), when trying to run some script that uses multi-threading on the Julia side.
Before launching ipython3, I have used:
export JULIA_NUM_THREADS=4
(my computer has 4 cores - 8 threads).
The MWE is:
Here I have emulated the error using the %timeit magic from ipython, but my actual error I get after some runs of a function of my package:
%timeit runs the function multiple times, and there seems to be some memory corruption, or memory overflow, causing the error.
Anyway, even if you have only some hint on how to debug this, I will be very thankful.
(even in the simplest example above, the segfaults only occur with multi-threading).