julia thread hangs with multiple sevals
Hi, I can reproduce a hang (no CPU activity) with the following code on Julia 1.8rc1, 1.8rc3 and 1.9dev. I'm running Ubuntu 21.10, but I can also reproduce it on an Apple M1.
Curiously, in 1.8rc1 only, the hang does not occur if I uncomment the println(42).
I could not reproduce the hang when running the same code directly in Julia.
In my real code I tried merging the two seval calls into one, but it still freezes.
It is essential to enable multithreading by setting the environment variable JULIA_NUM_THREADS=6.
from juliacall import Main as jl
jl.seval(
    """
    function worker()
        for i in 1:typemax(Int64)
            a = Float64[]
            push!(a, 0.42)
            i % 1000 == 0 && println(i)
        end
    end
    """
)
jl.seval(
    """
    begin
        #println(42) #this fixes hang only on 1.8rc1
        t = Threads.@spawn worker()
        println("waiting")
        wait(t)
    end
    """
)
On my computer it hangs after printing 302000:
....
301000
302000
The code also runs fine with a single seval call:
from juliacall import Main as jl
jl.seval(
    """
    begin
        function worker()
            for i in 1:typemax(Int64)
                a = Float64[]
                push!(a, 0.42)
                i % 1000 == 0 && println(i)
            end
        end
        #println(4) #this fixes hang only on 1.8rc1
        t = Threads.@spawn worker()
        println("waiting")
        wait(t)
    end
    """
)
thanks
This may be related to https://github.com/JuliaLang/julia/pull/45899. However, I still get the hang with Julia 1.8rc3.
I don't have a Linux box, but I'm failing to reproduce your issue in a VM (WSL on Windows and Docker on Mac).
Can you give me precise instructions of how you can reproduce your issue on a fresh box - what exactly do you install (versions of Python and Julia and their packages) and what commands do you run?
Though TBH even if I could reproduce it I'm not sure where I'd start debugging this. It seems like an issue with task/thread scheduling, which I know very little about.
Hi,
I forgot to mention that you need to enable Julia multithreading with the environment variable JULIA_NUM_THREADS=6. Otherwise, it never hangs.
OS:
I can reproduce it both:
- directly on Ubuntu
- in an Ubuntu Docker image running on Ubuntu, containing only Julia and Python. The only package I install is juliacall, via pip.
Julia
I can reproduce it on Julia 1.7, 1.8-rc1, 1.8-rc3 and 1.9 master.
Python
I can reproduce it on Python 3.9 and 3.10.
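For anyone trying to reproduce this, the setup above could be scripted roughly as follows. This is only a sketch: repro.py is a hypothetical file name for the juliacall snippet from the first post, and the helper names threaded_env and times_out are made up for illustration. A timeout can only tell a hang apart from a normal run if the worker loop is bounded, which the original 1:typemax(Int64) loop is not.

```python
import os
import subprocess
import sys


def threaded_env(num_threads, base=None):
    """Return a copy of the environment with JULIA_NUM_THREADS set."""
    env = dict(os.environ if base is None else base)
    env["JULIA_NUM_THREADS"] = str(num_threads)
    return env


def times_out(script="repro.py", num_threads=6, timeout_s=300):
    """Run the script in a child Python process with Julia threads enabled.

    Returns True if the child is still running after timeout_s seconds.
    "repro.py" is a hypothetical file holding the juliacall snippet.
    """
    try:
        subprocess.run(
            [sys.executable, script],
            env=threaded_env(num_threads),
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return True
    return False
```

Setting the variable in the child's environment, rather than inside the script, ensures it is in place before juliacall starts Julia.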
Ah possibly (hopefully) related to #201 then. My best guess is that since your loop is allocating, at some point GC is invoked, which triggers the finalizer of some Python object, which deadlocks the GIL lock it acquires.
Support for working in a multithreaded environment should be considered experimental at best right now.
"GC is invoked, which triggers the finalizer of some Python object, which deadlocks the GIL lock it acquires."
Do you mean the Python GC or the Julia GC?
"some Python object"
Do you mean internal Python objects? The test above does not involve any communication between Julia and Python.
I mean Julia GC. Your actual code doesn't touch python, but the act of calling jl.seval probably internally creates some temporary python objects which get GC'd at some point.
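The guess above can be pictured with a toy model in pure Python. This is not PythonCall's actual code: an ordinary threading.Lock stands in for the GIL, and the function names are invented for illustration. One thread holds the lock while waiting for a worker, and the worker runs a "finalizer" that needs the same lock. A timeout is used so that the sketch reports the deadlock instead of hanging.

```python
import threading


def simulate(main_holds_lock, timeout_s=0.2):
    """Return True if the worker's finalizer managed to take the lock."""
    lock = threading.Lock()  # stand-in for the GIL (assumption, toy model)
    results = []

    def finalizer():
        # Needs the lock to release a Python object. Without the timeout
        # this would block forever while the main thread holds the lock.
        acquired = lock.acquire(timeout=timeout_s)
        if acquired:
            lock.release()
        return acquired

    def worker():
        # Plays the Julia worker thread: "GC" fires here and runs the finalizer.
        results.append(finalizer())

    t = threading.Thread(target=worker)
    if main_holds_lock:
        with lock:      # main thread keeps the lock ...
            t.start()
            t.join()    # ... while blocked waiting, like wait(t)
    else:
        t.start()
        t.join()
    return results[0]
```

With main_holds_lock=True the finalizer cannot get the lock until the main thread stops waiting, and the main thread will not stop waiting until the worker finishes, which is the circular wait described above.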
yes! In my real code, this trick solves the hang
for i in 1:total
    GC.enable(false)
    Threads.@threads for x in list
        loop(x)
    end
    GC.enable(true)
    i % 100 == 0 && GC.gc(false)
end
It's important to periodically call GC to avoid memory exhaustion (see https://github.com/JuliaLang/julia/issues/45068)
OK great.
I can also reproduce the issue. Another work-around is to insert GC.gc() into the top of the second chunk of code. Presumably this tidies up any Python objects left over from the first jl.seval() on the main thread.
Over on the gc branch I have added functions which allow you to temporarily disable the Python garbage collector. Below is a modified version of your code that uses them.
from juliacall import Main as jl
jl.seval(
    """
    function worker()
        for i in 1:10_000_000
            a = Float64[]
            push!(a, 0.42)
            i % 1000 == 0 && println(i)
        end
    end
    """
)
jl.seval(
    """
    begin
        PythonCall.C.gc_disable()
        t = Threads.@spawn worker()
        println("waiting")
        wait(t)
        PythonCall.C.gc_enable()
    end
    """
)
If you want to try it out, check out the branch, copy pysrc/juliacall/juliapkg-dev.json to pysrc/juliacall/juliapkg.json, and run pip install -e . in the repository root.
I've just released a version of PythonCall with these functions, except they are now called PythonCall.GC.enable() and PythonCall.GC.disable(). I think this is the best solution to your problem right now. Feel free to open a new issue with any problems.
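One way to use the renamed functions from Python is to wrap them so that the enable call always runs, even if the threaded work raises. The sketch below assumes worker() has already been defined through jl.seval as in the earlier snippets; the helper names paused and run_worker_with_gc_paused are made up for illustration and are not part of juliacall or PythonCall.

```python
from contextlib import contextmanager


@contextmanager
def paused(disable, enable):
    """Call disable() on entry and enable() on exit, even after an error."""
    disable()
    try:
        yield
    finally:
        enable()


def run_worker_with_gc_paused():
    # Imported lazily so that defining this function does not start Julia.
    from juliacall import Main as jl

    with paused(
        lambda: jl.seval("PythonCall.GC.disable()"),
        lambda: jl.seval("PythonCall.GC.enable()"),
    ):
        # Assumes worker() was defined by an earlier jl.seval call.
        jl.seval("wait(Threads.@spawn worker())")
```

The try/finally matters because a failure inside the threaded section would otherwise leave the collector disabled for the rest of the session.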
hi, thanks for looking into this! It looks like I missed your messages from July :-( I did a quick test with 0.9.5 on my project, and unfortunately, it now crashes with and without calling GC.enable/disable. I'll try to find some time this week to run more tests