PythonCall.jl

Pickle doesn't work on Julia modules

Open msundvick opened this issue 4 years ago • 7 comments

Bit of a strange request, but there is a use case. A minimal example is:

import pickle
from juliacall import Base
pickle.dumps(Base)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: cannot pickle 'juliacall.ModuleValue' object

Pickling Julia modules would be useful for defining functions that are passed around in Python. I initially ran into this when using ray, so here is a slightly more complex example that fails in the same way:

import ray
ray.init()
from juliacall import Base
@ray.remote
def v(x):
    Base.println(x)
x = ray.get(v.remote(1))

I had a look at PyCall as well, but they've got a similar issue here https://github.com/JuliaPy/PyCall.jl/issues/863, with a few comments on how to fix it.

Any ideas? I'll try to have a look into this further, but any help getting this working would be appreciated.

msundvick avatar Jul 21 '21 02:07 msundvick

Oh duh, there's a workaround for my particular case. Since ray is spawning processes, and juliacall is inserting the module into the process during the init step, something like this works fine:

import ray
ray.init()
@ray.remote
def v():
    from juliacall import Base
    return Base.seval("1+1")

print(ray.get(v.remote()))

I could see this being a bit painful for other cases though, so I'll leave the issue open for now.
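
The deferred import generalizes beyond this one task. As a minimal sketch (the helper name `julia_eval` is hypothetical and not part of juliacall), the import can live in a small function, so that nothing from juliacall is captured when the calling code is serialized. Python caches imported modules in `sys.modules`, so only the first call in each process pays for the import. The `Base.seval` call is the same one used in the workaround above.

```python
def julia_eval(code):
    """Evaluate a string of Julia code in whichever process calls this.

    Hypothetical helper, not part of juliacall. The import is deferred on
    purpose: defining (or serializing) this function never touches a Julia
    object, because `Base` only exists as a local name while it runs.
    """
    # Runs in the calling process; cached in sys.modules after the first call.
    from juliacall import Base

    return Base.seval(code)
```

Inside a `@ray.remote` function, `julia_eval("1+1")` could then stand in for the explicit import, assuming juliacall is installed on every worker.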

msundvick avatar Jul 21 '21 03:07 msundvick

This package is currently being rewritten from scratch, and as it happens I recently added support for pickle (and Serialization on the Julia end).

I haven't tested it much, and it will probably hang on any cyclic objects, but hopefully it will pickle modules OK.

Try it out by installing PythonCall#master and reinstalling juliacall from there. Let me know how you get on.

cjdoris avatar Jul 21 '21 19:07 cjdoris

PS it's cool you're using this with Ray, what are you planning to do?

cjdoris avatar Jul 21 '21 19:07 cjdoris

Oh nice! This does solve the pickle issue. Something like this works just fine:

import pickle
from juliacall import Base
a = pickle.loads(pickle.dumps(Base))
print(a.rand())
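
For a round trip like this to work at all, pickle needs a recipe for rebuilding the object in the receiving process. The sketch below illustrates the general idea of pickling by name, using a plain Python module as a stand-in and a hypothetical `ByNamePickler` class. It is not necessarily how PythonCall implements its pickle support. It relies on `Pickler.reducer_override`, which is available in Python 3.8 and later.

```python
import importlib
import io
import pickle
import types


class ByNamePickler(pickle.Pickler):
    """Hypothetical pickler that stores modules as 'import this name'."""

    def reducer_override(self, obj):
        if isinstance(obj, types.ModuleType):
            # (callable, args): on load, pickle calls
            # importlib.import_module(name) to rebuild the object.
            return importlib.import_module, (obj.__name__,)
        # Anything else falls back to pickle's default behaviour.
        return NotImplemented


def dumps_by_name(obj):
    """Serialize obj to bytes, replacing module objects with their names."""
    buffer = io.BytesIO()
    ByNamePickler(buffer).dump(obj)
    return buffer.getvalue()
```

With this in place, `pickle.loads(dumps_by_name(json))` returns the `json` module in the loading process, whereas a plain `pickle.dumps(json)` raises a `TypeError`. Only the name travels across, so the module must be importable on the receiving side.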

Unfortunately, the way distributed computing works in ray seems to be interfering with reloading the shared library? On master, my original code for ray segfaults. The workaround of explicitly importing the module inside the actor is still fine though.

The whole point of this is that we've already got a bunch of code in Python for doing a scientific simulation (for racing yachts, to be precise). We've already done some code optimization, parallelizing with ray. There are still some bottlenecks though, and it would be nice to switch out some parts for Julia, as well as having access to things like DiffEq. But ray is pretty integral at this point, so I need something that will play nice with it.

msundvick avatar Jul 22 '21 00:07 msundvick

Also, new issue really, but where's juliacall_pipdir gone? I'm getting UndefVarError when following the juliacall setup instructions while on master. Manually hunting down the location and running pip install is fine though. Is this just planning ahead for a release on PyPI?

msundvick avatar Jul 22 '21 00:07 msundvick

No I just forgot! But it will go in PyPI at some point.

I think some of Julia's code loading happens in threads, which plays havoc a bit with Python's GIL.

I don't know when I can get round to fixing it. Hopefully the workaround you found is OK for now?

cjdoris avatar Jul 22 '21 06:07 cjdoris

Sure, it's not terribly elegant but it works well enough for now. Yeah, I can totally see the GIL being a pain, so no worries if it takes a while to fix.

msundvick avatar Jul 22 '21 22:07 msundvick