Support sub-interpreters
Does PyO3 support the use case of starting multiple separate interpreters? This would be similar to Python's multiprocessing.
It should be possible with the upcoming improvements in Python 3.8 and above.
Specifically, it would be amazing if it were possible to create multiple separate Python interpreters that could run on different threads in parallel, but which share the same memory space (with the type system used to ensure this is only observable from Rust).
The complexity with sub-interpreters is that no Python objects should ever be shared between them. The PyO3 API isn't set up to prevent this at the moment, so it's going to take a fair whack of experimentation before we get anything stable in place.
I'd like to have multiple threads, each one with its own interpreter. No PyObject would be sent between threads.
IIRC, we can't hold a PyObject directly in Rust; everything is a reference to an UnsafeCell and by default not Send.
@davidhewitt with https://peps.python.org/pep-0684/, using sub-interpreters or multiple interpreters to unlock true multi-core parallelism becomes possible.
Is adding support for this on PyO3's timeline or under consideration?
Found this interesting article on the current usage of sub-interpreters in Python (no Rust there).
We are very aware of the per-interpreter parallelism landing in Python 3.12. There are significant changes which need to happen to PyO3's current implementation to support this correctly. We have been discussing some of these challenges in multiple discussions across this repo, such as #2885 which looks at the possible nogil option.
There are several main issues which are prominent in my mind, although others may exist:
- I understand interpreters cannot share Python objects. This implies that `Py<T>` needs to be removed or reworked significantly, maybe by removing `Send` and `Sync` from that type, probably also by somehow making the operation to attach a `Py<T>` to a Python thread `unsafe` or runtime-checked in some way.
- We need to fully transition PyO3 to PEP 630 compatibility, which requires elimination of all `static` data which contains Python state. This is probably linked to the first bullet.
- APIs like `GILOnceCell` and `GILProtected` can no longer be `Sync` if multiple GILs exist. Transition to PEP 630 compatibility will probably force us to replace these types with alternative solutions. (A sketch of the kind of `static`, GIL-protected state in question follows below.)
Solving these problems is likely to create significant churn of PyO3's API, so we can only make progress once someone has proposed a relatively complete solution which we can adopt with a suitable migration path for users.
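For context, here is a minimal sketch (assuming the current `pyo3::sync::GILOnceCell` API) of the kind of `static`, GIL-protected Python state described in the last bullet, which is exactly what becomes problematic once there is one GIL per interpreter:

```rust
use pyo3::prelude::*;
use pyo3::sync::GILOnceCell;
use pyo3::types::PyList;

// A process-wide cache of a Python object. With a single GIL this is sound;
// with per-interpreter GILs the cached list would belong to whichever
// interpreter happened to initialize it first.
static LIST_CELL: GILOnceCell<Py<PyList>> = GILOnceCell::new();

fn get_shared_list(py: Python<'_>) -> &PyList {
    LIST_CELL
        .get_or_init(py, || PyList::empty(py).into())
        .as_ref(py)
}
```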
Hello guys, I was redirected here by @messense. I'm building a lib where I need to use multiple Python interpreters at once, so I tried to do it. At first I faced the same issue that Python objects can't be shared between threads, so I tried a different approach: I created a GIL pool inside the thread, got a Python token from it, and did some data processing. It worked for a while, until I started to notice that some threads were randomly crashing. After some forensic analysis of what was going on, I found that the problem is that when we assume the GIL is acquired, it basically takes the GIL from some other thread that is using it, and then the "reference" is gone. This is what I did:
let getting_py = unsafe { Python::assume_gil_acquired() };
let gil_pool = unsafe { getting_py.clone().new_pool() };
py = gil_pool.python();
However, when that happened I switched my lib to call callbacks using only one Python interpreter at a time for now, which isn't optimal, but I have to keep the project going. I'm still trying to find a solution, because this is something I really need in order to speed things up. So could someone please explain to me, or point me to a trusted article on, how GIL acquisition and the Python interpreter work inside Rust, and whether the GIL needs to be kept acquired as a reference? I potentially have a solution in mind that could work temporarily, if what I have in mind really makes sense.
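For reference, the supported way to do this with today's API is `Python::with_gil`, which really acquires the GIL (blocking until it is available) and releases it when the closure returns, so the `Python<'_>` token can never outlive the acquisition. A minimal sketch:

```rust
use pyo3::prelude::*;
use pyo3::types::PyList;

// `with_gil` acquires the GIL for the duration of the closure; there is no
// need to assume it is already held.
fn list_len(data: &[i64]) -> PyResult<usize> {
    Python::with_gil(|py| {
        let list = PyList::new(py, data);
        Ok(list.len())
    })
}
```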
Ok, I have cloned the repo and studied how it works to start understanding the logic, but I'm not 100% sure how all of this works because it's a lot of code, to be honest, haha. I did notice something interesting:
In gil.rs there is this embedded Python interpreter function that seems to acquire a GIL pool and then release it:
#[cfg(not(PyPy))]
pub unsafe fn with_embedded_python_interpreter<F, R>(f: F) -> R
where
    F: for<'p> FnOnce(Python<'p>) -> R,
{
    assert_eq!(
        ffi::Py_IsInitialized(),
        0,
        "called `with_embedded_python_interpreter` but a Python interpreter is already running."
    );
    ffi::Py_InitializeEx(0);
    // Safety: the GIL is already held because of the Py_InitializeEx call.
    let pool = GILPool::new();
    // Import the threading module - this ensures that it will associate this thread as the "main"
    // thread, which is important to avoid an `AssertionError` at finalization.
    pool.python().import("threading").unwrap();
    // Execute the closure.
    let result = f(pool.python());
    // Drop the pool before finalizing.
    drop(pool);
    // Finalize the Python interpreter.
    ffi::Py_Finalize();
    result
}
The idea here is a pool, and we're able to acquire Python from it and then release it, like a pool of connections for a sqlite3 database, right?
So the issue seems to be that when we acquire the GIL pool it creates a connection with the interpreter, and I'm assuming that at the moment we can only have one of those...
I've been considering a new approach to tackle our challenges with multithreading and Python's Global Interpreter Lock (GIL), at least until we can have multiple sub-interpreters. My idea is to create a centralized execution pool dedicated to handling Python-related tasks. This would eliminate the need for Arc and Mutex to share PyObjects, avoiding the issues we've faced with sending certain objects. We could develop a procedural macro to wrap the Python-invoking code. This macro would package the code, forward it to the centralized pool using a Box, process it, and return the result.

Centralizing the pool means we can manage the GIL more efficiently, reducing errors from multiple threads trying to access it simultaneously. While there's a potential bottleneck with a single interpreter, it offers the advantage of invoking Python from different places without GIL acquisition challenges. The primary shift here is that we send the code for execution instead of transferring PyObjects, ensuring the GIL is safely managed. This approach would essentially streamline our execution into a rapid, queue-based system. I'd be eager to hear your feedback on this idea and whether it could potentially work!
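For illustration, here is a minimal sketch of that queue-based idea using a plain `std::sync::mpsc` channel instead of a procedural macro. The `PyJob` alias and `spawn_python_worker` helper are made up for this example; the point is that closures are sent to one dedicated thread, so no `PyObject` ever crosses a thread boundary:

```rust
use std::sync::mpsc;
use std::thread;

use pyo3::prelude::*;

// A job is any closure that runs with the GIL held on the dedicated thread.
type PyJob = Box<dyn for<'py> FnOnce(Python<'py>) + Send>;

// Spawn one thread that owns all Python work; other threads send it jobs.
fn spawn_python_worker() -> mpsc::Sender<PyJob> {
    let (tx, rx) = mpsc::channel::<PyJob>();
    thread::spawn(move || {
        for job in rx {
            Python::with_gil(|py| job(py));
        }
    });
    tx
}

// Usage from any worker thread: send a closure plus a reply channel.
fn example(tx: &mpsc::Sender<PyJob>) {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Box::new(move |py: Python<'_>| {
        let _ = reply_tx.send(py.version().to_string());
    }))
    .unwrap();
    println!("Python version: {}", reply_rx.recv().unwrap());
}
```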
The "pool" you refer to above is not a pool of GIL acquisitions, but rather a pool of objects. (There can only ever be one GIL acquisition at a time per interpreter. As per the above in this thread, PyO3 is some way off supporting sub-interpreters.)
If I read your idea correctly it seems like you're proposing having one thread which is running Python workloads and you send jobs to it from other threads. That seems like a completely reasonable system architecture.
Yeah, I'm kind of lost in the PyO3 files, trying to understand how it works to see if I can help with this. But the idea is exactly that. While I'm figuring out how PyO3 works, I'm also building a functional model of the idea I proposed; when I have any progress I'll tell you guys :)
I think this will make multithreading easier while we can't have multiple Python interpreters. Of course it won't be the fastest thing in the world, because we can only use one interpreter and can't spawn sub-interpreters to distribute the workload, but it will work, especially for cases where we only need Python for small things inside a parallelized data-processing mechanism. In those cases I think it will help a lot.
Hey everyone! 🚀
I've crafted a hands-on demonstration of a system architecture that seamlessly integrates Python functionalities within parallelized Rust code. This approach effectively sidesteps the GIL constraints and the challenges of passing Python objects between threads.
🔗 Dive into the details and check out the comprehensive documentation here: RustPyNet on GitHub.
While it's not a full-fledged multi-interpreter system, it does simplify the execution of Python functions in a multi-threaded environment. For me, it's been a game-changer for projects that leverage parallelized Rust processes and use PyO3 just for callbacks. I genuinely believe this isn't just beneficial for my projects; many others in our community who are working on similar projects could greatly benefit from this integration as well.
I'm reaching out to see if there's potential to integrate this into the PyO3 project. I'm genuinely curious about your thoughts, especially from our development team members. If there's interest, I'm more than willing to assist in its implementation. Let's discuss and explore its wider potential! 🤔👨‍💻👩‍💻
To get this issue back on topic, I'd be willing to contribute a decent amount in order to allow PyO3 to support sub-interpreters.
We've noticed that some of our users can't use Ceph's Dashboard, which led me down quite a rabbit hole. To keep things short, I eventually stumbled across bazaah/aur-ceph#20, which lists all of the facts. In short, anything that transitively depends on PyO3 will break once sub-interpreters enter the stage, unfortunately.
So... how may I help? What would be the best way to start tackling this?
I've tried playing with this a bit. My first idea was to make the 'py lifetime invariant, so it may serve as a unique identifier of an object's provenance. Unfortunately, this breaks basically everything. I'm not sure whether there is some more lenient approach (maybe two lifetimes? a token and the actual covariant lifetime). Either way, it seems like it would be a breaking change with this approach.
@Aequitosh thanks for the offer, it would be great to begin making progress on this. The above comment https://github.com/PyO3/pyo3/issues/576#issuecomment-1574360683 is still a good summary of the state of play.
Are you interested in design work? Implementation? Reviews? How much effort are you prepared to put in? This is going to be a big chunk of work.
I think that a realistic mid-term solution is that:
- We get PyO3's internal implementation sound under subinterpreters. This means:
  - Rework synchronization primitives to not rely on the GIL. The thread #2885 has a lot of discussion in this area. Ideally we need to come up with a transition plan so that existing users can migrate their code without enormous amounts of work. (See the sketch below this list for the general direction.)
  - Remove `static` data from PyO3's implementation. The main use of this is in `LazyTypeObject`, which stores the `#[pyclass]` types. Several possible places to relocate static data to:
    - (Preferred) module-specific state, see `PyModule_GetState`
    - (Alternative) interpreter-specific state, see `PyInterpreterState_GetDict`
- We give extension module authors the responsibility to audit their own code and have an `unsafe` opt-in to allow their module to be used with subinterpreters, e.g. `#[pymodule(unsafe_allow_subinterpreters)]`. This would basically be their way of saying "we don't store `Py<T>` in any static data" - we'd document all the conditions their module should satisfy.
In the long term we may be able to remove the need for extension module authors to audit their own code, once we've built up confidence of operation under subinterpreters.
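To illustrate the first sub-bullet above (synchronization that does not rely on the GIL), here is a minimal sketch of the general direction: plain Rust state guarded by a `std::sync::Mutex` instead of a GIL-based primitive like `GILProtected`. Note the guarded data deliberately contains no Python objects, which must still never cross interpreters.

```rust
use std::sync::Mutex;

use pyo3::prelude::*;

// Plain Rust state protected by a std Mutex; no GIL-based synchronization
// and no Python objects stored in static data.
static CALL_COUNT: Mutex<u64> = Mutex::new(0);

#[pyfunction]
fn bump() -> u64 {
    let mut count = CALL_COUNT.lock().unwrap();
    *count += 1;
    *count
}
```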
In short, anything that transitively depends on PyO3 will break once sub-interpreters enter the stage, unfortunately.
I disagree slightly with the sentiment of "will break". Many extension modules implemented in C and C++ most likely also do not work correctly with subinterpreters. I read a comment from CPython devs somewhere which suggested they are aware that even if CPython 3.12 or 3.13 ships with complete subinterpreter support the ecosystem is going to need many years to transition.
Regardless I support that we should do what we can to not block users who are pushing to run subinterpreters in their systems. All help with implementation is gladly welcome. I would also be open to considering an environment variable PYO3_UNSAFE_ALLOW_SUBINTERPRETERS=1 which gives end users the opportunity to disable the subinterpreter safety check... at their own responsibility of crashes. Such an opt-out may strike an acceptable balance between Rust's penchant for correctness and Python's mentality that we are all responsible users.
@GoldsteinE that's an interesting idea. Care to explain a little more about the original thesis behind making the lifetime invariant?
(We might also want to split this topic into several sub-issues / discussions with back references to here...)
@davidhewitt The idea is taken from the GhostCell paper. Basically, the signature of Python::with_gil() has F: for<'py> FnOnce(Python<'py>) -> R in it. If the 'py lifetime is invariant, then the following code
interpreter1.with_gil(|py1| {
    interpreter2.with_gil(|py2| {
        let obj1 = py1.get_some_python_ref(); // has lifetime 'py1
        let obj2 = py2.get_some_python_ref(); // has lifetime 'py2
        obj1.some_method(obj2); // error: lifetimes do not match
    })
})
wouldn't compile, preventing us from mixing objects from different interpreters (a Py<_> pointer would need a runtime tag, since it doesn't have a lifetime).
My dyngo crate is a practical example of this technique.
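For anyone unfamiliar with the trick, here is a minimal standalone sketch of branding via an invariant lifetime (toy types only, not PyO3's):

```rust
use std::cell::Cell;
use std::marker::PhantomData;

// `PhantomData<Cell<&'py ()>>` makes `'py` invariant, so tokens produced by
// two different `with_token` calls can never be unified by the compiler.
struct Token<'py> {
    _brand: PhantomData<Cell<&'py ()>>,
}

fn with_token<R>(f: impl for<'py> FnOnce(Token<'py>) -> R) -> R {
    f(Token { _brand: PhantomData })
}
```

Nesting two `with_token` calls then yields two distinct, non-interchangeable brands, which is what makes the `obj1.some_method(obj2)` call above fail to compile.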
Interesting. I can see how that would guarantee provenance statically, but I think it might cause issues with APIs like #[pyfunction] where the exact same code region might be called from multiple different interpreters. My instinct was that we would have to store the interpreter ID inside each Py<T> and only allow attaching to the same interpreter.
Having the Python lifetime be invariant may be a good idea to consider as part of #3382.
Yes, Py<T> would need to have a runtime tag. I think #[pyfunction] is probably okay, since it would be generic over 'py, which is invariant?
Would an invariant lifetime also preclude valid code like
interpreter1.with_gil(|py1| {
    interpreter1.with_gil(|py2| {
        let obj1 = py1.get_some_python_ref(); // has lifetime 'py1
        let obj2 = py2.get_some_python_ref(); // has lifetime 'py2
        obj1.some_method(obj2); // still an error: same interpreters, but the lifetimes are generative and hence unique per closure invocation
    })
})
?
Personally, I think we will need to store the interpreter ID within all references into the Python heap. I also think this will mesh well with our aim to drop bare references (and the pool required to make them work) for other reasons.
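A rough sketch of what such a runtime tag could look like (hypothetical types, not PyO3 API): the detached handle remembers which interpreter created it, and "attaching" it becomes a runtime check rather than a compile-time one.

```rust
// Hypothetical illustration only.
struct InterpreterId(u64);

struct Detached<T> {
    interp_id: u64,
    value: T, // stand-in for the owned Python object
}

impl<T> Detached<T> {
    // Attaching succeeds only on the interpreter that created the handle.
    fn attach<'a>(&'a self, current: &InterpreterId) -> Option<&'a T> {
        (self.interp_id == current.0).then_some(&self.value)
    }
}
```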
Regardless I support that we should do what we can to not block users who are pushing to run subinterpreters in their systems. All help with implementation is gladly welcome. I would also be open to considering an environment variable PYO3_UNSAFE_ALLOW_SUBINTERPRETERS=1 which gives end users the opportunity to disable the subinterpreter safety check... at their own responsibility of crashes. Such an opt-out may strike an acceptable balance between Rust's penchant for correctness and Python's mentality that we are all responsible users.
For the same reasons as in the discussion of nogil support and whether PanicException should derive BaseException or not, I am somewhat sceptical about adding user-controlled I-dont-care-just-make-it-go-fast-and-you-can-bet-I-am-also-enabling-ffast-math-everywhere kind of flags. I would prefer that this requires opt-in from the extension authors. Really ~~stubborn~~ adventurous users can always modify the code to perform that opt-in themselves.
@adamreichold Yes, I assumed that ::with_gil() is not reentrant, but it apparently is. I'm not sure what the use case for it is, though: if you already have obj1, you could just write
let py = obj1.py();
let obj2 = py.get_some_python_ref();
obj1.method(obj2);
Is there a case where this workaround is too unwieldy to use?
Is there a case where this workaround is too unwieldy to use?
Yes, with_gil is often used when the GIL token cannot be threaded through the call chain, e.g. when implementing standard traits like fmt::Display on user types whose implementation still needs access to GIL-protected data.
We actually do the work to detect the case when the GIL is already held without calling into the CPython API to make this rather common case as fast as reasonably possible.
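For concreteness, a minimal sketch of that fmt::Display pattern with the current API (the `Wrapper` type is invented for this example):

```rust
use std::fmt;

use pyo3::prelude::*;

struct Wrapper {
    obj: Py<PyAny>,
}

impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The trait signature gives us nowhere to pass a `Python<'_>` token,
        // so the implementation re-enters via `with_gil`.
        Python::with_gil(|py| {
            let repr = self.obj.as_ref(py).repr().map_err(|_| fmt::Error)?;
            f.write_str(repr.to_str().map_err(|_| fmt::Error)?)
        })
    }
}
```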
.with_gil() could still be reentrant, even if objects from different invocations don’t mix. fmt::Display doesn’t accept anything GIL-bound, so it should still work fine, I think
In general, I feel like there’re two cases:
- Either you already have some GIL-bound object, in which case you could just get its `Python`
- Or you don't, so you're fine with creating a new scope, since you don't need to mix your GIL-bound objects with any others
fmt::Display doesn’t accept anything GIL-bound, so it should still work fine, I think
I don't understand this part: why can't these traits be implemented for GIL-bound types, whether they are bare references or smart pointers?
In general, I feel like there’re two cases:
To me this feels like Go's Context parameter: it is simple (for some definition of simple) and works if you control all the code. But I fear that the ergonomics of forcing people to thread GIL tokens through everywhere they want to mix objects are really bad, especially across an ecosystem of PyO3-based libraries.
There is also the additional problem that interpreter identity might not be known until runtime, and I might want to have different behaviour (for example optimizations avoiding communication) if they do match. Of course, I could have two different approaches using either two scopes or just a single one, but it forces the author to handle this at a high level of code organization instead of tucking it away as an implementation detail.
I don't understand this part: why can't these traits be implemented for GIL-bound types, whether they are bare references or smart pointers?
I use the word "GIL-bound" as "has a 'py lifetime" here. If you implement fmt::Display for a GIL-bound type, you are already holding the GIL, so you can get the interpreter from the type itself (like PyAny::py() does). You don't need to thread the tokens through if you can get one from an existing object.
I agree that this approach may harm convenience for some use cases. The only alternative I see is to always perform this check at runtime, which is more convenient to write, but doesn't catch some errors that could be detected at compile time. Maybe there's some hybrid approach?
I use the word "GIL-bound" as "has a 'py lifetime" here. If you implement fmt::Display for a GIL-bound type, you are already holding the GIL, so you can get the interpreter from the type itself (like PyAny::py() does). You don't need to thread the tokens through if you can get one from an existing object.
Ok, so they do work on GIL-bound types; your point is rather that access to GIL-bound types implies access to a GIL token.
Let's try to construct a more involved example. You want to implement PartialOrd<&'py Foo> for Py<Foo>. To do that, you need to turn the Py<Foo> into a &'py Foo with the same lifetime, but you cannot verify the interpreter ID of the right-hand side &'py Foo because that information is erased at runtime. So while you do have access to a GIL token, you would actually need access to the interpreter which produced the reference in order to produce another one with guaranteed compatible provenance. (Or alternatively, a non-zero-sized GIL token containing the interpreter ID.)
When bare references are replaced by GIL-bound smart pointers like Py<'py, Foo> storing the interpreter ID, they themselves would be sufficient to turn a PyDetached<Foo> into a Py<'py, Foo> or something like that.
Maybe there’s some hybrid approach?
I think the most straight-forward approach would be to do everything at runtime by default and then provide additional APIs which lift some of those checks into compile time as an optimization. (Similarly to how the GhostCell types are a relatively inconvenient compile-time optimization of std's plain Cell.) (So, we'd need three smart pointers, e.g. PyDetached<T>, Py<'py, T> and PyBranded<'py, 'interp, T>.)
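Purely to visualize that three-pointer split, a skeleton with placeholder definitions (none of these are existing PyO3 types; the fields are illustrative only):

```rust
use std::cell::Cell;
use std::marker::PhantomData;

// Detached handle: owns the object, carries a runtime interpreter tag.
struct PyDetached<T> {
    interp_id: u64,
    _marker: PhantomData<T>,
}

// GIL-bound pointer: tied to a GIL acquisition via the `'py` lifetime.
struct Py<'py, T> {
    _gil: PhantomData<&'py ()>,
    _marker: PhantomData<T>,
}

// Branded pointer: additionally ties the interpreter to an invariant
// lifetime so mixing interpreters is a compile-time error.
struct PyBranded<'py, 'interp, T> {
    _gil: PhantomData<&'py ()>,
    _brand: PhantomData<Cell<&'interp ()>>,
    _marker: PhantomData<T>,
}
```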