devlib Replace nest_asyncio with greenlets

Considering https://github.com/ARM-software/devlib/issues/682, this PR updates devlib to use greenlets in cases where nest_asyncio cannot be used (e.g. using uvloop event loop).

This alternative path works by using greenlet to provide a way for nested coroutines separated from the top-level coroutine by blocking calls to yield their action.

TODO:

[x] Handle exceptions raised in __await__, so that they are propagated to the correct coroutine rather than the top-level one
[x] Write unit test to check everything works on various event loops (e.g. uvloop)
[x] Add some rational in asyn.py to explain the moving pieces

Fixes #682

Apr 26 '24 16:04 douglas-raillard-arm

PR updated with:

implementation directly based on greenlet rather than greenback. This avoids some ctypes-based hacks that are not necessary. That implementation is compatible with any asyncio event loop implementation.
Still use nest_asyncio when possible, as the greenlet version might execute the coroutine on a separate thread in some circumstances, whereas nest_asyncio never needs to. This important as the jupyter lab use case unfortunately falls in the "needing the thread" category, since the all code (including the imports) executes inside a pre-existing coroutine.
A commit that allows recycling connection instances when a thread is done with them, rather than just destroying the connection. This means that a client frequently spawning threads (e.g. creating short lived thread pools) will not reconnect to the target all the time. The coroutine execution in a separate thread can also share the connection. This is implemented by returning the connection to the pool after every Target method that directly uses self.conn (see call_conn decorator). This has a user-visible effect that target.conn might give a different object across calls to Target methods.

May 09 '24 15:05 douglas-raillard-arm

PR updated with exception handling in coroutine and exception injection with .throw(). The state of this PR will stay as [Draft] until I add some unit tests to ensure correct operations (and fix all the items in the TODO list in this PR cover letter) as discussed with Vincent Coubard.

May 13 '24 10:05 douglas-raillard-arm

PR updated with:

unit tests added for devlib.utils.asyn.run(), simulating various event loop setups including non-stdlib event loops (uvloop). A fairly comprehensive set of nested run() calls is checked, as well as exception and value injection (coro.send() and coro.throw())
nest_asyncio properly removed, as it relies on asyncio.get_event_loop() will not create an event loop automatically if one is not setup when called from a non-main thread. asyncio.get_event_loop() is deprecated and cannot be relied upon to create an event loop, so nest_asyncio usage is basically broken. Since we want devlib to be importable in the non-main thread (no reason for it to explode), this is not acceptable and we can just fully switch to our new implementation

May 14 '24 17:05 douglas-raillard-arm

PR updated with extra comments. I now consider it ready so I'll remove the [Draft] status

May 21 '24 09:05 douglas-raillard-arm

Found an issue, please do not merge as-is.

EDIT: the issue is the following: devlib.utils.asyn.asynccontextmanager() provides a wrapper over contextlib.asynccontextmanager() to allow treating the async context manager as a regular blocking context manager. This is achieved by implementing the following:


    def __enter__(self, *args, **kwargs):
        return run(self.cm.__aenter__(*args, **kwargs))

    def __exit__(self, *args, **kwargs):
        return run(self.cm.__aexit__(*args, **kwargs))

This is fine as long as run() simply re-enters an existing event loop. If these run() calls are top-level calls, they will each create a new event loop and try to iterate over the async generator. This is a problem in two ways:

An event loop closes all async generators upon exit. This means the 2nd run() call from __exit__() is ends up seeing a closed async gen, confusing the stdlib implem that then raises RuntimeError("generator didn't stop after athrow()")
We end up "migrating" the async gen from one event loop to another, which is the same as migrating a coroutine across event loops.

Issue 1. can be worked around by hijacking the mechanism the event loop uses to be aware of new async gen.

However, issue 2. is more tricky and is probably a real issue if e.g. the async generator tries to take an asyncio.Lock() across yield calls. The lock future would be handled by the first event loop, then cancelled, and the generator would probably fail to run the 2nd iteration on another event loop.

I'll experiment to see what possibilities exist to fix this problem. This is fortunately the only place that relies on migrating a coroutine between multiple event loops. Trio encountered a similar problem: https://github.com/python-trio/trio/issues/2081

May 22 '24 15:05 douglas-raillard-arm

PR updated with:

extra unit tests that reproduced the issue
_AsyncPolymorphicCM now spins an event loop if necessary in __enter__ and closes it in __exit__, so that both the __aenter__() and __aexit__() coroutines are executed on the same event loop.

May 22 '24 20:05 douglas-raillard-arm

@marcbonnici, Vincent Coubard, Branislav Rankov, this should be ready for the last review/testing round. I consider it to be ready.

May 31 '24 14:05 douglas-raillard-arm

PR rebased. As of today we started dogfooding that PR in LISA's vendored devlib tree, so it should get some more real-world exposure in the coming weeks.

Jun 18 '24 10:06 douglas-raillard-arm

PR updated with extra tests to check the blocking API works when invoked from asyncio.to_thread(). Everything does work.

Jun 18 '24 13:06 douglas-raillard-arm

PR updated with a fix to use run() on the return values of anext() for async generators, with the matching unit test.

Jun 21 '24 12:06 douglas-raillard-arm

PR rebased

Jun 27 '24 09:06 douglas-raillard-arm

Update with a check to only call loop.shutdown_default_executor() if it exists, since it was added in Python 3.9

https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.shutdown_default_executor

Jul 16 '24 13:07 douglas-raillard-arm

An issue was found, I'm currently working on fixing it, please do not merge until it's fixed.

Jul 30 '24 15:07 douglas-raillard-arm

PR updated with:

Some refactoring of private functions involved in run() implementation.
Ensure that we really do use the same loop for both __aenter__ and __aexit__ for async context managers.
Ensure that each asyncio.Task gets its own top-level instrumented coroutine. This is critical as a Task would not be able to yield on behalf of a coroutine in another task. This would deadlock, since the event loop would be polling task B while a nested coroutine inside task B would be trying to yield through task A. Task A would be ignored by the event loop until task B yields, leading to a deadlock.

Jul 31 '24 16:07 douglas-raillard-arm

One thing I realized and might want to change is that if an event loop is already running (e.g. in a jupyterlab notebook), we will dispatch the coroutine on a loop setup in a separate thread. It's all good except we have a single such thread.

This means that code making parallel devlib invocation of devlib in threads with a pre-setup event loop will end up being serialized. It shouldn't be very hard to change, I'll see if I can do that tomorrow

Jul 31 '24 16:07 douglas-raillard-arm

PR updated with:

significant simplification in the way the different cases are handled. There is now a _CoroRunner subclass for each case.
simplification of how async context managers are handled, thanks to _CoroRunner. All we now need is to get a runner that will be usable to execute multiple coroutines on the same event loop, even if that event loop is sitting in a separate thread.
Added contextvars test and fixed the code. contextvars are now propagated down through devlib.utils.asyn.run() and any update done inside the coroutine will be propagated back in the caller of run(), so it is completely transparent.
The threaded coroutine runner used as last resort when an event loop is already running in the current thread can now use an arbitrary number of threads. This prevents serialization of otherwise parallel code and ensures no deadlock can occur.

Considered it is a substantial change, I'd be more comfortable dogfooding it in LISA for a little while before we consider merging it.

Aug 01 '24 17:08 douglas-raillard-arm

1.5 month later there has been no reported issue, so I think it's good to go

Sep 12 '24 15:09 douglas-raillard-arm