gh-125789: fix side-effects in `asyncio` callback scheduling methods
This is a proposal for fixing the side-effects that could arise from Py_EQ. Similar to the patch where I fixed OrderedDict.__eq__, I did not modify the pure Python implementation but I can do it (I don't think we want to align both implementations; we just don't want the interpreter to crash and I don't think we can make it crash using the pure Python implementation only).
- Issue: gh-125789
You can still trigger a use after free with these changes since the tuple item never gets incref'd which allows you to abuse __eq__ and NotImplemented within evil classes.
import asyncio
fut = asyncio.Future()
class setup:
def __eq__(self, other):
print("in setup __eq__")
return False
def __del__(self):
print("deleting self")
cb_pad = lambda: ...
fut.add_done_callback(cb_pad) # sets fut->fut_callback0
fut.add_done_callback(setup()) # sets fut->fut_callbacks[0]
# removes callback from fut->fut_callback0 setting it to NULL
fut.remove_done_callback(cb_pad)
# evil MUST be a subclass of setup so that the evil __eq__ gets called first
class evil(setup):
def __eq__(self, value):
fut._callbacks.clear()
print("in evil __eq__")
return NotImplemented
fut.remove_done_callback(evil())
Oups, I noticed that there was already a UAF in the code but I forgot to fix it. Thanks!
Are you missing a file? This seems to only change tests.
Err... the _asynciomodule.c file was modified on my side. Maybe you are looking at a single commit? (or you marked the file as being reviewed?)
One more UAF needs to be patched out, it's possible to corrupt fut_callback0 then do the NotImplemented trick to use callback0 after it has been freed. Just needs an incref before usage.
https://github.com/python/cpython/blob/aaed91cabcedc16c089c4b1c9abb1114659a83d3/Modules/_asynciomodule.c#L1019-L1021
POC
import asyncio
fut = asyncio.Future()
class a:
def __eq__(self, other):
print("in a __eq__", self)
return True
def __del__(self):
print("deleting", self)
class b(a):
def __eq__(self, other):
print("in b __eq__")
fut.remove_done_callback(None)
return NotImplemented
fut.add_done_callback(a())
fut.remove_done_callback(b())
Thanks Nico for your insight and others for the review. I appreciate you taking the time to hunt those UAF so if you want to have a look at other usages of Py_EQ, I'd be happy to review/write the corresponding PRs. I'll address the other comments in a few hours!
:robot: New build scheduled with the buildbot fleet by @picnixz for commit f7b67309e88614d34e8cdafe591bfb6bddc8c661 :robot:
If you want to schedule another build, you need to add the :hammer: test-with-refleak-buildbots label again.
Would it be easier to always return a copy from FutureObj_get_callbacks?
What happens if fut._callbacks is mutated from another thread on 3.13t?
What happens if fut._callbacks is mutated from another thread on 3.13t?
Err... I don't know. I haven't thought about this. I think it should be addressed in a follow-up PR though.
Would it be easier to always return a copy from FutureObj_get_callbacks?
It might be easier but I don't know how it could affect performances. Asyncio experts need to confirm whether this could be a solution or not. For instance, we could return a tuple instead of a list, and this would at least avoid people using .clear() or changing an item, though I'm not sure if we can still do other bad tricks.
Btw, there are some inconsistencies between the Python and the C implementation for that. When we don't have any callback, the C implementation returns None but the Python implementation returns an empty list. Should it be changed?
It might be easier but I don't know how it could affect performances
It would be very rare to read fut._callbacks and it only seems to make sense in tests so performance of that branch is a non-issue. Adding extra checks to add_done_callback and schedule_callbacks is performance sensitive.
I think also that it would be more consistent to always return a copy. Perhaps PyFuture could also return a copy using a @property
I also suspect a copy is the only way to fix this on 3.13t without adding extra locks
It would be very rare to read fut._callbacks and it only seems to make sense in tests so performance of that branch is a non-issue. Adding extra checks to add_done_callback and schedule_callbacks is performance sensitive.
Actually, fut._callbacks is already a copy of the internal list if the callback0 is set. The path that we attack is essentially:
if (fut->fut_callback0 == NULL) {
if (fut->fut_callbacks == NULL) {
Py_RETURN_NONE;
}
return Py_NewRef(fut->fut_callbacks); // here
}
So we could also do something like
PyObject *new_list = PyList_New(PyList_GET_SIZE(fut->fut_callbacks));
if (new_list == NULL) {
return NULL;
}
for (i = 0; i < PyList_GET_SIZE(fut->fut_callbacks); i++) {
PyObject *cb = PyList_GET_ITEM(fut->fut_callbacks, i);
Py_INCREF(cb);
PyList_SET_ITEM(new_list, i, cb);
}
return new_list;
instead and the problem is likely to be gone?
We likely still have the UAF issue on fut->fut_callback0 but I think we wouldn't need to check at each loop iteration for the consistency of fut->fut_callbacks.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.
Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.
@kumaraditya303 While I understand that you're an asyncio maintainer, I would have appreciated that you first asked whether you wanted me to make the modifications here instead of plainly requesting changes and opening another PR. Nevertheless, I'm not sure that returning a copy entirely fixes the second UAF, in which case we can split the work into two (I also don't know which kind of "changes" you'd like to see here).
While I understand that you're an asyncio maintainer, I would have appreciated that you first asked whether you wanted me to make the modifications here instead of plainly requesting changes and opening another PR. Nevertheless, I'm not sure that returning a copy entirely fixes the second UAF, in which case we can split the work into two (I also don't know which kind of "changes" you'd like to see here).
Please try to fix one issue in one PR. As far as I see, this PR has several changes, it change some pure python code, it tries to fix use-after-free issue related to malicious call_soon which is unrelated to issue, it would be better to fix that separately. Also this PR takes a different route to fix the issue by adding several type checks, which bulks out the code so I proposed an alternative to this in a separate PR.
Also since this bug exists in older versions, we should aim for the least invasive change and keep smaller PRs to be easily able to backport them.
I would appreciate if you create an issue of UAF issues related to scheduling and create an PR specifically fixing that, which I can easily backport. Thanks
Thanks. Yet, in the future, I would then appreciate that you mention those points first. For the UAF, I'll just wait that your change has been merged until creating a new one and extract the tests and PoCs.
Thanks. Yet, in the future, I would then appreciate that you mention those points first. For the UAF, I'll just wait that your change has been merged until creating a new one and extract the tests and PoCs.
Okay, I'll take care of that next time, in the mean time you can review my PR
(I'll be back on my dev env tomorrow but I'll review this evening/tomorrow)
Closing this one in favor of:
- https://github.com/python/cpython/pull/125922
- https://github.com/python/cpython/pull/125967
- https://github.com/python/cpython/pull/125970