Make MONAI (Random) Transforms work in a multi-threaded environment
With PEP 703 accepted and its implementation underway (https://github.com/python/cpython/issues/108219), I think it would be great to prepare the MONAI transforms early for execution on multiple threads. Especially with the large tensor sizes and the advantages of running transforms on the GPU, I'm hoping free-threading will bring a big benefit in (3D) medical imaging AI.
I think the biggest issue here is the Randomizable transforms, which currently cannot safely be executed from multiple threads.
AFAIK, the free-threading effort is pushed by Meta, so I expect PyTorch to be compatible with it relatively early; one of the talks on no-GIL Python mentioned testing it with a PyTorch DL project internally at Meta.
This might also be a great opportunity to work on https://github.com/Project-MONAI/MONAI/issues/6854, which might eventually require breaking changes in the Randomizable API anyway.
What is the reason for storing the randomization information for a sample in an instance variable on the transform, compared to having the randomize() method return that information as an object?
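To make the question concrete, here is a minimal sketch of the two patterns. The class and attribute names are made up and simplified for illustration; this is not MONAI's actual code:

```python
import numpy as np


# Pattern A (roughly the current style): randomize() stores the sampled
# parameters on the transform instance. If two threads share one instance,
# a second randomize() can overwrite self._offset before the first thread
# has applied it.
class RandShiftStateful:
    def __init__(self, max_offset: float = 1.0, seed: int = 0):
        self.max_offset = max_offset
        self.R = np.random.RandomState(seed)
        self._offset = 0.0  # shared mutable state -> race condition

    def randomize(self) -> None:
        self._offset = self.R.uniform(-self.max_offset, self.max_offset)

    def __call__(self, img: np.ndarray) -> np.ndarray:
        self.randomize()
        return img + self._offset


# Pattern B (what the question suggests): randomize() returns the sampled
# parameters, so the per-sample information never lives on the shared
# instance.
class RandShiftFunctional:
    def __init__(self, max_offset: float = 1.0, seed: int = 0):
        self.max_offset = max_offset
        self.rng = np.random.default_rng(seed)

    def randomize(self) -> float:
        return float(self.rng.uniform(-self.max_offset, self.max_offset))

    def __call__(self, img: np.ndarray) -> np.ndarray:
        offset = self.randomize()  # stays local to this call
        return img + offset
```

Note that even in pattern B the Generator itself is not guaranteed to be safe under concurrent calls, so per-thread generators or a lock would still be needed; but at least the per-sample information is no longer shared.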
With PyTorch 2.6, free-threading support (Python 3.13t) is enabled on Linux x64, and NumPy has had preliminary support since version 2.1, so AFAIK the main dependencies of MONAI can now be used with free-threaded Python.
Is there already a roadmap/timeline for supporting free-threaded Python within MONAI? I understand there are a lot of other things you are working on; I'm just interested in the MONAI core team's thoughts on this.
I see two ways to support this. I would especially appreciate feedback on the first option, in case there are hurdles I have overlooked.
1. Minimal changes
Work around the current limitations of the Randomizable class and try to establish support with minimal changes:
- Create a special ThreadableCompose transform. This transform could either accept a factory function for its transforms list or use deepcopy to create one set of transforms for each thread it will be used on (see the sketch after this list)
- The random state of the per-thread transforms is seeded from a random generator owned by the ThreadableCompose
Pros:
- Requires no changes to individual transforms
- Can be tested without backward incompatible changes
Cons:
- Deterministically setting the random state will be a challenge, as the number of threads and their scheduling order are not known beforehand
- If transforms need to share information across threads (e.g. a lock for an external resource or a library not supporting free-threading), the implementation becomes trickier (but remains possible, especially when using a factory function)
Note: possibly this could also be implemented in a DataLoader, but that would tightly couple the dataloader to the compose/transform it uses.
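A minimal sketch of what such a ThreadableCompose could look like, using deepcopy and thread-local pipelines. The class itself is hypothetical; only set_random_state() is MONAI's existing Randomizable API:

```python
import copy
import threading

import numpy as np


class ThreadableCompose:
    """Hypothetical sketch: one deep copy of the transform pipeline per
    thread, each seeded from a child of a parent SeedSequence."""

    def __init__(self, transforms, seed: int = 0):
        self._template = list(transforms)
        self._seed_seq = np.random.SeedSequence(seed)
        self._local = threading.local()
        self._lock = threading.Lock()  # spawn() mutates the parent sequence

    def _thread_transforms(self):
        if not hasattr(self._local, "transforms"):
            with self._lock:
                child = self._seed_seq.spawn(1)[0]
            transforms = copy.deepcopy(self._template)
            # one independent seed per transform in this thread's copy
            for t, s in zip(transforms, child.generate_state(len(transforms))):
                if hasattr(t, "set_random_state"):  # MONAI Randomizable API
                    t.set_random_state(seed=int(s))
            self._local.transforms = transforms
        return self._local.transforms

    def __call__(self, data):
        for t in self._thread_transforms():
            data = t(data)
        return data
```

This also makes the first con concrete: which spawned child a thread receives depends on the order in which the threads first call the compose, so runs are only reproducible if that order is controlled.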
2. Refactor the Transforms
Refactor the Randomizable interface to enable the use of multiple threads without copying/cloning. This would involve finding a solution both for the per-sample randomization information (what self.randomize() currently stores) and for the random state R, so that multiple threads can call the transform concurrently (see the sketch after the pros/cons list below).
Pros:
- Could enable a cleaner, easier-to-understand architecture
- Enables migration away from the legacy numpy.random.RandomState
Cons:
- Not backward compatible
- A lot of changes across multiple classes/modules
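To make the refactoring direction more concrete, here is a rough sketch of what a stateless random transform could look like. All names are hypothetical and this is not a proposal for the final API; the point is that randomize() returns an immutable parameters object and the RNG is owned by the caller:

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ShiftParams:
    """Immutable per-sample randomization info, returned instead of stored."""
    offset: float


class RandShift:
    """Holds only configuration, no mutable per-sample state."""

    def __init__(self, max_offset: float = 1.0):
        self.max_offset = max_offset

    def randomize(self, rng: np.random.Generator) -> ShiftParams:
        # The caller supplies the Generator, so each thread/worker can own
        # its own RNG instead of sharing a transform-level RandomState R.
        return ShiftParams(offset=float(rng.uniform(-self.max_offset, self.max_offset)))

    def __call__(self, img: np.ndarray, params: ShiftParams) -> np.ndarray:
        return img + params.offset


# Usage: the surrounding Compose/worker owns both the RNG and the parameters.
rng = np.random.default_rng(42)
shift = RandShift(max_offset=2.0)
out = shift(np.zeros((4, 4)), shift.randomize(rng))
```

A side benefit is that the exact randomization applied to a sample becomes an explicit value that can be logged, reused, or replayed.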
Thank you for your input and your great work on MONAI!