Including the NCCL multi-GPU library in PyCUDA
Hello there,
I want to wrap NVIDIA's NCCL library of multi-GPU communication collectives for Python, in a way that is compatible with the PyCUDA framework.
I thought it would be appropriate to include this in PyCUDA itself, because the library deals with GPU-to-GPU communication rather than CUDA implementations of algorithms. What are your thoughts on this?
If you are interested, is there already an effort under way? What should I be aware of before I start implementing this?
Also, I could not find any developer documentation on extending PyCUDA's functionality. Where should I start?
Could you please send a message to the mailing list about this? It's likely that you'll get better feedback there.
One possible issue is that NCCL uses the CUDA runtime API, whereas PyCUDA is built on the driver API, and mixing the two is always a bit awkward.
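For a first experiment before touching PyCUDA itself, a rough ctypes sketch like the one below could check whether NCCL can be loaded from Python at all. This is only an assumption-laden starting point, not how PyCUDA would integrate it: the library file names tried are guesses for a typical Linux install, and the helper names load_nccl and nccl_version are made up for illustration. The only NCCL call used is ncclGetVersion, which needs no GPU context.

```python
import ctypes
import ctypes.util


def load_nccl():
    """Try to load the NCCL shared library; return None if it is not found.

    The candidate names are assumptions for a typical Linux install.
    """
    found = ctypes.util.find_library("nccl")
    for candidate in (found, "libnccl.so.2", "libnccl.so"):
        if not candidate:
            continue
        try:
            return ctypes.CDLL(candidate)
        except OSError:
            continue
    return None


def nccl_version(lib):
    """Return NCCL's integer version code, or None if it cannot be queried."""
    if lib is None:
        return None
    try:
        fn = lib.ncclGetVersion
    except AttributeError:
        # Older NCCL builds may not export this symbol.
        return None
    fn.argtypes = [ctypes.POINTER(ctypes.c_int)]
    fn.restype = ctypes.c_int  # ncclResult_t; 0 means ncclSuccess
    version = ctypes.c_int(0)
    if fn(ctypes.byref(version)) != 0:
        return None
    return version.value


if __name__ == "__main__":
    # Prints an integer version code if NCCL is installed, otherwise None.
    print(nccl_version(load_nccl()))
```

The actual collectives (communicator setup, all-reduce and so on) take device pointers and streams, so wrapping them would still have to deal with the runtime versus driver API question mentioned above.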