Choose which GPU to use on a multi-GPU system

Open tvogels opened this issue 7 months ago • 5 comments

When you import gpu4pyscf, it creates CUDA streams on all visible CUDA devices.

https://github.com/pyscf/gpu4pyscf/blob/f805329f11dcda25c27e1eedbb1bbb1890e45327/gpu4pyscf/config.py#L20-L23

There are cases where this is undesirable, for example when multiple copies of a program are launched with MPI on one node and each copy is meant to access only one device.

How can we make this configurable? Using CUDA_VISIBLE_DEVICES is not an option for me, because the same process that uses gpu4pyscf also uses other GPUs for other purposes.

tvogels, Jun 23 '25 09:06

@tvogels Thank you for raising the issue. The multi-GPU feature was designed to use all available GPUs. A configurable device list, or an option to turn off multi-GPU, would indeed give users more flexibility in controlling which GPUs are used. I will label this as a feature request for now.

Alternatively, you can launch a subprocess in Python and use CUDA_VISIBLE_DEVICES to control which devices the gpu4pyscf task sees. This probably does not help if there are other limitations.
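A rough sketch of the subprocess workaround, using only the standard library. The helper name run_on_device is hypothetical, and the child here just echoes the variable it received; in practice the child code would be the script that imports gpu4pyscf. The parent's own environment is left untouched, so it can keep using its other GPUs.

```python
import os
import subprocess
import sys


def run_on_device(device_id, code):
    """Run `code` in a child Python process that can only see one GPU.

    `run_on_device` is a hypothetical helper, not part of gpu4pyscf.
    """
    # Copy the parent's environment so the parent itself is not affected.
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(device_id)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Stand-in for the real workload: the child reports which devices it may see.
child_code = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"
visible = run_on_device(1, child_code)
```

The cost of this approach is that results have to be passed back from the child, for example through stdout or files, rather than shared in memory.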

wxj6000, Jun 23 '25 19:06

Hi @wxj6000, thanks! Yeah that makes sense.

I’d be happy to contribute this feature, but I don’t see an easy way to do this while maintaining compatibility with the current behavior. Let me know if you have a good idea.

tvogels, Jun 24 '25 07:06

Here is the simplest solution I can come up with. We can introduce active_device_ids, a list of the IDs of the active devices.

Then, replace all the

for device_id in range(num_devices):

with

for device_id in active_device_ids:

The default would be active_device_ids = list(range(num_devices)). If needed, people can modify the list in place.
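A minimal sketch of this proposal, not gpu4pyscf's actual code. The device count is hard-coded as an assumption so the example runs without a GPU, and create_streams is a hypothetical stand-in for whatever per-device setup the library performs. The default is built with list(...) because a bare range object cannot be modified in place.

```python
# Assumption: stand-in for the device count a library would query from CUDA.
num_devices = 4

# Default: every device is active. A list (not a range) so it can be mutated.
active_device_ids = list(range(num_devices))


def create_streams():
    """Hypothetical per-device setup; returns one placeholder per active device."""
    streams = {}
    for device_id in active_device_ids:  # was: for device_id in range(num_devices)
        streams[device_id] = f"stream-on-device-{device_id}"
    return streams


# With the default list, all devices are touched.
all_streams = create_streams()

# A user restricts the library to device 2 by mutating the list in place.
active_device_ids[:] = [2]
restricted = create_streams()
```

Any loop that runs after the list is changed only visits the selected devices; loops that ran earlier are not affected.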

wxj6000, Jun 25 '25 01:06

Hmm, but this code runs at import time. How would you set the list of active devices so early?

tvogels, Jun 25 '25 04:06

> Hmm, but this code runs at import time. How would you set the list of active devices so early?

Yes, the list of active devices is created in memory at import time. Its values can be updated later through a reference to active_device_ids.
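The sketch below illustrates, on a toy module rather than on gpu4pyscf itself, what "updating through a reference" means in Python. The module toy_config and its attributes are hypothetical. It shows two things: rebinding a local name does not change what the library sees, while in-place mutation does; and anything computed from the default list at import time keeps its original value.

```python
import types

# Hypothetical stand-in for a library config module (not gpu4pyscf's real one).
config = types.ModuleType("toy_config")
config.active_device_ids = [0, 1, 2, 3]
# Mimics work done at import time with the default list.
config.initialized_at_import = list(config.active_device_ids)


def devices_used():
    """What library code would iterate over when called later."""
    return list(config.active_device_ids)


# Rebinding: the local name now points at a new list; the module is unchanged.
active_device_ids = config.active_device_ids
active_device_ids = [1]
after_rebind = devices_used()

# In-place mutation: the module's own list object is modified.
config.active_device_ids[:] = [1]
after_in_place = devices_used()
```

In this toy model the import-time snapshot still lists all four devices after the update, which is the concern raised in the question: a later change to the list only affects code that reads it afterwards.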

wxj6000, Jun 25 '25 04:06