About NVIDIA's MIG question

Open GuoYingLong opened this issue 4 years ago • 2 comments

Does OpenPAI support NVIDIA's A100 and A30 GPUs and their MIG feature? After my A100 is split using MIG, jobs are still scheduled to the entire A100.

OpenPAI version: v1.7.0

GuoYingLong, Jul 21 '21 07:07

Hi, currently in OpenPAI, the GPU scheduling result is passed as GPU IDs in the NVIDIA_VISIBLE_DEVICES environment variable, e.g., NVIDIA_VISIBLE_DEVICES=0,1, and the GPUs are mounted by the NVIDIA container runtime. However, to leverage MIG, the variable needs each instance's UUID instead, e.g., NVIDIA_VISIBLE_DEVICES=MIG-GPU-e86cb44c-6756-fd30-cd4a-1e6da3caf9b0/1/0. Supporting this would require a conversion or mapping from GPU IDs to MIG instance identifiers, so for now you cannot use MIG to isolate GPU instances in OpenPAI. We may add this feature in the future.
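
To illustrate the kind of conversion mentioned above, here is a minimal sketch in Python. It is not OpenPAI code: the function name to_visible_devices and the mig_table argument are hypothetical, and the sketch assumes some other component has already built a table from scheduler-assigned device IDs to MIG instance identifiers on the node. IDs without an entry in the table are passed through unchanged, which matches the current index-based behavior.

```python
from typing import Dict, Iterable


def to_visible_devices(gpu_ids: Iterable[int], mig_table: Dict[int, str]) -> str:
    """Build a value for NVIDIA_VISIBLE_DEVICES from scheduled device IDs.

    gpu_ids   -- device IDs chosen by the scheduler (e.g. [0, 1])
    mig_table -- hypothetical mapping from a device ID to the identifier of
                 the MIG instance it stands for; how this table is obtained
                 is outside the scope of this sketch
    """
    devices = []
    for gpu_id in gpu_ids:
        # Use the MIG identifier when one is known, otherwise keep the index.
        devices.append(mig_table.get(gpu_id, str(gpu_id)))
    return ",".join(devices)


if __name__ == "__main__":
    # Illustrative table only; the identifier is the one quoted in this thread.
    table = {0: "MIG-GPU-e86cb44c-6756-fd30-cd4a-1e6da3caf9b0/1/0"}
    print("NVIDIA_VISIBLE_DEVICES=" + to_visible_devices([0], table))
```

With an empty table the function returns the plain index list (0,1 for IDs 0 and 1), so a node without MIG enabled would behave as it does today.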

abuccts, Jul 21 '21 09:07

@abuccts I would like this feature too, thanks.

siaimes, Oct 12 '21 08:10