Training imbalance on different GPUs?
I used 8 GPUs to train the model, but most of the memory usage lands on the first GPU and I cannot fully utilize the other GPUs. Is there any solution? Thanks!
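For context, this imbalance is what `torch.nn.DataParallel` typically produces: the per-GPU outputs are gathered back on `device_ids[0]`, so the first GPU holds the model parameters, the gathered outputs, and the loss graph, and uses far more memory than the rest. A minimal sketch of such a setup (the linear model and random batch are just stand-ins for the real ones), including how per-GPU memory can be checked:

```python
import torch
import torch.nn as nn

# Dummy model and batch; the point is the standard DataParallel pattern.
device = torch.device("cuda:0")
model = nn.DataParallel(nn.Linear(1024, 10).to(device),
                        device_ids=list(range(torch.cuda.device_count())))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(256, 1024, device=device)        # dummy input batch
labels = torch.randint(0, 10, (256,), device=device)  # dummy labels

optimizer.zero_grad()
outputs = model(images)            # per-GPU outputs are gathered back on cuda:0
loss = criterion(outputs, labels)  # loss and its graph live on cuda:0
loss.backward()
optimizer.step()

# Inspect memory usage per GPU
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i}: {torch.cuda.memory_allocated(i) / 1024**2:.1f} MiB")
```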
Hello, good question! I've also faced this problem before. You can try this one:

```python
# distribute the model across the first 7 GPUs
model = torch.nn.DataParallel(model, device_ids=[0, 1, 2, 3, 4, 5, 6])

images = images.to(device)
# send the output and the labels to the last GPU
labels = labels.to(device).cuda(7)
optimizer.zero_grad()
outputs = model(images).cuda(7)
# after computing the loss, send the loss back to GPU 0 for backpropagation
loss = loss_fn(input=outputs, target=labels).cuda(0)
```
I have tried this code, but the memory of GPU 7 still limits the batch size and the memory of the other GPUs still cannot be fully utilized, so there is little point in using multiple GPUs...
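Not from this thread, but a commonly suggested workaround for exactly this limitation is to compute the loss inside the module that `DataParallel` wraps, so each replica returns only a tiny per-GPU loss tensor and the large output tensor is never gathered on a single device. A minimal sketch, with a dummy model and batch and a hypothetical `ModelWithLoss` wrapper:

```python
import torch
import torch.nn as nn

class ModelWithLoss(nn.Module):
    """Wraps a model and its loss so each DataParallel replica computes the
    loss locally; only (1,)-shaped loss tensors are gathered on cuda:0."""
    def __init__(self, model, loss_fn):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn

    def forward(self, images, labels):
        outputs = self.model(images)
        # Return a small per-replica loss instead of the full output tensor.
        return self.loss_fn(outputs, labels).unsqueeze(0)

device = torch.device("cuda:0")
wrapped = nn.DataParallel(
    ModelWithLoss(nn.Linear(1024, 10), nn.CrossEntropyLoss()).to(device),
    device_ids=list(range(torch.cuda.device_count())))
optimizer = torch.optim.SGD(wrapped.parameters(), lr=0.01)

images = torch.randn(256, 1024, device=device)        # dummy input batch
labels = torch.randint(0, 10, (256,), device=device)  # dummy labels

optimizer.zero_grad()
loss = wrapped(images, labels).mean()   # average the per-GPU losses
loss.backward()
optimizer.step()
```

If the memory still does not balance well, `torch.nn.parallel.DistributedDataParallel` with one process per GPU is the usual recommendation, since each process keeps its own batch, loss, and optimizer state on its own GPU.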