Memory problem
In distributed training, the first GPU uses twice as much memory as each of the others. Before SWA is applied, memory usage is the same on all GPUs.
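To put numbers on the imbalance, something like the sketch below could be run before and after SWA is enabled. It is only a rough diagnostic under some assumptions: the helper names `ratio_to_smallest` and `read_allocated_bytes` are made up for illustration, the sample readings are not measurements, and `torch.cuda.memory_allocated` only counts tensors allocated by the calling process, so the totals shown by `nvidia-smi` may differ.

```python
def ratio_to_smallest(per_gpu_bytes):
    """Express each GPU's memory use as a multiple of the least-loaded GPU."""
    smallest = min(per_gpu_bytes)
    if smallest <= 0:
        raise ValueError("need a positive reading for every GPU")
    return [b / smallest for b in per_gpu_bytes]


def read_allocated_bytes():
    """Read this process's allocated bytes on every visible GPU.

    Hypothetical helper; requires PyTorch with CUDA available.
    """
    import torch  # imported lazily so ratio_to_smallest works without PyTorch

    return [
        torch.cuda.memory_allocated(i)
        for i in range(torch.cuda.device_count())
    ]


# Illustrative readings only (not measured): GPU 0 at twice the others.
print(ratio_to_smallest([8_000, 4_000, 4_000, 4_000]))
```

On a real run, `read_allocated_bytes()` would replace the hard-coded list, and comparing the ratios with and without SWA should show whether the extra memory on the first GPU appears only once SWA is active.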