InstanceLoc
Training is slow
When running the pre-training task on 4 V100 GPUs, I found that this line of code in shuffle BN takes a lot of time:
idx_shuffle = torch.randperm(batch_size_all).cuda()
In addition, the RPN head is also slow.
Do you know what's going on? I look forward to your reply.