ssd-pytorch: error during multi-GPU training

Traceback (most recent call last):
  File "train.py", line 107, in <module>
    out = net(images)
  File "/home/walker2/anaconda3/envs/pytorch1.2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/walker2/anaconda3/envs/pytorch1.2/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.gather(outputs, self.output_device)
  File "/home/walker2/anaconda3/envs/pytorch1.2/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/walker2/anaconda3/envs/pytorch1.2/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/walker2/anaconda3/envs/pytorch1.2/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/walker2/anaconda3/envs/pytorch1.2/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/walker2/anaconda3/envs/pytorch1.2/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
    assert all(map(lambda i: i.is_cuda, inputs))
AssertionError

If I add the following to train.py so that only one GPU is used, training runs normally:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
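A minimal reproduction sketch, assuming the cause is the one identified later in this thread (the model's forward returns a tensor created on the CPU). `ReturnsCpuTensor` and `try_data_parallel` are hypothetical names, not part of ssd-pytorch. With a single visible GPU, `nn.DataParallel` calls the wrapped module directly and never gathers the outputs, which would explain why `CUDA_VISIBLE_DEVICES="0"` works; with two or more GPUs the per-replica outputs go through `gather`, and its `Gather` step asserts that every tensor is on CUDA.

```python
import torch
import torch.nn as nn


class ReturnsCpuTensor(nn.Module):
    """Hypothetical toy model mimicking the reported bug: forward() returns
    a freshly created CPU tensor next to its GPU output."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        cpu_extra = torch.zeros(3)  # created on the CPU, like the SSD priors
        return self.fc(x), cpu_extra


def try_data_parallel(num_gpus):
    """Run one forward pass through nn.DataParallel on the first `num_gpus` GPUs.

    Returns "skipped" if that many GPUs are not available,
    otherwise "ok" or "AssertionError".
    """
    if num_gpus < 1 or torch.cuda.device_count() < num_gpus:
        return "skipped"
    net = nn.DataParallel(ReturnsCpuTensor().cuda(), device_ids=list(range(num_gpus)))
    try:
        net(torch.randn(8, 4).cuda())
    except AssertionError:
        return "AssertionError"
    return "ok"


if __name__ == "__main__":
    # One device: DataParallel calls the module directly, so nothing is gathered.
    print("1 GPU :", try_data_parallel(1))
    # Several devices: outputs are gathered, and the CPU tensor fails the CUDA check.
    print("2 GPUs:", try_data_parallel(2))
```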

Sep 07 '20 01:09 hlcool

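I can't tell why your multi-GPU training fails... but I've tried multi-GPU training myself and it worked fine... Could it be a configuration problem?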

Sep 12 '20 04:09 bubbliiiing

I ran into exactly the same problem. Has it been solved?

Sep 23 '20 08:09 sssss99999

Same here: training only runs on one GPU, and the other GPUs don't run in parallel.

Mar 10 '21 07:03 frandooo

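SSD's forward returns the priors, which are created on the CPU. They play no role in SSD's forward (they are only created and then returned), and that is what causes the problem. If you move the prior definition out of SSD and into MultiBoxLoss, multi-GPU training works:

import torch
import torch.nn as nn
from torch.autograd import Variable
# PriorBox is not part of PyTorch; import it from wherever the repository defines it.

class MultiBoxLoss(nn.Module):
    def __init__(self, num_classes, overlap_thresh, prior_for_matching, bkg_label,
                 neg_mining, neg_pos, neg_overlap, encode_target, gpu_num,
                 negatives_for_hard=100.0):
        super(MultiBoxLoss, self).__init__()
        self.gpu_num = gpu_num
        self.num_classes = num_classes
        self.threshold = overlap_thresh
        self.background_label = bkg_label
        self.encode_target = encode_target
        self.use_prior_for_matching = prior_for_matching
        self.do_neg_mining = neg_mining
        self.negpos_ratio = neg_pos
        self.neg_overlap = neg_overlap
        self.negatives_for_hard = negatives_for_hard
        self.variance = [0.1, 0.2]
        # Added: build the priors here instead of returning them from SSD's forward
        with torch.no_grad():
            self.priors = Variable(PriorBox().forward())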
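The same idea as a self-contained sketch, assuming a toy model in place of SSD: `TinySSDHead`, `PriorHoldingLoss` and `demo` are hypothetical names, and the loss term is a placeholder rather than SSD's matching-based MultiBox loss. The network's forward returns only tensors computed from its input, so `nn.DataParallel` can gather them, while the loss owns the priors and moves them to the device of the gathered predictions. Here the priors are registered as a buffer instead of being wrapped in `Variable` under `torch.no_grad()`; since PyTorch 0.4 plain tensors need no `Variable` wrapper.

```python
import torch
import torch.nn as nn


class TinySSDHead(nn.Module):
    """Hypothetical stand-in for SSD: forward returns only (loc, conf)."""

    def __init__(self, num_priors=8, num_classes=3, in_features=4):
        super().__init__()
        self.num_priors = num_priors
        self.num_classes = num_classes
        self.loc = nn.Linear(in_features, num_priors * 4)
        self.conf = nn.Linear(in_features, num_priors * num_classes)

    def forward(self, x):
        n = x.size(0)
        loc = self.loc(x).view(n, self.num_priors, 4)
        conf = self.conf(x).view(n, self.num_priors, self.num_classes)
        # Both outputs live on the replica's device, so DataParallel can gather them.
        return loc, conf


class PriorHoldingLoss(nn.Module):
    """Hypothetical loss that owns the priors (the loss term is a placeholder)."""

    def __init__(self, priors):
        super().__init__()
        # A buffer follows .to()/.cuda() and is not a trainable parameter.
        self.register_buffer("priors", priors)

    def forward(self, predictions):
        loc, conf = predictions
        # Move the priors to wherever DataParallel gathered the predictions.
        priors = self.priors.to(loc.device)
        # Placeholder: L1 distance of each predicted box to its prior plus a small
        # penalty on class scores. Real SSD matches ground truth to priors here.
        loc_term = (loc - priors.unsqueeze(0)).abs().mean()
        conf_term = conf.pow(2).mean()
        return loc_term + conf_term


def demo(batch_size=16):
    """One training-style step; uses all visible GPUs if any, else the CPU."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net = nn.DataParallel(TinySSDHead()).to(device)
    criterion = PriorHoldingLoss(torch.rand(8, 4))
    loss = criterion(net(torch.randn(batch_size, 4, device=device)))
    loss.backward()
    return loss
```

Keeping the priors out of the model's return value is what matters for `DataParallel`; holding them in the loss is one option, and a buffer on the model that is simply not returned would work the same way.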

Mar 20 '21 14:03 dbwzyh

Why does this prior prevent multi-GPU training? Because it's on the CPU?

Mar 25 '21 11:03 bubbliiiing

Probably: multi-GPU training runs on the GPUs, but the prior is defined on the CPU, so it likely can't be distributed across multiple GPUs.
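To see which returned values would fail this way, a small debugging helper can walk the model's output before the model is wrapped in `nn.DataParallel`. `find_non_cuda_outputs` is a hypothetical helper, not a PyTorch API; it mirrors how `gather` in PyTorch 1.2's `torch/nn/parallel/scatter_gather.py` recurses into tuples, lists and dicts and asserts that every tensor it reaches is on CUDA.

```python
import torch


def find_non_cuda_outputs(outputs, path="out"):
    """Return the paths of tensors in a (nested) model output that are not on CUDA.

    Under nn.DataParallel with two or more GPUs, each such tensor would reach the
    Gather step and fail its CUDA check. Non-tensor leaves (e.g. None) are ignored.
    """
    if isinstance(outputs, torch.Tensor):
        return [] if outputs.is_cuda else [path]
    found = []
    if isinstance(outputs, dict):
        for key, value in outputs.items():
            found.extend(find_non_cuda_outputs(value, f"{path}[{key!r}]"))
    elif isinstance(outputs, (list, tuple)):
        for i, value in enumerate(outputs):
            found.extend(find_non_cuda_outputs(value, f"{path}[{i}]"))
    return found


if __name__ == "__main__":
    # CPU-only example: every tensor here is on the CPU, so all are reported.
    example = (torch.zeros(2, 4), [torch.ones(3), None], {"priors": torch.rand(8, 4)})
    print(find_non_cuda_outputs(example))
    # prints ['out[0]', 'out[1][0]', "out[2]['priors']"]
```

On a GPU, calling it on the unwrapped network's output (for example `find_non_cuda_outputs(net(images))` before `net` is wrapped) should point at the CPU priors in the SSD output.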

Mar 26 '21 07:03 frandooo

0 0 So that's what it was! I'll need to find time to fix it. I'm really busy.

Mar 30 '21 14:03 bubbliiiing