Hello, dear author! Single-GPU training works fine for me, but multi-GPU training fails with the following error:
Traceback (most recent call last):
File "trainval_net.py", line 479, in <module>
rois_label = _RCNN(im_data, im_info, gt_boxes, num_boxes,
File "/home/xinjianli/anaconda3/envs/torch12/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/xinjianli/anaconda3/envs/torch12/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
return self.gather(outputs, self.output_device)
File "/home/xinjianli/anaconda3/envs/torch12/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in gather
return gather(outputs, output_device, dim=self.dim)
File "/home/xinjianli/anaconda3/envs/torch12/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
res = gather_map(outputs)
File "/home/xinjianli/anaconda3/envs/torch12/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/xinjianli/anaconda3/envs/torch12/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/xinjianli/anaconda3/envs/torch12/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
TypeError: zip argument #1 must support iteration
#!/bin/bash
set -e
cd ..
CUDA_VISIBLE_DEVICES=18,19,20,21 python trainval_net.py --dataset pascal_voc_0712 --net snet_146 --bs 256 --nw 16 \
    --lr 1e-2 --epochs 150 --cuda --lr_decay_step 25,50,75 --use_tfboard True \
    --save_dir snet146 --eval_interval 2 --logdir snet146_log --pre ./weights/snet_146.tar \
    --checkepoch 2 --mgpus
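The script above is my modified version of your train_146.sh. I only changed the GPUs and the batch size (bs). It runs fine on a single GPU, but fails with multiple GPUs.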
Hello, have you managed to solve this problem?
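One possible cause, offered as a guess rather than a confirmed diagnosis: the traceback ends in the fallback line of gather_map, `type(out)(map(gather_map, zip(*outputs)))`, which assumes every non-tensor output is an iterable container. If forward() returns a plain Python number (for example a literal 0 in place of a loss) inside a nested tuple, zip receives that number and raises this TypeError. I have not checked what _RCNN actually returns in this repo. The sketch below is a torch-free, simplified stand-in for gather_map in which lists play the role of tensors; the sample outputs are made up for illustration.

```python
def gather_map(outputs):
    """Simplified stand-in for gather_map in torch/nn/parallel/scatter_gather.py.

    Assumption: a list stands in for a tensor, and "gathering" is concatenation.
    """
    out = outputs[0]
    if isinstance(out, list):  # tensor-like leaf: merge the per-GPU pieces
        return [x for replica in outputs for x in replica]
    if out is None:
        return None
    # Fallback seen in the traceback: treats `out` as an iterable container.
    return type(out)(map(gather_map, zip(*outputs)))


# Hypothetical per-GPU outputs where every leaf is tensor-like: gather succeeds.
ok_outputs = (([1.0], ([0.1], [0.5])), ([2.0], ([0.2], [0.6])))
print(gather_map(ok_outputs))

# Same structure, but one leaf is a plain Python int on each GPU.
bad_outputs = (([1.0], ([0.1], 0)), ([2.0], ([0.2], 0)))
try:
    gather_map(bad_outputs)
except TypeError as err:
    # zip() was handed ints, as in the three nested gather_map frames above.
    print("TypeError:", err)
```

If this turns out to be the cause, returning a tensor on the same device from forward() (for instance a one-element tensor instead of a bare 0) should let gather proceed. Printing the type of each value that forward() returns on one replica would confirm or rule this out.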