ZouRuia
In tools.py, I am training on the CPU with n_gpu=''. def prepare_device(use_gpu): """ setup GPU device if available, move model into configured device # If n_gpu_use is a number, use range to generate a list # If the input is a list, list[0] is used as the controller by default Example: use_gpu = '' : cpu use_gpu = '0': cuda:0 use_gpu...
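The mapping described in that docstring can be sketched as below. This is only an illustration of the two documented cases ('' and '0'), not the actual code in tools.py: the helper name `parse_use_gpu` is hypothetical, and the handling of a comma-separated string is an assumption based on the "list[0] as controller" comment.

```python
from typing import List, Tuple


def parse_use_gpu(use_gpu: str) -> Tuple[str, List[int]]:
    """Hypothetical helper: turn a use_gpu string into (controller device, device ids)."""
    # Assumption: several ids are written as a comma-separated string, e.g. '0,1'.
    ids = [int(tok) for tok in use_gpu.split(",") if tok.strip()]
    if not ids:
        # Empty string means CPU training, as in the docstring example.
        return "cpu", []
    # The first id acts as the controller device.
    return f"cuda:{ids[0]}", ids


print(parse_use_gpu(""))   # ('cpu', [])
print(parse_use_gpu("0"))  # ('cuda:0', [0])
```

A real version would presumably also need to check `torch.cuda.is_available()` before returning a CUDA device.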
When I run bash examples/train_iwslt14.sh /u01/zourui/unilm/deltalm/tmp/iwslt14/iwslt14.bin /u01/zourui/unilm/deltalm/tmp/iwslt14/checkpoints /u01/zourui/unilm/deltalm/checkpoint/deltalm-base.pt, I run into a problem. + data_bin=/u01/zourui/unilm/deltalm/tmp/iwslt14/iwslt14.bin + save_dir=/u01/zourui/unilm/deltalm/tmp/iwslt14/checkpoints + PRETRAINED_MODEL=/u01/zourui/unilm/deltalm/checkpoint/deltalm-base.pt + python train.py /u01/zourui/unilm/deltalm/tmp/iwslt14/iwslt14.bin --save-dir /u01/zourui/unilm/deltalm/tmp/iwslt14/checkpoints --arch deltalm_base --pretrained-deltalm-checkpoint /u01/zourui/unilm/deltalm/checkpoint/deltalm-base.pt --share-all-embeddings --max-source-positions 128...
This error appears when I train on three GPUs. After searching around, I found a report from someone using four GPUs who hit [RuntimeError: Input tensor at index 3 has invalid shape [2, 2, 16, 128, 64] but expected [2, 4, 16, 128, 64]](https://stackoverflow.com/questions/65822014/runtimeerror-input-tensor-at-index-3-has-invalid-shape-2-2-16-128-64-but), so I switched back to training on four GPUs, and strangely it ran fine. I don't know why, though. args: Namespace(batch_size=8, device='5,6,1,4', epochs=5, fp16=False, fp16_opt_level='O1', gradient_accumulation=1, log_step=1, lr=0.00015,...
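One possible explanation, offered as a guess rather than a confirmed diagnosis: if the model is wrapped in `torch.nn.DataParallel`, each batch is split along dimension 0 across the listed devices. With batch_size=8 the split is even on four GPUs but uneven on three, and if the model returns tensors whose batch axis is not dimension 0 (as the shapes in the linked error suggest), gathering the per-device outputs can fail. The sketch below only reproduces the split arithmetic, assuming it follows `torch.chunk` semantics; the helper `chunk_sizes` is hypothetical and does not call PyTorch.

```python
import math
from typing import List


def chunk_sizes(batch_size: int, n_devices: int) -> List[int]:
    """Per-device batch sizes, assuming a torch.chunk-style split along dim 0."""
    # Every chunk gets ceil(batch_size / n_devices) items except possibly the last.
    step = math.ceil(batch_size / n_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(step, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


for n in (3, 4):
    print(n, chunk_sizes(8, n))
# 3 [3, 3, 2]
# 4 [2, 2, 2, 2]
```

If this is the cause, a batch size divisible by the number of GPUs should avoid the error on three cards as well, although the last batch of an epoch may still be uneven.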