benchmark
Transformer fails in multi-process single-GPU mode
https://github.com/PaddlePaddle/benchmark/blob/master/NeuralMachineTranslation/Transformer/fluid/train/train.py#L616
2019-05-21 09:29:28,729-INFO: Namespace(batch_size=4096, device='GPU', enable_ce=True, fetch_steps=100, local=True, opts=['dropout_seed', '10', 'learning_rate', '2.0', 'warmup_steps', '8000', 'beta2', '0.997', 'd_model', '512', 'd_inner_hid', '2048', 'n_head', '8', 'prepostprocess_dropout', '0.1', 'attention_dropout', '0.1', 'relu_dropout', '0.1', 'weight_sharing', 'True', 'pass_num', '1', 'model_dir', 'tmp_models', 'ckpt_dir', 'tmp_ckpts'], pool_size=200000, shuffle=False, shuffle_batch=False, sort_type='pool', special_token=['<s>', '<e>', '<unk>'], src_vocab_fpath='data/vocab.bpe.32000', sync=True, token_delimiter=' ', train_file_pattern='data/train.tok.clean.bpe.32000.en-de', trg_vocab_fpath='data/vocab.bpe.32000', update_method='pserver', use_default_pe=False, use_mem_opt=True, use_py_reader=True, use_token_batch=True, val_file_pattern=None)
Traceback (most recent call last):
  File "train.py", line 784, in <module>
    train(args)
  File "train.py", line 641, in train
    dev_count = get_device_num()
  File "train.py", line 616, in get_device_num
    device_num = subprocess.check_output(['nvidia-smi','-L']).decode().count('\n')
NameError: global name 'subprocess' is not defined
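The NameError means `train.py` calls `subprocess.check_output` without importing the module. Below is a minimal sketch of a device-count helper with the import in place. It is not the repository's actual code: the `CUDA_VISIBLE_DEVICES` branch and the helper `count_gpus_from_listing` are assumptions added for illustration, and the parser assumes `nvidia-smi -L` prints one line per GPU beginning with `GPU `.

```python
import os
import subprocess  # the import missing from train.py in the traceback above


def count_gpus_from_listing(listing):
    """Count GPUs in `nvidia-smi -L` output (hypothetical helper).

    Assumes one line per GPU, each starting with "GPU ".
    """
    return sum(1 for line in listing.splitlines()
               if line.strip().startswith("GPU "))


def get_device_num():
    """Return the number of GPUs this process should use.

    Assumption: CUDA_VISIBLE_DEVICES, when set, takes priority over
    querying nvidia-smi.
    """
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible:
        return len([d for d in visible.split(",") if d.strip()])
    listing = subprocess.check_output(["nvidia-smi", "-L"]).decode()
    return count_gpus_from_listing(listing)
```

Counting lines that start with `GPU ` rather than counting `'\n'` characters avoids miscounting if the output has no trailing newline.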
@ccmeteorljh Why multi-process on a single GPU? Did you not set the environment variable (CUDA_VISIBLE_DEVICES)?
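Yes, it is set. I want to compare the speed of a single GPU in multi-process mode against a single GPU in single-process mode. The problem above is fixed by simply adding the import.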
Traceback (most recent call last):
  File "train.py", line 785, in <module>
    train(args)
  File "train.py", line 703, in train
    token_num, predict, pyreader)
  File "train.py", line 534, in train_loop
    feed=feed_dict_list)
  File "/opt/python/cp27-cp27mu/lib/python2.7/site-packages/paddle/fluid/parallel_executor.py", line 286, in run
    return_numpy=return_numpy)
  File "/opt/python/cp27-cp27mu/lib/python2.7/site-packages/paddle/fluid/executor.py", line 640, in run
    return_numpy=return_numpy)
  File "/opt/python/cp27-cp27mu/lib/python2.7/site-packages/paddle/fluid/executor.py", line 482, in _run_parallel
    "Feed a list of tensor, the list should be the same size as places"
ValueError: Feed a list of tensor, the list should be the same size as places
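According to its message, the ValueError is raised because the feed list passed to the executor does not have one entry per place (device). The sketch below expresses that invariant in plain Python, independent of Paddle. `check_feed_matches_places` and its parameters are hypothetical names for illustration and are not part of `train.py` or the Paddle API.

```python
def check_feed_matches_places(feed_list, num_places):
    """Return feed_list unchanged if it has exactly one feed dict per place.

    Hypothetical pre-flight check; mirrors the condition described by the
    error message, not Paddle's actual implementation.
    """
    if not isinstance(feed_list, (list, tuple)):
        raise TypeError("feed_list must be a list or tuple of feed dicts")
    if len(feed_list) != num_places:
        raise ValueError(
            "got %d feed dicts for %d places; the counts must match"
            % (len(feed_list), num_places))
    return feed_list
```

A check like this before the `run` call would show whether the device count used to build `feed_dict_list` differs from the number of places the executor was created with.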
How did you solve this? I'm running into the same problem and would appreciate any pointers.
@QianShengWu Multi-process single-GPU mode is not supported yet.