
--deepspeed_multinode_launcher: command not found

Open 168liuliu168 opened this issue 2 years ago • 6 comments

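I followed the official tutorial for training, but launching the training script fails with "--deepspeed_multinode_launcher: command not found". The launch script is configured as follows:

num_machines=4
num_processes=$((num_machines * 8))
machine_rank=0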

accelerate launch \
--config_file ./configs/sft.yaml \
--num_processes $num_processes \
--num_machines $num_machines \
--machine_rank $machine_rank \ --deepspeed_multinode_launcher standard finetune_moss.py \
--model_name_or_path /root/liuliu/moss/moss-moon-003-sft-plugin \
--data_dir ./sft_data \
--output_dir ./ckpts/moss-moon-003-sft \
--log_dir ./train_logs/moss-moon-003-sft \
--n_epochs 2 \
--train_bsz_per_gpu 4 \
--eval_bsz_per_gpu 4 \
--learning_rate 0.000015 \
--eval_step 10 \
--save_step 10

How can this be fixed? I have already installed deepspeed with pip.

168liuliu168 • Apr 25 '23 08:04

Same question. I hit this error with deepspeed 0.8.3, then downgraded to 0.8.2 and still get the same error.

acadaiaca • Apr 25 '23 08:04

On the line above --deepspeed_multinode_launcher, there is a space after the line-continuation backslash. Delete that space.

aichendouble • Apr 25 '23 12:04
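
The trailing space is invisible in most editors, so it can help to search for it mechanically. The following is a minimal sketch, not part of the original thread: it builds a small sample file (a hypothetical stand-in for the real launch script) that reproduces the problem, reports the lines where a backslash is followed by trailing whitespace, and writes a repaired copy. Because `[[:space:]]` also matches carriage returns, the same pattern should catch Windows line endings left after a backslash.

```shell
# Build a small sample script that reproduces the problem: the second
# line ends with "\ " (backslash + space) instead of a bare "\".
sample=$(mktemp)
printf '%s\n' \
  'accelerate launch \' \
  '--machine_rank $machine_rank \ ' \
  '--deepspeed_multinode_launcher standard finetune_moss.py' > "$sample"

# Report the line numbers where a backslash is followed by trailing whitespace.
bad_lines=$(grep -nE '\\[[:space:]]+$' "$sample" | cut -d: -f1)
echo "broken continuation on line(s): $bad_lines"

# Write a repaired copy with the whitespace after the backslash removed.
fixed=$(mktemp)
sed -E 's/\\[[:space:]]+$/\\/' "$sample" > "$fixed"
```

To apply this to a real script, point `sample` at that file instead of generating one, and review the repaired copy before replacing the original.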

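> On the line above --deepspeed_multinode_launcher, there is a space after the line-continuation backslash. Delete that space.

Yes, something probably went wrong when the command was copied. Thanks for the correction!

xyltt • Apr 25 '23 15:04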

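Indeed. When I opened the script in vim, the line-continuation backslash was highlighted in the wrong color. After fixing it the script runs, but now it fails with a port error:

ValueError: The port number of the rendezvous endpoint 'None:None' must be an integer between 0 and 65536.

Where does this need to be configured or specified?

168liuliu168 • Apr 26 '23 03:04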

> After fixing it the script runs, but now it fails with a port error: ValueError: The port number of the rendezvous endpoint 'None:None' must be an integer between 0 and 65536. Where does this need to be configured or specified?

Check whether main_process_ip and main_process_port are set correctly in configs/sft.yaml.

xyltt • Apr 26 '23 15:04
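
As a rough way to carry out that check, the sketch below looks for both keys in a config file and reports any that are absent or have no value. It is an illustration under assumptions rather than anything from the thread: the sample file is a hypothetical stand-in for configs/sft.yaml, the address 10.0.0.1 is a placeholder, and the check is a plain text match, not a YAML parser (a value written as null would still pass).

```shell
# Hypothetical stand-in for configs/sft.yaml in which the port is missing.
cfg=$(mktemp)
printf '%s\n' 'main_process_ip: 10.0.0.1' 'num_machines: 4' > "$cfg"

missing=""
for key in main_process_ip main_process_port; do
  # A key counts as set only if it starts a line and is followed by
  # a non-blank, non-comment value.
  if ! grep -Eq "^${key}:[[:space:]]*[^[:space:]#]" "$cfg"; then
    missing="$missing $key"
  fi
done
echo "missing or empty:${missing:- none}"
```

With the sample file above, the loop flags main_process_port. The same two values can also be given on the command line through the `--main_process_ip` and `--main_process_port` options of `accelerate launch`; on every machine they should point at the rank-0 node.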

Has anyone run into this message?

The following values were not passed to accelerate launch and had defaults used instead: launch.py:887
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.

hjing100 • May 05 '23 12:05