DenceChen
Has anyone else run into this error?
Thanks for your seq2seq framework, it has helped me win prizes and get a lot of work done.
### Reminder

- [x] I have read the above rules and searched the existing issues.

### Description

When doing SFT on a model with a thinking/reasoning stage, how can I adjust the loss weights? The output consists of a thinking part plus a label part, and during training I would like the thinking part to carry a lower loss weight, e.g. 0.8, while the label part keeps a weight of 1.0.

### Pull Request

_No response_
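Not an existing API of the framework, but a minimal sketch of how such weighting could work: compute the token-level cross entropy with no reduction, then scale each token's loss by 0.8 or 1.0 depending on whether it falls inside the thinking span. The function name `weighted_sft_loss` and the `thinking_mask` input (a boolean mask you would build yourself from the chat template, e.g. marking tokens between the think tags) are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def weighted_sft_loss(logits, labels, thinking_mask,
                      thinking_weight=0.8, label_weight=1.0,
                      ignore_index=-100):
    """Cross-entropy where thinking tokens are down-weighted.

    logits:        (batch, seq_len, vocab)
    labels:        (batch, seq_len), prompt tokens set to ignore_index
    thinking_mask: (batch, seq_len) bool, True on thinking-part tokens
                   (hypothetical input you must construct during tokenization)
    """
    # Standard causal-LM shift: position t predicts token t+1.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    shift_thinking = thinking_mask[:, 1:].contiguous()

    # Per-token cross entropy; ignored positions come back as 0.
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
        reduction="none",
    ).view(shift_labels.size())

    # Per-token weights: 0.8 on thinking tokens, 1.0 on label tokens.
    weights = torch.where(
        shift_thinking,
        torch.full_like(loss, thinking_weight),
        torch.full_like(loss, label_weight),
    )
    # Zero out positions that do not contribute to the loss.
    weights = weights * (shift_labels != ignore_index).float()

    # Weighted mean over the contributing tokens.
    return (loss * weights).sum() / weights.sum().clamp(min=1.0)
```

You would call this in place of the default SFT loss, passing a mask built while tokenizing each example; where to hook it in depends on the trainer you use.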
Running train_grpo_llama_ray.sh raises an error:

(zyp_llms) root@VM-8-181-tencentos:~/dence/OpenRLHF# ./examples/scripts/train_grpo_llama_ray.sh
++ ray job submit --address=http://127.0.0.1:8265 '--runtime-env-json={"working_dir": "./openrlhf"}' -- python3 -m openrlhf.cli.train_ppo_ray --ref_num_nodes 1 --ref_num_gpus_per_node 1 --reward_num_nodes 1 --reward_num_gpus_per_node 1 --actor_num_nodes 1 --actor_num_gpus_per_node 4 --vllm_num_engines 2...