luckmoon

Results 5 issues of luckmoon

The segmentation fault can be fixed by switching to GCC 4.9. Maybe this should be pointed out in the project.

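From earlier issues, I see that a chat model can be loaded for incremental pretraining. But a dataset such as tianlongbabu.txt cannot be converted into question-answer form, so how should this be handled? Would training on non-Q&A data conflict with the question-answer way the chat model is meant to be used?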

question

If I run like this: ```shell ray job submit --address="http://127.0.0.1:8265" \ --runtime-env-json='{"working_dir": "/openrlhf"}' \ -- python3 -m openrlhf.cli.train_ppo_ray \ --ref_num_nodes 1 \ --ref_num_gpus_per_node 8 \ --reward_num_nodes 1 \ --reward_num_gpus_per_node 8...

https://github.com/OpenRLHF/OpenRLHF/blob/db77a3019bab862747e9591001d0b3744681e079/openrlhf/trainer/ray/ppo_actor.py#L317-L320 Here the reduce is done for the sake of key=kl, but every other key gets reduced at the same time. Is this the intended behavior?
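To make the question concrete, here is a minimal pure-Python sketch of the difference between reducing every key of a status dict and reducing only `kl`. It is not the OpenRLHF code: `mean_reduce`, `per_rank`, and the sample values are hypothetical, and the "reduce" is simulated as a plain average over a list of per-rank dicts rather than a real distributed call.

```python
from typing import Dict, List, Optional, Sequence


def mean_reduce(
    per_rank: List[Dict[str, float]],
    keys: Optional[Sequence[str]] = None,
) -> List[Dict[str, float]]:
    """Simulate a mean all-reduce over per-rank status dicts (hypothetical helper).

    keys=None averages every key across ranks; otherwise only the listed
    keys are averaged and all other values stay local to each rank.
    """
    reduce_keys = set(per_rank[0]) if keys is None else set(keys)
    # Cross-rank mean for each key that should be reduced.
    means = {k: sum(s[k] for s in per_rank) / len(per_rank) for k in reduce_keys}
    # Reduced keys take the mean; the rest keep their per-rank value.
    return [{k: means.get(k, v) for k, v in s.items()} for s in per_rank]


# Two simulated ranks with made-up values.
ranks = [
    {"kl": 0.25, "policy_loss": 1.0},
    {"kl": 0.75, "policy_loss": 3.0},
]

all_keys = mean_reduce(ranks)               # every key is averaged
kl_only = mean_reduce(ranks, keys=["kl"])   # only "kl" is averaged

print(all_keys[0])
print(kl_only[0])
```

In the first call `policy_loss` becomes identical on both ranks; in the second it keeps its per-rank value while `kl` is still synchronized. Which of the two the linked lines are meant to do is exactly what the issue asks.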