**Describe the bug** I'm training a bloom model in step3 using deepspeed-chat, with the offload option turned on. After 14 steps of training, it raised the following error (see logs below). I...
Bloom's default padding side is left, so why has the default padding side been changed to right throughout the Chinese bloom series? If I switch it back to left for training, will it affect the model?
```
{
  "add_prefix_space": false,
  "bos_token": "",
  "clean_up_tokenization_spaces": false,
  "eos_token": "",
  "model_max_length": 2048,
  "pad_token": "",
  "padding_side": "right",
  "tokenizer_class": "BloomTokenizer",
  "unk_token": ""
}
```
[chinese_bloom_7b_chat_v3](https://huggingface.co/yuanzhoulvpi/chinese_bloom_7b_chat_v3/tree/main)
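For context, a minimal sketch (assuming a standard Hugging Face `AutoTokenizer` setup, using the public `bigscience/bloom-560m` checkpoint as a stand-in for the Chinese bloom repo) of what `padding_side` actually changes: only where the pad tokens land in a batch. For causal-LM training the pad positions are normally masked out of the loss anyway, while batched generation usually needs left padding so that every prompt ends at the last position.

```python
from transformers import AutoTokenizer

# Stand-in checkpoint; substitute the actual Chinese bloom tokenizer.
tok = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

texts = ["你好", "a much longer example sentence"]

# padding_side="right": pad tokens come after the real tokens.
tok.padding_side = "right"
right = tok(texts, padding=True, return_tensors="pt")

# padding_side="left": pad tokens come before the real tokens, so the
# final position of every row is a real token -- convenient for generation.
tok.padding_side = "left"
left = tok(texts, padding=True, return_tensors="pt")

print(right["attention_mask"])
print(left["attention_mask"])
```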
I converted a llama model to nemo, with model dirs like below:  When I tried to load it to train a reward model, I got a missing-keys error. I...
In [reward_trainer.py](https://github.com/OpenLMLab/MOSS-RLHF/blob/main/rm/reward_trainer.py#L147), the probability distribution for the last token is dropped from lm_logits, but in the labels below it is the first token that is dropped. Could you explain how these two line up?
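For reference, a minimal sketch (not the MOSS-RLHF code itself) of the standard causal-LM shift this corresponds to: position t of the logits predicts token t+1, so dropping the last logit and the first label makes `logits[:, :-1]` align with `labels[:, 1:]`.

```python
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 2, 5, 100
lm_logits = torch.randn(batch, seq_len, vocab)   # logits[:, t] predicts token t+1
input_ids = torch.randint(0, vocab, (batch, seq_len))

# Drop the last logit (it predicts a token beyond the sequence) and the
# first label (nothing predicts the very first token), so the two align:
# shift_logits[:, t] is scored against shift_labels[:, t] == input_ids[:, t+1].
shift_logits = lm_logits[:, :-1, :]
shift_labels = input_ids[:, 1:]

loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab),
    shift_labels.reshape(-1),
)
print(loss)
```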
The code at [ppo_datahelper.py](https://github.com/OpenLMLab/MOSS-RLHF/blob/main/ppo/ppo_datahelper.py#L340) does not match the function it belongs to.  I'd also like to ask:
1. Should the padding here be on the left or the right?
2. llama2 defaults to padding on the right, but I see that the reward model's batch data is padded on the left, and many places in ppo also pad to the left. What is the overall padding/alignment strategy?
3. I noticed that loss_mask ends up setting the corresponding token ids to 0 ([ppo_trainer.py](https://github.com/OpenLMLab/MOSS-RLHF/blob/main/ppo/ppo_trainer.py#L464)) before the cross entropy with the model output is computed. The masked positions still seem to back-propagate gradients as if their label really were 0. Could you explain the principle here? (A sketch of the usual pattern follows below.)
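Regarding question 3, a minimal sketch (an assumption about the usual pattern, not the actual MOSS-RLHF implementation) of how masked positions are normally kept out of the gradient: the per-token cross entropy is computed with `reduction="none"` and then multiplied by loss_mask, so a position whose label was overwritten with 0 contributes zero loss and therefore zero gradient.

```python
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 2, 6, 50
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
labels = torch.randint(0, vocab, (batch, seq_len))
loss_mask = torch.tensor([[1, 1, 1, 0, 0, 0],
                          [1, 1, 1, 1, 1, 0]], dtype=torch.float)

# Overwriting masked label ids with 0 is harmless only because the mask is
# applied to the per-token loss afterwards; the value 0 is never learned from.
labels = labels * loss_mask.long()

per_token = F.cross_entropy(
    logits.reshape(-1, vocab), labels.reshape(-1), reduction="none"
).reshape(batch, seq_len)

loss = (per_token * loss_mask).sum() / loss_mask.sum()
loss.backward()

# Gradients at masked positions are exactly zero.
print(logits.grad[0, 3:].abs().sum())  # -> tensor(0.)
```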
I'm running my program on a GPU cluster using Docker containers. The default Docker image has glibc 2.32 installed and it's hard to upgrade it to 2.35. Is there any way...