**Describe the bug** I'm training a bloom model in step3 using deepspeed-chat, with the offload option turned on. After 14 steps of training, it raised the following error (see logs below). I...
Bloom's default padding side is left, so why has the default padding side been changed to right throughout the Chinese bloom series? If I switch it back to left for training, will it affect the model?
```
{
  "add_prefix_space": false,
  "bos_token": "",
  "clean_up_tokenization_spaces": false,
  "eos_token": "",
  "model_max_length": 2048,
  "pad_token": "",
  "padding_side": "right",
  "tokenizer_class": "BloomTokenizer",
  "unk_token": ""
}
```
[chinese_bloom_7b_chat_v3](https://huggingface.co/yuanzhoulvpi/chinese_bloom_7b_chat_v3/tree/main)
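For context, a minimal sketch (assuming a standard Hugging Face `AutoTokenizer` setup, using the public `bigscience/bloom-560m` checkpoint as a stand-in for the Chinese bloom repo) of what `padding_side` actually changes: only where the pad tokens land in a batch. For causal-LM training the pad positions are normally masked out of the loss anyway, while batched generation usually needs left padding so that every prompt ends at the last position.

```python
from transformers import AutoTokenizer

# Stand-in checkpoint; substitute the actual Chinese bloom tokenizer.
tok = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

texts = ["你好", "a much longer example sentence"]

# padding_side="right": pad tokens come after the real tokens.
tok.padding_side = "right"
right = tok(texts, padding=True, return_tensors="pt")

# padding_side="left": pad tokens come before the real tokens, so the
# final position of every row is a real token -- convenient for generation.
tok.padding_side = "left"
left = tok(texts, padding=True, return_tensors="pt")

print(right["attention_mask"])
print(left["attention_mask"])
```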
I converted a llama model to nemo, with model dirs like below:  When I tried to load it to train a reward model, I got a missing-keys error. I...
In [reward_trainer.py](https://github.com/OpenLMLab/MOSS-RLHF/blob/main/rm/reward_trainer.py#L147), the probability distribution for the last token is dropped from lm_logits, but in the labels below it is the first token that is dropped. Could you explain how these two line up?
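For reference, a minimal sketch (not the MOSS-RLHF code itself) of the standard causal-LM shift this corresponds to: position t of the logits predicts token t+1, so dropping the last logit and the first label makes `logits[:, :-1]` align with `labels[:, 1:]`.

```python
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 2, 5, 100
lm_logits = torch.randn(batch, seq_len, vocab)   # logits[:, t] predicts token t+1
input_ids = torch.randint(0, vocab, (batch, seq_len))

# Drop the last logit (it predicts a token beyond the sequence) and the
# first label (nothing predicts the very first token), so the two align:
# shift_logits[:, t] is scored against shift_labels[:, t] == input_ids[:, t+1].
shift_logits = lm_logits[:, :-1, :]
shift_labels = input_ids[:, 1:]

loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab),
    shift_labels.reshape(-1),
)
print(loss)
```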
The code at [ppo_datahelper.py](https://github.com/OpenLMLab/MOSS-RLHF/blob/main/ppo/ppo_datahelper.py#L340) does not match the function it belongs to.  I'd also like to ask:
1. Should the padding here be on the left or the right?
2. llama2 defaults to padding on the right, but I see that the reward model's batch data is padded on the left, and many places in ppo also pad to the left. What is the overall padding/alignment strategy?
3. I noticed that loss_mask ends up setting the corresponding token ids to 0 ([ppo_trainer.py](https://github.com/OpenLMLab/MOSS-RLHF/blob/main/ppo/ppo_trainer.py#L464)) before the cross entropy with the model output is computed. The masked positions still seem to back-propagate gradients as if their label really were 0. Could you explain the principle here? (A sketch of the usual pattern follows below.)
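Regarding question 3, a minimal sketch (an assumption about the usual pattern, not the actual MOSS-RLHF implementation) of how masked positions are normally kept out of the gradient: the per-token cross entropy is computed with `reduction="none"` and then multiplied by loss_mask, so a position whose label was overwritten with 0 contributes zero loss and therefore zero gradient.

```python
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 2, 6, 50
logits = torch.randn(batch, seq_len, vocab, requires_grad=True)
labels = torch.randint(0, vocab, (batch, seq_len))
loss_mask = torch.tensor([[1, 1, 1, 0, 0, 0],
                          [1, 1, 1, 1, 1, 0]], dtype=torch.float)

# Overwriting masked label ids with 0 is harmless only because the mask is
# applied to the per-token loss afterwards; the value 0 is never learned from.
labels = labels * loss_mask.long()

per_token = F.cross_entropy(
    logits.reshape(-1, vocab), labels.reshape(-1), reduction="none"
).reshape(batch, seq_len)

loss = (per_token * loss_mask).sum() / loss_mask.sum()
loss.backward()

# Gradients at masked positions are exactly zero.
print(logits.grad[0, 3:].abs().sum())  # -> tensor(0.)
```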
I'm running my program on a GPU cluster using Docker containers. The default Docker image has glibc 2.32 installed and it's hard to upgrade it to 2.35. Is there any way...