BlueRum

Results 37 comments of BlueRum

> This [picture](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/chatgpt.png) is from InstructGPT, maybe there should be copyright information?

Thanks for the reminder; we have mentioned the reference here.

> > Hi, I found this [picture](https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chatgpt/experience.jpg) a little confusing. I think if there is some description, it would be much better.
>
> Yes, at least tell the readers:...

We suggest using ```python -m pip install --upgrade pip``` to update your pip.

Thanks for your feedback. You are using the DDP strategy, which is naive and costs much more GPU memory. You can try ```torchrun --standalone --nproc_per_node 4 benchmark_gpt_dummy.py --model m...```
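For anyone new to `torchrun`: below is a minimal sketch of the standard PyTorch distributed setup the launcher expects (my own illustration, not the benchmark script itself). `torchrun --standalone --nproc_per_node 4` spawns 4 processes and exports `LOCAL_RANK`, `RANK`, and `WORLD_SIZE` for each; the script name here is a placeholder.

```python
import os
import torch
import torch.distributed as dist

# Launched as: torchrun --standalone --nproc_per_node 4 your_script.py
def setup_distributed():
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun per process
    torch.cuda.set_device(local_rank)            # one GPU per process
    dist.init_process_group(backend="nccl")      # reads RANK/WORLD_SIZE from env
    return local_rank, dist.get_world_size()
```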

> @ver217 , should the `dist=0` be outside the loop?

Yep, I will fix it soon.

Thank you for your feedback, and sorry about the late reply. In /applications/ChatGPT/examples/ we have 3 examples: train_dummy -> shows the vanilla way to start **training step 3**; train_prompts...

Because, as we see it, the RL training process here is a one-step process, which means there isn't a next_state.
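To illustrate the point (a minimal sketch of my own, not the repo's implementation): when the whole generated response is treated as a single action and the episode ends there, the usual bootstrap term `gamma * V(next_state)` disappears, so the return is just the reward.

```python
def one_step_advantage(reward, value):
    """Advantage for a one-step episode.

    Generic TD target: reward + gamma * V(next_state).
    Here the episode ends after the single action (the generated response),
    so there is no next_state and the target reduces to the reward itself.
    """
    td_target = reward          # no gamma * next_value bootstrap term
    return td_target - value    # advantage fed to the PPO update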

I'll close this issue now; please reopen it if you have further questions.

Thank you for your feedback. We do not suggest using the loss to evaluate training in the RM (reward model) training task. It's shown in the paper that the loss will be...
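A common alternative signal is pairwise ranking accuracy: how often the reward model scores the chosen response above the rejected one. Below is a minimal sketch of my own (not the repo's evaluation code), assuming a `reward_model` that returns one scalar score per sequence.

```python
import torch

@torch.no_grad()
def ranking_accuracy(reward_model, eval_batches):
    """Fraction of pairs where the chosen response outscores the rejected one."""
    correct, total = 0, 0
    for chosen_ids, chosen_mask, rejected_ids, rejected_mask in eval_batches:
        r_chosen = reward_model(chosen_ids, attention_mask=chosen_mask)
        r_rejected = reward_model(rejected_ids, attention_mask=rejected_mask)
        correct += (r_chosen > r_rejected).sum().item()
        total += r_chosen.numel()
    return correct / total
```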

Hi @Qian0733, thank you for your feedback, but we can't reproduce the bug. It seems there is something wrong with your environment. Could you give us more information about your...