yyht
Hi, nice work. When I apply it to a shallower BERT or GPT, it often gets NaN gradients right after initialization (even for deeper architectures).
Hi, I have done pretraining on a Chinese dataset (50G) and run downstream fine-tuning on the ChineseCLUE benchmark. The default hyperparameters are the same as bert-base: learning_rate: 3e-5, epoch: 3 or 5. The finetuning...
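For reference, a minimal sketch of the fine-tuning setup this comment describes, assuming a HuggingFace-style `Trainer` interface; only `learning_rate` and `num_train_epochs` come from the comment, everything else is a hypothetical placeholder:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="clue_finetune",      # hypothetical output path
    learning_rate=3e-5,              # bert-base default stated above
    num_train_epochs=3,              # the comment tries 3 or 5
    per_device_train_batch_size=32,  # assumed; not stated in the comment
)
```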
Hi, I am trying to train bert-base using tta for Chinese, and it got NaN within 1000 optimization steps. I am wondering if you could give me some advice.
Hi, since your waveglow proposes a soft-EM version of VQ-VAE, the core implementation is:

```
def _square_distance(x, code_book):
    x = tf.cast(x, tf.float32)
    code_book = tf.cast(code_book, tf.float32)
    x_sg =...
```
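The visible fragment computes pairwise squared distances between encoder outputs and codebook entries. A self-contained sketch of that computation via the standard expansion ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2; the completion past `x_sg` is my own reconstruction, not the repo's code:

```python
import tensorflow as tf

def square_distance(x, code_book):
    """Pairwise squared distances between encoder outputs x [N, D]
    and codebook entries code_book [K, D]; returns an [N, K] matrix."""
    x = tf.cast(x, tf.float32)
    code_book = tf.cast(code_book, tf.float32)
    x_sq = tf.reduce_sum(tf.square(x), axis=-1, keepdims=True)   # [N, 1]
    c_sq = tf.reduce_sum(tf.square(code_book), axis=-1)          # [K]
    cross = tf.matmul(x, code_book, transpose_b=True)            # [N, K]
    # broadcast ||x||^2 and ||c||^2 against the cross term
    return x_sq - 2.0 * cross + c_sq
```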
Hi, I am very interested in your paper. I tried to run an experiment on my own Finance News dataset to predict the financial event type given the financial news, but the...
Hi, this work is very useful for my research. Could you share the helpfulness dataset with human labels? Thanks.
### Your current environment
```
ENV: openrlhf-latest version
vllm==0.7.2
```
### 🐛 Describe the bug
I tried to run PPO/REINFORCE++ using OpenRLHF. The dataset and reward func are the same as in https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/...
Nice work! Could you release the SWE environment and RL recipe for online RL training? I think this is far more important than SFT.