yyht
Hi, nice work. When I apply it to a shallower BERT or GPT, it often gets NaN gradients right after initialization (even for deeper architectures).
Hi, I have done pretraining on a Chinese dataset (50G) and run downstream fine-tuning on the ChineseCLUE benchmark. The default hyperparameters are the same as bert-base: learning_rate: 3e-5, epoch: 3 or 5. The finetuning...
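For reference, a minimal sketch of the fine-tuning setup this comment describes, assuming a HuggingFace-style `Trainer` interface; only `learning_rate` and `num_train_epochs` come from the comment, everything else is a hypothetical placeholder:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="clue_finetune",      # hypothetical output path
    learning_rate=3e-5,              # bert-base default stated above
    num_train_epochs=3,              # the comment tries 3 or 5
    per_device_train_batch_size=32,  # assumed; not stated in the comment
)
```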
Hi, I am trying to train bert-base using tta for Chinese, and it got NaN within 1000 optimization steps. I am wondering if you could give me some advice.
Hi, since your waveglow proposes a soft-EM version of VQ-VAE, the core implementation is:

```
def _square_distance(x, code_book):
    x = tf.cast(x, tf.float32)
    code_book = tf.cast(code_book, tf.float32)
    x_sg =...
```
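The visible fragment computes pairwise squared distances between encoder outputs and codebook entries. A self-contained sketch of that computation via the standard expansion ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2; the completion past `x_sg` is my own reconstruction, not the repo's code:

```python
import tensorflow as tf

def square_distance(x, code_book):
    """Pairwise squared distances between encoder outputs x [N, D]
    and codebook entries code_book [K, D]; returns an [N, K] matrix."""
    x = tf.cast(x, tf.float32)
    code_book = tf.cast(code_book, tf.float32)
    x_sq = tf.reduce_sum(tf.square(x), axis=-1, keepdims=True)   # [N, 1]
    c_sq = tf.reduce_sum(tf.square(code_book), axis=-1)          # [K]
    cross = tf.matmul(x, code_book, transpose_b=True)            # [N, K]
    # broadcast ||x||^2 and ||c||^2 against the cross term
    return x_sq - 2.0 * cross + c_sq
```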
Hi, I am very interested in your paper. I tried to run an experiment on my own Finance News dataset to predict the financial event type given the financial news, but the...
Hi, this work is very useful for my research. Could you share the helpfulness dataset with human labels? Thanks.
### Your current environment
```
ENV: openrlhf-latest version
vllm==0.7.2
```
### 🐛 Describe the bug
I tried to run PPO/REINFORCE++ using OpenRLHF. The dataset and reward func are the same as in https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/...
Nice work! Could you release the SWE environment and RL recipe for online RL training? I think this is far more important than SFT.