verl
Example for code RL training
Very nice library!
I noticed that the current examples are for math task training. Would you consider adding an example for code generation tasks, including some recommended settings?
Also, in my own attempts at simple code generation training, I found that training is significantly slower than for math tasks, and GPU utilization is often very low. Do you have any suggestions on what might cause this or how to improve it?
+1
+1