RL
Scalable toolkit for efficient model reinforcement
# What does this PR do ? **Add a one line overview of what this PR aims to accomplish.** # Issues List issues that this PR closes ([syntax](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword)): # Usage...
# What does this PR do ? Updates vllm to 0.11.2, torch to 2.9, transformers to 4.57.1. Also updates Automodel to use `main` branch. # Issues List issues that this...
# What does this PR do ? **Add a one line overview of what this PR aims to accomplish.** Hi @guyueh1 , as discussed, I'm creating this draft PR for...
Follow up of #1472. Thanks @nv-mmanohara for adding this! 1. Add GRPO support for HelpSteer3 on LlamaNemotron 49B. 2. Add SFT support for tulu3 on LlamaNemotron 49B. 3. Add CodeJaccard...
# What does this PR do ? - Builds upon [#1534] by tracking two additional vLLM metrics: `kv_cache_usage_perc` and `generation_tokens`. - Adds W&B plotting for all vLLM metrics introduced in...
## Title Create a dedicated NeMo-RL branch and bump Automodel to [the branch](https://github.com/NVIDIA-NeMo/Automodel/tree/ruit/nemor-rl-submodule) ## Background / Motivation - Align with upstream Automodel fixes/features to improve compatibility, performance, and stability. -...
# Issues Addresses #833 # Test with thinking machine config # Description This PR is a work in progress to add LoRA support for the DTensor path. Current status...
# What does this PR do ? Adds `BasePolicyWorker` class to enforce common APIs between DTensor and Megatron backends # Issues List issues that this PR closes ([syntax](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword)): # Usage...