Scalable toolkit for efficient model reinforcement

Results 100 RL issues

# What does this PR do ? **Add a one line overview of what this PR aims to accomplish.** # Issues List issues that this PR closes ([syntax](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword)): # Usage...

CI:L1

# What does this PR do ? Updates vllm to 0.11.2, torch to 2.9, transformers to 4.57.1. Also updates Automodel to use `main` branch. # Issues List issues that this...

CI:L1

# What does this PR do ? **Add a one line overview of what this PR aims to accomplish.** Hi @guyueh1 , as discussed, I'm creating this draft PR for...

CI:L0
Low Precision

Follow-up to #1472. Thanks @nv-mmanohara for adding this! 1. Add GRPO support for HelpSteer3 on LlamaNemotron 49B. 2. Add SFT support for tulu3 on LlamaNemotron 49B. 3. Add CodeJaccard...

documentation
CI:L1

# What does this PR do ? **Add a one line overview of what this PR aims to accomplish.** # Issues List issues that this PR closes ([syntax](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword)): # Usage...

# What does this PR do ? - Builds upon [#1534] by tracking two additional vLLM metrics: `kv_cache_usage_perc` and `generation_tokens`. - Adds W&B plotting for all vLLM metrics introduced in...

CI:L1
ease of use

## Title Create a NeMo-RL dedicated branch and bump Automodel to [the branch](https://github.com/NVIDIA-NeMo/Automodel/tree/ruit/nemor-rl-submodule) ## Background / Motivation - Align with upstream Automodel fixes/features to improve compatibility, performance, and stability. -...

enhancement

# Issues Addresses #833 # Test with thinking machine config # Description This PR is a work in progress to add LoRA support for the DTensor path. Current status...

CI:L1

# What does this PR do ? **Add a one line overview of what this PR aims to accomplish.** # Issues List issues that this PR closes ([syntax](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword)): # Usage...

CI:L1

# What does this PR do ? Adds `BasePolicyWorker` class to enforce common APIs between DTensor and Megatron backends # Issues List issues that this PR closes ([syntax](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword)): # Usage...

documentation
CI