What does this PR do?

This PR adds an options passthrough to the vLLM generation policy to enable guided decoding.

Issues

This PR resolves #603.

Usage

This PR adds a backend-agnostic guided decoding config class (nemo_rl.models.generation.interfaces.GuidedDecodingConfig). The class does not depend on vLLM, so it can be reused if a new generation backend is added in the future.

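from nemo_rl.models.generation.interfaces import GuidedDecodingConfig
# Constrain the generated text to a phone-number pattern such as 555-123-4567.
regex_config = GuidedDecodingConfig(mode="regex", regex=r"\d{3}-\d{3}-\d{4}")
phone_outputs = policy.generate(data, guided_decoding_config=regex_config)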

where policy is an instance of any subclass of GenerationInterface, such as VllmGeneration.
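
The other modes can be configured the same way. The following is a minimal sketch, assuming the config is a TypedDict with `mode`, `json`, `regex`, `choice`, and `grammar` fields. The `GuidedDecodingConfig` defined here is a local stand-in so the snippet runs without NeMo-RL installed, and `conforms` is a hypothetical helper for checking decoded text, not part of this PR.

```python
import re
from typing import Any, TypedDict


class GuidedDecodingConfig(TypedDict, total=False):
    """Local stand-in for nemo_rl.models.generation.interfaces.GuidedDecodingConfig."""

    mode: str
    json: Any
    regex: str
    choice: list[str]
    grammar: str


# One config per constraint type; only the field matching `mode` is set.
regex_config = GuidedDecodingConfig(mode="regex", regex=r"\d{3}-\d{3}-\d{4}")
choice_config = GuidedDecodingConfig(mode="choice", choice=["yes", "no", "maybe"])
json_config = GuidedDecodingConfig(
    mode="json",
    json={
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    },
)


def conforms(text: str, config: GuidedDecodingConfig) -> bool:
    """Hypothetical helper: check decoded text against a regex or choice constraint."""
    if config["mode"] == "regex":
        return re.fullmatch(config["regex"], text) is not None
    if config["mode"] == "choice":
        return text in config["choice"]
    raise ValueError(f"conforms() does not handle mode {config['mode']!r}")
```

A check like `conforms(decoded_text, regex_config)` is one way to assert in a test that generated text respects the constraint.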

Before your PR is "Ready for review"

Pre checks:

[ ] Make sure you read and followed Contributor guidelines
[ ] Did you write any new necessary tests?
[ ] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
[ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Summary by CodeRabbit

New Features
- Added optional guided decoding support for text generation, enabling output constraints via regex patterns, JSON schemas, predefined choices, and grammar rules. This feature is backward compatible and disabled by default.
Tests
- Added unit tests validating guided decoding functionality with regex and choice-based constraints.

Aug 03 '25 21:08 ybgao-nvidia

@parthchadha can you take a quick look as well before merge?

Aug 11 '25 20:08 SahilJain314

@ybgao-nvidia I don't need to review code :) you can remove me from the list of reviewers. Thank you!

Sep 15 '25 01:09 snowmanwwg

📝 Walkthrough

This PR adds guided decoding support to NeMo-RL's vLLM generation pipeline by introducing an optional GuidedDecodingConfig parameter that threads through rollout entry points, generation interfaces, and vLLM workers, enabling structured output modes like regex matching, JSON schema validation, and predefined choice constraints.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Rollout Parameter Threading `nemo_rl/experience/rollouts.py` | Added `guided_decoding_config: Optional[GuidedDecodingConfig] = None` parameter to six public methods (`generate_responses`, `generate_responses_async`, `run_multi_turn_rollout`, `async_generate_response_for_sample_turn`, `run_sample_multi_turn_rollout`, `run_async_multi_turn_rollout`) and threaded parameter through internal call chains. |
| Generation Interface Definitions `nemo_rl/models/generation/interfaces.py` | Added new `GuidedDecodingConfig` TypedDict with fields: `mode` (str), `json` (optional), `regex` (optional), `choice` (optional), `grammar` (optional). Extended `GenerationConfig` with `guided_decoding: NotRequired[GuidedDecodingConfig]` field. Updated abstract method `GenerationInterface.generate()` signature to include `guided_decoding_config: Optional[GuidedDecodingConfig]` parameter. |
| vLLM Implementation `nemo_rl/models/generation/vllm/vllm_generation.py` | Added `guided_decoding_config` parameter to four public methods (`generate`, `generate_async`, `generate_text`, `generate_text_async`). Extended `_async_generate_base` to accept and forward `**kwargs`. Updated worker invocations to propagate guided decoding configuration through `common_kwargs`. |
| vLLM Worker `nemo_rl/models/generation/vllm/vllm_worker.py` | Implemented `_get_vllm_guided_decoding_params()` helper to translate `GuidedDecodingConfig` into vLLM's `GuidedDecodingParams` (supports modes: json, regex, choice, grammar, json_object). Updated `generate()` and `generate_text()` method signatures to accept `guided_decoding_config` and integrated conversion logic. Extended `_build_sampling_params()` to accept and apply `guided_decoding_params` to `SamplingParams`. |
| vLLM Async Worker `nemo_rl/models/generation/vllm/vllm_worker_async.py` | Added `guided_decoding_config` parameter to `generate_async()` and `guided_decoding_params` parameter to `generate_text_async()`. Integrated guided decoding propagation through async per-sample generation paths via `_get_vllm_guided_decoding_params()` conversion. |
| Policy Layer `nemo_rl/models/policy/lm_policy.py` | Added `guided_decoding_config: Optional[GuidedDecodingConfig] = None` parameter to `generate()` method with guard assertion requiring parameter to be `None`, indicating guided decoding is not supported for this backend. |
| Unit Test `tests/unit/models/generation/test_vllm_generation.py` | Added `test_vllm_guided_decoding()` test exercising two guided decoding configurations (regex phone-number pattern and predefined-choice mode) and validating output conformance to constraints. |
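
The translation step described for the vLLM worker can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `GuidedDecodingParams` below is a stand-in dataclass that mirrors only the vLLM fields named in the table, and `get_guided_decoding_params` is a hypothetical name for the helper. The sketch raises a descriptive `ValueError` both for unknown modes and for a mode whose payload field is missing.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GuidedDecodingParams:
    """Stand-in mirroring the vLLM fields used here; not the real vLLM class."""

    json: Optional[Any] = None
    regex: Optional[str] = None
    choice: Optional[list[str]] = None
    grammar: Optional[str] = None
    json_object: Optional[bool] = None


# Modes that carry their payload in a config field of the same name.
_PAYLOAD_MODES = ("json", "regex", "choice", "grammar")


def get_guided_decoding_params(config: Optional[dict]) -> Optional[GuidedDecodingParams]:
    """Translate a backend-agnostic config dict into backend parameters."""
    if config is None:
        return None  # guided decoding is disabled, which is the default
    mode = config.get("mode")
    if mode == "json_object":
        # This mode needs no payload: any valid JSON object is accepted.
        return GuidedDecodingParams(json_object=True)
    if mode in _PAYLOAD_MODES:
        if config.get(mode) is None:
            raise ValueError(f"mode {mode!r} requires the {mode!r} field to be set")
        return GuidedDecodingParams(**{mode: config[mode]})
    raise ValueError(f"unsupported guided decoding mode: {mode!r}")
```

Keeping the payload field name identical to the mode name lets a single branch cover four of the five modes.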

Sequence Diagram

sequenceDiagram
    participant Rollout as Rollout Layer
    participant GenInterface as Generation Interface
    participant VllmGen as VllmGeneration
    participant VllmWorker as VllmWorker
    participant vLLM as vLLM Library

    Rollout->>GenInterface: generate_responses(data, guided_decoding_config)
    GenInterface->>VllmGen: generate(data, guided_decoding_config)
    VllmGen->>VllmWorker: generate(data, guided_decoding_config)
    activate VllmWorker
    VllmWorker->>VllmWorker: _get_vllm_guided_decoding_params(guided_decoding_config)
    VllmWorker->>VllmWorker: _build_sampling_params(..., guided_decoding_params)
    deactivate VllmWorker
    VllmWorker->>vLLM: generate_completion(sampling_params with guided_decoding)
    vLLM-->>VllmWorker: structured output (matches constraints)
    VllmWorker-->>VllmGen: BatchedDataDict
    VllmGen-->>GenInterface: BatchedDataDict
    GenInterface-->>Rollout: BatchedDataDict

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring extra attention:

- `_get_vllm_guided_decoding_params()` conversion logic in `nemo_rl/models/generation/vllm/vllm_worker.py`: Verify mode-to-vLLM parameter mapping is complete and handles all supported modes (json, regex, choice, grammar, json_object); ensure `ValueError` is raised appropriately for unsupported modes.
- Abstract method contract change in `nemo_rl/models/generation/interfaces.py`: `GenerationInterface.generate()` signature now requires `guided_decoding_config` parameter; verify all subclass implementations are properly updated (check for any implementations outside the main files in this diff).
- Parameter threading consistency: Trace `guided_decoding` propagation across async vs. sync paths (`generate` vs. `generate_async`, `generate_text` vs. `generate_text_async`) to ensure no divergence in parameter passing.
- Guard assertion in `lm_policy.py`: Confirm the assertion message and behavior are appropriate for blocking guided decoding on non-vLLM backends.
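
Two of the checks above, the guard assertion and the subclass signature audit, can be sketched as follows. This is a hedged illustration only: `GenerationInterface` here is a reduced stand-in for the real abstract class, while `UnsupportedBackend` and `missing_guided_decoding_param` are hypothetical names introduced for the example.

```python
import inspect
from abc import ABC, abstractmethod
from typing import Any, Optional


class GenerationInterface(ABC):
    """Stand-in for the abstract interface; only the relevant method is shown."""

    @abstractmethod
    def generate(self, data: Any, guided_decoding_config: Optional[dict] = None) -> Any: ...


class UnsupportedBackend(GenerationInterface):
    """Hypothetical backend that rejects guided decoding with a guard assertion."""

    def generate(self, data: Any, guided_decoding_config: Optional[dict] = None) -> Any:
        assert guided_decoding_config is None, (
            "guided decoding is not supported by this backend"
        )
        return data


def missing_guided_decoding_param(base: type) -> list[str]:
    """Return names of direct subclasses whose generate() lacks the new parameter."""
    return [
        cls.__name__
        for cls in base.__subclasses__()
        if "guided_decoding_config" not in inspect.signature(cls.generate).parameters
    ]
```

Note that `__subclasses__()` only sees classes that have already been imported, so an audit like this is useful only after the backend modules are loaded.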

Possibly related PRs

- NVIDIA-NeMo/RL#1098: Async GRPO code calls rollout functions like `run_async_multi_turn_rollout()` which now accept guided decoding config; may need coordination for integrated testing.
- NVIDIA-NeMo/RL#1382: Also modifies `nemo_rl/models/generation/interfaces.py` to update `GenerationConfig` fields; potential merge conflict or cross-feature interaction at the type level.

Suggested labels

CI:L1

Suggested reviewers

terrykong
parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.79% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
| Test Results For Major Changes | ⚠️ Warning | This PR introduces a major feature (guided decoding support for vLLM) and includes a comprehensive unit test that exercises regex and choice-based guided decoding modes with assertions about output shapes and constraint adherence. However, the PR description indicates that pre-check checklist items are unchecked, and there is no explicit documentation of test execution results or confirmation that the tests pass. While the test code exists and appears well-designed, the lack of documented test results in the PR description means the requirement for major changes to include test result information has not been satisfied. Additionally, there is an outstanding review comment requesting improved validation for guided decoding configuration fields. | Update the PR description to explicitly document that tests have been executed and pass. Include test output or a reference to CI/workflow results demonstrating that `test_vllm_guided_decoding` passes successfully. Additionally, address the outstanding review comment regarding field validation for guided decoding modes to ensure proper error handling with descriptive ValueError messages instead of bare KeyError exceptions. Mark all pre-check checklist items as complete once these steps are finished. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "feat: Add guided decoding passthrough to vLLM" accurately and specifically describes the primary change in this pull request. It clearly conveys that the feature adds guided decoding support through the vLLM generation pipeline without using vague terms or noise. The title is concise at 7 words and effectively communicates the main objective to someone scanning the project history. |
| Linked Issues Check | ✅ Passed | The pull request fully satisfies the requirements from linked issue #603. The implementation adds support for guided decoding parameters (json, regex, choice, grammar modes) through a new `GuidedDecodingConfig` interface class [interfaces.py], properly threads this configuration through vLLM generation entry points [vllm_generation.py], implements the core translation logic via `_get_vllm_guided_decoding_params()` helper that passes guided decoding to vLLM's `SamplingParams` exactly as specified in the issue [vllm_worker.py], extends async paths for completeness [vllm_worker_async.py], and validates the implementation with a new test exercising regex and choice-based guided decoding [test_vllm_generation.py]. All coding requirements for enabling structured output and tool calling through guided decoding have been met. |
| Out of Scope Changes Check | ✅ Passed | All changes in this pull request are directly related to implementing guided decoding support for vLLM. The new interface definitions [interfaces.py] and backend-specific implementations [vllm_generation.py, vllm_worker.py, vllm_worker_async.py] are core to the feature. The threading through rollouts [rollouts.py] and consistent interface updates across all backends including the guard assertion in lm_policy.py are supporting changes that align with the PR objective to provide end-to-end guided decoding capability. The test addition validates the implementation. No extraneous changes unrelated to guided decoding support were introduced. |



Oct 30 '25 20:10 coderabbitai[bot]

feat: Add guided decoding passthrough to vLLM
