feat: Add guided decoding passthrough to vLLM

Open ybgao-nvidia opened this issue 6 months ago • 3 comments

What does this PR do?

This PR adds options passthrough to vLLM generation policy to enable guided decoding.

Issues

This PR resolves #603.

Usage

This PR adds a backend-agnostic guided decoding config class (nemo_rl.models.generation.interfaces.GuidedDecodingConfig). The class does not depend on vLLM, so it can be reused if a new generation backend is added in the future.

regex_config = GuidedDecodingConfig(mode="regex", regex=r"\d{3}-\d{3}-\d{4}")
phone_outputs = policy.generate(data, guided_decoding_config=regex_config)

where policy is any subclass of GenerationInterface, which includes VllmGeneration.
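
The other modes follow the same pattern. Below is a minimal sketch, assuming the field names listed in the walkthrough (mode, json, regex, choice, grammar). The GuidedDecodingConfig here is a local stand-in so the snippet runs without NeMo-RL installed, its field types are guesses, and conforms is a hypothetical helper for checking decoded strings, not something this PR adds:

```python
import re
from typing import Any, TypedDict


class GuidedDecodingConfig(TypedDict, total=False):
    """Local stand-in for nemo_rl.models.generation.interfaces.GuidedDecodingConfig.

    Field names follow the PR walkthrough; the field types are assumptions.
    """

    mode: str
    json: Any
    regex: str
    choice: list[str]
    grammar: str


# Same call shape as the usage example above.
regex_config = GuidedDecodingConfig(mode="regex", regex=r"\d{3}-\d{3}-\d{4}")
choice_config = GuidedDecodingConfig(mode="choice", choice=["yes", "no"])


def conforms(text: str, config: GuidedDecodingConfig) -> bool:
    """Hypothetical check that a decoded string satisfies its constraint."""
    if config["mode"] == "regex":
        # fullmatch: the whole output must match, not just a prefix.
        return re.fullmatch(config["regex"], text) is not None
    if config["mode"] == "choice":
        return text in config["choice"]
    raise ValueError(f"no checker for mode {config['mode']!r}")
```

A check like this is one way a test could assert that generated text respects the requested constraint.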

Before your PR is "Ready for review"

Pre checks:

  • [ ] Make sure you read and followed Contributor guidelines
  • [ ] Did you write any new necessary tests?
  • [ ] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • [ ] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Summary by CodeRabbit

  • New Features

    • Added optional guided decoding support for text generation, enabling output constraints via regex patterns, JSON schemas, predefined choices, and grammar rules. This feature is backward compatible and disabled by default.
  • Tests

    • Added unit tests validating guided decoding functionality with regex and choice-based constraints.

ybgao-nvidia avatar Aug 03 '25 21:08 ybgao-nvidia

@parthchadha can you take a quick look as well before merge?

SahilJain314 avatar Aug 11 '25 20:08 SahilJain314

@ybgao-nvidia I don't need to review code :) you can remove me from the list of reviewers. Thank you!

snowmanwwg avatar Sep 15 '25 01:09 snowmanwwg

πŸ“ Walkthrough

Walkthrough

This PR adds guided decoding support to NeMo-RL's vLLM generation pipeline by introducing an optional GuidedDecodingConfig parameter that threads through rollout entry points, generation interfaces, and vLLM workers, enabling structured output modes like regex matching, JSON schema validation, and predefined choice constraints.
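
The threading pattern described above can be sketched roughly as follows. This is an illustration only: the two backend classes and the assertion message are hypothetical stand-ins, and only the parameter name guided_decoding_config and the generate_responses entry point are taken from this PR (the real function takes more arguments).

```python
from typing import Any, Optional


class VllmLikeBackend:
    """Hypothetical backend that accepts a guided decoding config."""

    def generate(self, data: Any, guided_decoding_config: Optional[dict] = None) -> dict:
        # A real backend would translate the config into sampling parameters.
        return {"data": data, "guided": guided_decoding_config}


class UnsupportedBackend:
    """Hypothetical backend that guards against guided decoding."""

    def generate(self, data: Any, guided_decoding_config: Optional[dict] = None) -> dict:
        # Assumed message; the walkthrough only says a guard assertion exists.
        assert guided_decoding_config is None, (
            "guided decoding is not supported by this backend"
        )
        return {"data": data, "guided": None}


def generate_responses(policy: Any, data: Any, guided_decoding_config: Optional[dict] = None) -> dict:
    """Simplified rollout entry point: forwards the optional config unchanged."""
    return policy.generate(data, guided_decoding_config=guided_decoding_config)
```

Because the parameter defaults to None at every layer, existing callers that never pass it keep their current behavior.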

Changes

| Cohort / File(s) | Summary |
|---|---|
| Rollout Parameter Threading / nemo_rl/experience/rollouts.py | Added guided_decoding_config: Optional[GuidedDecodingConfig] = None parameter to six public methods (generate_responses, generate_responses_async, run_multi_turn_rollout, async_generate_response_for_sample_turn, run_sample_multi_turn_rollout, run_async_multi_turn_rollout) and threaded parameter through internal call chains. |
| Generation Interface Definitions / nemo_rl/models/generation/interfaces.py | Added new GuidedDecodingConfig TypedDict with fields: mode (str), json (optional), regex (optional), choice (optional), grammar (optional). Extended GenerationConfig with guided_decoding: NotRequired[GuidedDecodingConfig] field. Updated abstract method GenerationInterface.generate() signature to include guided_decoding_config: Optional[GuidedDecodingConfig] parameter. |
| vLLM Implementation / nemo_rl/models/generation/vllm/vllm_generation.py | Added guided_decoding_config parameter to four public methods (generate, generate_async, generate_text, generate_text_async). Extended _async_generate_base to accept and forward **kwargs. Updated worker invocations to propagate guided decoding configuration through common_kwargs. |
| vLLM Worker / nemo_rl/models/generation/vllm/vllm_worker.py | Implemented _get_vllm_guided_decoding_params() helper to translate GuidedDecodingConfig into vLLM's GuidedDecodingParams (supports modes: json, regex, choice, grammar, json_object). Updated generate() and generate_text() method signatures to accept guided_decoding_config and integrated conversion logic. Extended _build_sampling_params() to accept and apply guided_decoding_params to SamplingParams. |
| vLLM Async Worker / nemo_rl/models/generation/vllm/vllm_worker_async.py | Added guided_decoding_config parameter to generate_async() and guided_decoding_params parameter to generate_text_async(). Integrated guided decoding propagation through async per-sample generation paths via _get_vllm_guided_decoding_params() conversion. |
| Policy Layer / nemo_rl/models/policy/lm_policy.py | Added guided_decoding_config: Optional[GuidedDecodingConfig] = None parameter to generate() method with guard assertion requiring parameter to be None, indicating guided decoding is not supported for this backend. |
| Unit Test / tests/unit/models/generation/test_vllm_generation.py | Added test_vllm_guided_decoding() test exercising two guided decoding configurations (regex phone-number pattern and predefined-choice mode) and validating output conformance to constraints. |
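
For illustration, the mode-to-parameter translation summarized for the vLLM worker might look roughly like the sketch below. It is not the PR's actual _get_vllm_guided_decoding_params(): the function name guided_decoding_kwargs and the error messages are hypothetical, and it returns plain keyword arguments so that it runs without vLLM installed.

```python
from typing import Any, Mapping, Optional

# Modes whose payload lives in a same-named config field (per the walkthrough).
_PAYLOAD_MODES = ("json", "regex", "choice", "grammar")


def guided_decoding_kwargs(
    config: Optional[Mapping[str, Any]],
) -> Optional[dict[str, Any]]:
    """Return keyword arguments for vLLM's GuidedDecodingParams, or None."""
    if config is None:
        return None  # guided decoding is disabled by default
    mode = config.get("mode")
    if mode == "json_object":
        # This mode carries no payload; it only asks for any valid JSON object.
        return {"json_object": True}
    if mode not in _PAYLOAD_MODES:
        raise ValueError(f"Unsupported guided decoding mode: {mode!r}")
    if config.get(mode) is None:
        # Descriptive error instead of a bare KeyError on a missing field.
        raise ValueError(f"Guided decoding mode {mode!r} requires the {mode!r} field")
    return {mode: config[mode]}
```

In vLLM releases that expose GuidedDecodingParams, the returned dictionary could be unpacked as GuidedDecodingParams(**kwargs) and attached through SamplingParams(guided_decoding=...); that wiring is left out here and should be checked against the installed vLLM version.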

Sequence Diagram

sequenceDiagram
    participant Rollout as Rollout Layer
    participant GenInterface as Generation Interface
    participant VllmGen as VllmGeneration
    participant VllmWorker as VllmWorker
    participant vLLM as vLLM Library

    Rollout->>GenInterface: generate_responses(data, guided_decoding_config)
    GenInterface->>VllmGen: generate(data, guided_decoding_config)
    VllmGen->>VllmWorker: generate(data, guided_decoding_config)
    activate VllmWorker
    VllmWorker->>VllmWorker: _get_vllm_guided_decoding_params(guided_decoding_config)
    VllmWorker->>VllmWorker: _build_sampling_params(..., guided_decoding_params)
    deactivate VllmWorker
    VllmWorker->>vLLM: generate_completion(sampling_params with guided_decoding)
    vLLM-->>VllmWorker: structured output (matches constraints)
    VllmWorker-->>VllmGen: BatchedDataDict
    VllmGen-->>GenInterface: BatchedDataDict
    GenInterface-->>Rollout: BatchedDataDict

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring extra attention:

  • _get_vllm_guided_decoding_params() conversion logic in nemo_rl/models/generation/vllm/vllm_worker.py: Verify mode-to-vLLM parameter mapping is complete and handles all supported modes (json, regex, choice, grammar, json_object); ensure ValueError is raised appropriately for unsupported modes.
  • Abstract method contract change in nemo_rl/models/generation/interfaces.py: GenerationInterface.generate() signature now requires guided_decoding_config parameter; verify all subclass implementations are properly updated (check for any implementations outside the main files in this diff).
  • Parameter threading consistency: Trace guided_decoding propagation across async vs. sync paths (generate vs. generate_async, generate_text vs. generate_text_async) to ensure no divergence in parameter passing.
  • Guard assertion in lm_policy.py: Confirm the assertion message and behavior are appropriate for blocking guided decoding on non-vLLM backends.
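
One way to carry out the subclass check from the second item is to inspect method signatures programmatically. The sketch below uses stand-in classes (GenerationInterface here is a local placeholder, and UpdatedBackend, StaleBackend, all_subclasses, and missing_param are hypothetical names); in the real repository the relevant modules would have to be imported first so that their subclasses are registered.

```python
import inspect
from abc import ABC, abstractmethod


class GenerationInterface(ABC):
    """Local placeholder for the real abstract interface."""

    @abstractmethod
    def generate(self, data, guided_decoding_config=None): ...


class UpdatedBackend(GenerationInterface):
    def generate(self, data, guided_decoding_config=None):
        return data


class StaleBackend(GenerationInterface):
    # Not yet updated: lacks the new parameter.
    def generate(self, data):
        return data


def all_subclasses(cls):
    """Yield direct and indirect subclasses of cls."""
    for sub in cls.__subclasses__():
        yield sub
        yield from all_subclasses(sub)


def missing_param(base, method, param):
    """Return sorted names of subclasses whose `method` lacks `param`."""
    return sorted(
        sub.__name__
        for sub in all_subclasses(base)
        if param not in inspect.signature(getattr(sub, method)).parameters
    )
```

Calling missing_param(GenerationInterface, "generate", "guided_decoding_config") on these placeholders flags only StaleBackend.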

Possibly related PRs

  • NVIDIA-NeMo/RL#1098: Async GRPO code calls rollout functions like run_async_multi_turn_rollout() which now accept guided decoding config; may need coordination for integrated testing.
  • NVIDIA-NeMo/RL#1382: Also modifies nemo_rl/models/generation/interfaces.py to update GenerationConfig fields; potential merge conflict or cross-feature interaction at the type level.

Suggested labels

CI:L1

Suggested reviewers

  • terrykong
  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.79% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Test Results For Major Changes | ⚠️ Warning | This PR introduces a major feature (guided decoding support for vLLM) and includes a comprehensive unit test that exercises regex and choice-based guided decoding modes with assertions about output shapes and constraint adherence. However, the PR description indicates that pre-check checklist items are unchecked, and there is no explicit documentation of test execution results or confirmation that the tests pass. While the test code exists and appears well-designed, the lack of documented test results in the PR description means the requirement for major changes to include test result information has not been satisfied. Additionally, there is an outstanding review comment requesting improved validation for guided decoding configuration fields. | Update the PR description to explicitly document that tests have been executed and pass. Include test output or a reference to CI/workflow results demonstrating that test_vllm_guided_decoding passes successfully. Additionally, address the outstanding review comment regarding field validation for guided decoding modes to ensure proper error handling with descriptive ValueError messages instead of bare KeyError exceptions. Mark all pre-check checklist items as complete once these steps are finished. |
✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title "feat: Add guided decoding passthrough to vLLM" accurately and specifically describes the primary change in this pull request. It clearly conveys that the feature adds guided decoding support through the vLLM generation pipeline without using vague terms or noise. The title is concise at 7 words and effectively communicates the main objective to someone scanning the project history. |
| Linked Issues Check | ✅ Passed | The pull request fully satisfies the requirements from linked issue #603. The implementation adds support for guided decoding parameters (json, regex, choice, grammar modes) through a new GuidedDecodingConfig interface class [interfaces.py], properly threads this configuration through vLLM generation entry points [vllm_generation.py], implements the core translation logic via _get_vllm_guided_decoding_params() helper that passes guided decoding to vLLM's SamplingParams exactly as specified in the issue [vllm_worker.py], extends async paths for completeness [vllm_worker_async.py], and validates the implementation with a new test exercising regex and choice-based guided decoding [test_vllm_generation.py]. All coding requirements for enabling structured output and tool calling through guided decoding have been met. |
| Out of Scope Changes Check | ✅ Passed | All changes in this pull request are directly related to implementing guided decoding support for vLLM. The new interface definitions [interfaces.py] and backend-specific implementations [vllm_generation.py, vllm_worker.py, vllm_worker_async.py] are core to the feature. The threading through rollouts [rollouts.py] and consistent interface updates across all backends including the guard assertion in lm_policy.py are supporting changes that align with the PR objective to provide end-to-end guided decoding capability. The test addition validates the implementation. No extraneous changes unrelated to guided decoding support were introduced. |
✨ Finishing touches
  • [ ] πŸ“ Generate docstrings
πŸ§ͺ Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch ybgao/aug3-guided-decoding

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Oct 30 '25 20:10 coderabbitai[bot]