
[TIR][Schedule] FuseReductionEpilogue: Add Clipping pattern support

Open kimm240 opened this issue 2 months ago • 4 comments

Currently, the FuseReductionEpilogue primitive only supports Bias (addition) and BiasReLU (addition + ReLU) epilogue patterns. However, clipping operations (min(max(x, lower), upper)) are commonly used in deep learning models and would benefit from the same fusion optimization.
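For reference, the unfused pattern this change targets can be sketched in plain Python (hypothetical names; the real transformation operates on TIR reduction and epilogue blocks, not Python functions):

```python
def unfused(A, lower, upper):
    # Reduction block: temp = sum over A.
    temp = 0
    for x in A:
        temp = temp + x
    # Separate epilogue block: clip the reduced value once.
    return min(max(temp, lower), upper)
```

Fusing the epilogue into the reduction block removes the intermediate `temp` buffer and the extra pass over it.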

This commit extends FuseReductionEpilogue to support Clipping patterns by:

  1. Adding EpilogueType::Clipping to the enum to distinguish clipping patterns from other epilogue types.

  2. Adding clipping_lower_ and clipping_upper_ members to ReductionEpilogueFuser to store clipping bounds extracted from the epilogue pattern.

  3. Extending AnalyzeEpiloguePattern to detect clipping patterns:

    • min(max(temp, lower), upper)
    • max(min(temp, upper), lower)
    • All commutative variants of min/max at each level
  4. Updating BiasReLU pattern matching to handle max(0, x) form in addition to max(x, 0) for better commutativity support.

  5. Modifying CreateFusedReductionBlock to apply clipping to the init value: init = min(max(0, lower), upper)

  6. Updating BufferReplacer to apply clipping per-iteration: value = min(max(value, lower), upper)

  7. Adding validation in BodyPatternAllowFusion to ensure temp appears exactly once in clipping patterns.

  8. Creating comprehensive test coverage with 8 test cases:

    • Basic fusion test
    • Numerical correctness verification
    • Multiple epilogue blocks test
    • 5 commutative variant tests
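The commutative variants described in step 3 can be illustrated with a toy matcher over expression tuples (purely illustrative; the actual implementation matches TIR `Min`/`Max` nodes in C++):

```python
def match_clipping(expr, temp="temp"):
    """Return (lower, upper) if expr is a clipping pattern around `temp`,
    else None. Expressions are tuples like ("min", a, b)."""
    def contains_temp(node):
        return node == temp or (isinstance(node, tuple) and
                                any(contains_temp(c) for c in node[1:]))

    def split(node, op):
        # Match op(a, b) in either operand order; return the operand
        # containing `temp` and the constant bound.
        if not (isinstance(node, tuple) and node[0] == op):
            return None
        _, a, b = node
        if contains_temp(a):
            return a, b
        if contains_temp(b):
            return b, a
        return None

    # Form 1: min(max(temp, lower), upper), any operand order.
    outer = split(expr, "min")
    if outer:
        inner_expr, upper = outer
        inner = split(inner_expr, "max")
        if inner and inner[0] == temp:
            return inner[1], upper
    # Form 2: max(min(temp, upper), lower), any operand order.
    outer = split(expr, "max")
    if outer:
        inner_expr, lower = outer
        inner = split(inner_expr, "min")
        if inner and inner[0] == temp:
            return lower, inner[1]
    return None
```

Accepting either operand order at each level is what covers the commutative variants without enumerating every form by hand.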

This implementation follows the same per-iteration semantics as BiasReLU, where clipping is applied at each reduction step rather than post-reduction. This semantic change is documented in the docstring with a warning about potential numerical differences.
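The numerical caveat can be demonstrated with a small example (illustrative Python, not the TIR lowering): when the running sum temporarily leaves the clipping range, per-iteration clipping saturates the accumulator and loses information that post-reduction clipping would keep.

```python
def clip(x, lower, upper):
    return min(max(x, lower), upper)

def post_reduction(values, lower, upper):
    # Reference semantics: reduce fully, then clip once.
    return clip(sum(values), lower, upper)

def per_iteration(values, lower, upper):
    # Fused semantics: clip the accumulator at every reduction step,
    # starting from the clipped init value clip(0, lower, upper).
    acc = clip(0, lower, upper)
    for v in values:
        acc = clip(acc + v, lower, upper)
    return acc

# With values [10, -4] and bounds [0, 6]:
# post_reduction gives clip(6) = 6, but per_iteration saturates at 6
# after the first step and then returns clip(6 - 4) = 2.
```

The two agree whenever every partial sum stays inside the clipping range, which is why the fused form is still useful for the common case of bounded activations.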

The test suite verifies that all commutative forms of clipping patterns are correctly recognized and that the fused implementation produces numerically identical results to the per-iteration reference implementation.

kimm240 avatar Nov 27 '25 01:11 kimm240

Summary of Changes

Hello @kimm240, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the FuseReductionEpilogue primitive by introducing support for Clipping operations. Previously limited to Bias and BiasReLU, this change expands the optimization's applicability to common deep learning patterns involving value clamping. A key aspect of this update is the adoption of per-iteration semantics for non-linear epilogues, which is a deliberate design choice with documented implications for numerical results. The implementation includes robust pattern matching for commutative variants and comprehensive testing to ensure correctness and stability.

Highlights

  • Clipping Pattern Support: The FuseReductionEpilogue primitive now supports Clipping patterns, specifically min(max(x, lower), upper) and its commutative variants, allowing for fusion of clipping operations into reduction blocks.
  • Per-Iteration Semantics for Non-Linear Epilogues: For non-linear epilogues like BiasReLU and the newly added Clipping, the fusion changes the computation semantics from post-reduction application to per-iteration application. This is explicitly documented and verified by tests, acknowledging potential numerical differences.
  • Enhanced Pattern Matching: The AnalyzeEpiloguePattern function has been extended to detect all commutative variants of min/max for clipping and to handle max(0, x) forms for BiasReLU, improving robustness.
  • Updated Reduction Block Creation: The CreateFusedReductionBlock and BufferReplacer logic have been updated to correctly apply the BiasReLU and Clipping operations to the initial value and per-iteration updates within the fused reduction block.
  • Comprehensive Test Coverage: New test cases have been added to validate the Clipping pattern fusion, including basic fusion, numerical correctness (matching per-iteration reference), handling of multiple epilogue blocks, and recognition of various commutative forms.

gemini-code-assist[bot] avatar Nov 27 '25 01:11 gemini-code-assist[bot]

@wrongtest-intellif This PR implements the extension for other epilogue forms (ReLU/Clipping) as discussed in the previous PR #18418 review conversation.

kimm240 avatar Nov 27 '25 09:11 kimm240

> @wrongtest-intellif This PR implements the extension for other epilogue forms (ReLU/Clipping) as discussed in the previous PR #18418 review conversation.

Thanks for the contribution! The change itself looks good to me. My question now is: do we have any further insights into general epilogue forms? For the reverse_compute_inline primitive, it seems many other ReLU/PReLU/sigmoid forms can be handled without explicit pattern matching.

wrongtest-intellif avatar Dec 10 '25 05:12 wrongtest-intellif

That is definitely a good direction! In compute_inline.cc, a general data-flow reconnection approach is used to handle complex operations: index mapping via DetectIterMap and substitution logic after BufferLoad extraction. It would be a good reference for the future of this work. I'd like to ask for your advice.

kimm240 avatar Dec 10 '25 08:12 kimm240

> That is definitely a good direction! In compute_inline.cc, a general data-flow reconnection approach is used to handle complex operations: index mapping via DetectIterMap and substitution logic after BufferLoad extraction. It would be a good reference for the future of this work. I'd like to ask for your advice.

Would you like to make a similar implementation instead of directly introducing the clip and ReLU patterns?

wrongtest-intellif avatar Dec 12 '25 08:12 wrongtest-intellif

I completely agree. However, implementing full, general data-flow reconnection logic would require significant development time and extensive testing.

Given that the current PR is functionally complete and stable, and it addresses two very common patterns (Clipping and ReLU), could we proceed with merging the current PR first? This would allow the community to gain the immediate performance benefit.

I commit to delivering the generalized implementation in a dedicated follow-up PR, and I am happy to discuss the generalized logic with you before writing the code.

kimm240 avatar Dec 12 '25 09:12 kimm240