oneDNN icon indicating copy to clipboard operation
oneDNN copied to clipboard

gpu: conv: jit: add early slm check to skip ngen jitting

Open ZackyLake opened this issue 3 months ago • 1 comments

Description

Convolution's jit kernel could fail due to SLM usage exceeding limit: https://github.com/uxlfoundation/oneDNN/blob/05b4b09de057482a0b324cf8938476400418905b/src/gpu/intel/jit/codegen/codegen.cpp#L1752-L1758

The common usage of SLM in conv is for reduce of k slicing: https://github.com/uxlfoundation/oneDNN/blob/05b4b09de057482a0b324cf8938476400418905b/src/gpu/intel/conv/jit/ir_builder.cpp#L513-L517

And this size is known with the config, no need to spend time in IR generation.

This PR calculates this SLM usage and perform early check, should save some compilation time wasting on unsuitable config.

Code could be better by abstracting logic into functions rather than copy-pasting, but that introduce a bigger change...

Checklist

General

  • [x] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • [x] Have you formatted the code using clang-format?

Performance improvements

  • [ ] Have you submitted performance data that demonstrates performance improvements?

New features

  • [ ] Have you published an RFC for the new feature?
  • [ ] Was the RFC approved?
  • [ ] Have you added relevant tests?

Bug fixes

  • [ ] Have you included information on how to reproduce the issue (either in a github issue or in this PR)?
  • [ ] Have you added relevant regression tests?

RFC PR

  • [ ] Does RFC document follow the template?
  • [ ] Have you added a link to the rendered document?

ZackyLake avatar Oct 22 '25 04:10 ZackyLake

@ZackyLake Thanks for the PR! Do you observe the related overhead in real workloads?

I agree with your comment - we can fail much earlier in case of insufficient SLM. However, I suggest two adjustments:

  1. Extend is_slm_limit_ok() check to estimate the total SLM size (not just k-slicing related). Also it makes sense to move it out of tiler.cpp file (as it's not tiler-specific)
  2. Drop the check here and add assertions validating real SLM size based on the estimated size at step 1

If this is a general improvement - I suggest to focus on making the check more complete (see bullets 1 and 2 above). But if this is a quick fix to address some E2E overhead - then I'm fine to merge this as is with some minor adjustments (like rename is_slm_limit_ok() and move it out of tiler.cpp to emphasize that it's only k-slicing related check, etc).

echeresh avatar Oct 27 '25 18:10 echeresh