Don't use constant mask if ynumel potentially overflows ygrids
If (ynumel / YBLOCK) > get_max_ygrids() and znumel is None, the z grid dimension is used to launch the remaining y blocks. However, if (ynumel / YBLOCK) % get_max_ygrids() != 0, some of the launched programs cover inputs that require masking, so this case must be taken into account when determining whether the y dimension has a constant mask.
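To illustrate why the modulo check matters, here is a small, self-contained Python model of the launch grid. It is not Inductor's actual grid code. It assumes that y blocks beyond the grid limit are folded into z as y_grid = min(num_yblocks, max_y_grid) and z_grid = ceil(num_yblocks / max_y_grid), and that program (y_pid, z_pid) handles block z_pid * y_grid + y_pid. The functions `launch_grid`, `needs_ymask` and `ymask_is_constant` and the `max_y_grid` parameter (a stand-in for get_max_ygrids()) are hypothetical. The sketch checks by brute force whether any program touches y indices >= ynumel and compares that against a closed-form predicate.

```python
def ceildiv(a: int, b: int) -> int:
    # Integer ceiling division, avoiding float rounding for large sizes.
    return -(-a // b)


def launch_grid(ynumel: int, yblock: int, max_y_grid: int) -> tuple[int, int]:
    """Return (y_grid, z_grid) for a kernel with no z dimension of its own.

    Hypothetical folding rule (an assumption, not Inductor's code):
    y blocks that do not fit in the y grid spill over into the z grid.
    """
    num_yblocks = ceildiv(ynumel, yblock)
    if num_yblocks <= max_y_grid:
        return num_yblocks, 1
    return max_y_grid, ceildiv(num_yblocks, max_y_grid)


def needs_ymask(ynumel: int, yblock: int, max_y_grid: int) -> bool:
    """Brute force: does any launched program touch y indices >= ynumel?"""
    y_grid, z_grid = launch_grid(ynumel, yblock, max_y_grid)
    for z_pid in range(z_grid):
        for y_pid in range(y_grid):
            # Assumed mapping from (y_pid, z_pid) to a linear y block index.
            yoffset = (z_pid * y_grid + y_pid) * yblock
            if yoffset + yblock > ynumel:
                return True
    return False


def ymask_is_constant(ynumel: int, yblock: int, max_y_grid: int) -> bool:
    """Closed form: the y mask is all-true only if both conditions hold."""
    num_yblocks = ceildiv(ynumel, yblock)
    if ynumel % yblock != 0:
        return False  # the last y block is partial
    if num_yblocks > max_y_grid and num_yblocks % max_y_grid != 0:
        return False  # y_grid * z_grid over-launches past num_yblocks
    return True
```

For example, with max_y_grid=4 and YBLOCK=16, ynumel=160 gives 10 y blocks launched on a 4 x 3 grid, so two of the twelve programs fall entirely past ynumel and need masking even though ynumel is a multiple of YBLOCK.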
Fixes #ISSUE_NUMBER
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov
Helpful Links
See artifacts and rendered test results at hud.pytorch.org/pr/139751
You can merge normally! (3 Unrelated Failures)
As of commit 26e745b724d130e5e2fbf26d452a34158de3c918 with merge base 5deca07c0dcf1482eba99bf93b805cf1cc41ad6c:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
- inductor-rocm / rocm6.2-py3.10-inductor / test (inductor, 1, 2, linux.rocm.gpu.2) (gh) (similar failure)
  ##[error]Credentials could not be loaded, please check your action inputs: Could not load credentials from any providers
- inductor-rocm / rocm6.2-py3.10-inductor / test (inductor, 2, 2, linux.rocm.gpu.2) (gh) (similar failure)
  ##[error]Credentials could not be loaded, please check your action inputs: Could not load credentials from any providers
UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:
- trunk / win-vs2019-cpu-py3 / test (default, 1, 3, lf.windows.4xlarge.nonephemeral, unstable) (gh)
  [ FAILED ] TestQTensor.FromBlobQuantizedPerTensor
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label 'topic: not user facing'
This should work, but I'll let @eellison comment on whether there is a better way to handle the constant ymask.
@pytorchbot merge
Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.
@eellison Can you approve the workflows? Thanks
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Advanced Debugging
Check the merge workflow status here
Merge failed
Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_9-cuda12_6-test / test
Details for Dev Infra team
Raised by workflow job
@eellison Looks like an unrelated failure
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed
Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_9-cuda12_6-test / test
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed
Reason: 3 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch push -f https://github.com/graphcore/pytorch-fork.git pull/139751/head:mwizak/fix-constant-mask-large-triton-grids returned non-zero exit code 128
remote: Permission to graphcore/pytorch-fork.git denied to pytorchmergebot.
fatal: unable to access 'https://github.com/graphcore/pytorch-fork.git/': The requested URL returned error: 403
This is likely because the author did not allow edits from maintainers on the PR or because the repo has additional permissions settings that mergebot does not qualify. Raised by https://github.com/pytorch/pytorch/actions/runs/12055871747
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed
Reason: 3 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).