Mario Lezcano Casado comments

Results: 254 comments of Mario Lezcano Casado

[Frontend] [BC breaking] Always follow C semantics on %

Will land in https://github.com/triton-lang/triton/pull/4955
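For readers unfamiliar with the distinction in the title: C's `%` truncates the quotient toward zero, so the remainder takes the sign of the dividend, while Python's `%` floors the quotient, so the remainder takes the sign of the divisor. The sketch below is plain Python, not Triton code, and `c_rem` is a hypothetical helper written only to illustrate the difference.

```python
def c_rem(a: int, b: int) -> int:
    """Integer remainder with C semantics (quotient truncates toward zero)."""
    if b == 0:
        raise ZeroDivisionError("integer remainder by zero")
    q = abs(a) // abs(b)       # magnitude of the truncated quotient
    if (a < 0) != (b < 0):
        q = -q                 # restore the sign of the quotient
    return a - q * b           # result carries the sign of the dividend


print(-7 % 3)         # Python floors the quotient: 2
print(c_rem(-7, 3))   # C truncates the quotient: -1
```

The two conventions agree whenever both operands have the same sign and differ only when the signs are mixed and the division is not exact.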

Assertion error in `ScanOpToLLVM` when broadcasting result of cumsum

Does it work on top of the tree?

Assertion error in `ScanOpToLLVM` when broadcasting result of cumsum

can you post the repro given by running master? With that one I am getting ``` repro.mlir:5:44: error: expected '

Assertion error in `ScanOpToLLVM` when broadcasting result of cumsum

nvm this repros: ``` #blocked = #ttg.blocked #blocked1 = #ttg.blocked #mma = #ttg.nvidia_mma module attributes {"ttg.num-ctas" = 1 : i32, "ttg.num-warps" = 8 : i32, ttg.target = "cuda:86", "ttg.threads-per-warp" =...

Assertion error in `ScanOpToLLVM` when broadcasting result of cumsum

`srcValues` is the total number of registers of a given thread

Assertion error in `ScanOpToLLVM` when broadcasting result of cumsum

yep, I refactored all that and I must have missed something along the way. To find where it was changed, you can use `git blame` on the relevant code. Generally,...

[NVIDIA] Add multi-CTA test coverage for `test_mxfp`

what is the state of this PR?

[NVIDIA] Add multi-CTA test coverage for `test_mxfp`

That is surprising. Can you find out what was breaking before and what fixed it?

[NVIDIA] Add multi-CTA test coverage for `test_mxfp`

but the previous code was failing on `cp.async` not being supported on multiple CTAs, was it not? How is your previous fix related to the new changes that landed? Is...

[RFC] Add local scatter/gather for Gluon

You could simply reuse the same function implementing the LLVM lowering for both ops.