Mario Lezcano Casado
Will land in https://github.com/triton-lang/triton/pull/4955
Does it work on top of the tree?
can you post the repro given by running master? With that one I am getting ``` repro.mlir:5:44: error: expected '
nvm this repros: ``` #blocked = #ttg.blocked #blocked1 = #ttg.blocked #mma = #ttg.nvidia_mma module attributes {"ttg.num-ctas" = 1 : i32, "ttg.num-warps" = 8 : i32, ttg.target = "cuda:86", "ttg.threads-per-warp" =...
`srcValues` is the total number of registers of a given thread
yep, I refactored all that and I must have missed something along the way. To find where it was changed, you can use `git blame` on the relevant code. Generally,...
what is the state of this PR?
That is surprising. Can you find what was breaking before and what fixed it?
but the previous code was failing on `cp.async` not being supported on multiple CTAs, was it not? How is your previous fix related to the new changes that landed? Is...
You could simply reuse the same function implementing the LLVM lowering for both ops.