apuaaChen comments

Results: 12 comments of apuaaChen

ListOps performance

I ran into a similar issue with transformer_base. The evaluation accuracy curve looks odd: the highest accuracy reaches 0.3359 at step 2.5k, then it drops to <...

[QST]question about cutlass epilogue customization

Hi! It can be supported by adding a new node. To get the row number, you can follow the example here: https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/threadblock/fusion/visitor_store.hpp#L749-L752 The coord at line 749 is...

Allow scalar broadcasting in VisitorRowBroadcast and VisitorColBroadcast

Hi @tlrmchlsmth, thanks for the PR! One question: can we use `VisitorScalarBroadcast` to achieve the same goal? It also takes a scalar (e.g. float) and...
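
The three broadcast shapes discussed in this thread can be pictured with a plain C++ sketch. This is not CUTLASS code: `Kind`, `Broadcast`, and `at` are hypothetical names introduced for illustration. The sketch only models which element an epilogue would read for output coordinate (row, col), assuming a row broadcast supplies one value per output column and a column broadcast supplies one value per output row.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the three broadcast shapes (not a CUTLASS type).
enum class Kind { Scalar, Row, Col };

struct Broadcast {
  Kind kind;
  std::vector<float> data;  // 1 element (Scalar), N elements (Row), or M elements (Col)

  // Value that would be combined with the output element at (row, col).
  float at(std::size_t row, std::size_t col) const {
    switch (kind) {
      case Kind::Scalar:
        assert(data.size() == 1);
        return data[0];    // same value everywhere
      case Kind::Row:
        assert(col < data.size());
        return data[col];  // one value per column, repeated down the rows
      case Kind::Col:
        assert(row < data.size());
        return data[row];  // one value per row, repeated across the columns
    }
    return 0.0f;  // unreachable for valid Kind values
  }
};
```

In this model a scalar behaves like a row or column vector whose entries are all equal, which is one way to read the question of whether a dedicated scalar node already covers the use case.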

Allow scalar broadcasting in VisitorRowBroadcast and VisitorColBroadcast

@tlrmchlsmth Got it! Let me merge it. Thanks for the explanation.

Allow scalar broadcasting in VisitorRowBroadcast and VisitorColBroadcast

@ProExpertProg Please push your changes to this branch. I will first merge your updates into our internal repo. Once CI passes, I can get your PR merged. Thanks!

Allow scalar broadcasting in VisitorRowBroadcast and VisitorColBroadcast

> @apuaaChen were you able to get the PR run on the internal CI?

Yes, it passed the internal CI. I'm combining it with a few other fixes right now.

[QST] GEMM Epilogue Fusion: Row-wise and Column-wise Multiplication

Hi @Hongbosherlock! Thank you for your patience. I have attached the revision that should work: ```c++ torch::Tensor matmul_w4a8(const torch::Tensor &A, const torch::Tensor &B, const torch::Tensor &alphaCol, const torch::Tensor &alphaRow) {...
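
A fused epilogue like the one in this thread can be checked against a plain CPU reference. The sketch below is not the attached revision (which is truncated here) and uses no CUTLASS or PyTorch APIs. It assumes `alphaCol` holds one scale per output row (an M-element column vector) and `alphaRow` one scale per output column (an N-element row vector), so that D[i][j] = alphaCol[i] * alphaRow[j] * acc[i][j]. The name `scaled_gemm_reference` and the row-major `std::vector<float>` layout are hypothetical choices for illustration.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: D[i][j] = alpha_col[i] * alpha_row[j] * sum_k A[i][k] * B[k][j].
// A is M x K and B is K x N, both stored row-major in flat vectors.
std::vector<float> scaled_gemm_reference(const std::vector<float>& A,
                                         const std::vector<float>& B,
                                         const std::vector<float>& alpha_col,
                                         const std::vector<float>& alpha_row,
                                         std::size_t M, std::size_t N, std::size_t K) {
  assert(A.size() == M * K && B.size() == K * N);
  assert(alpha_col.size() == M && alpha_row.size() == N);

  std::vector<float> D(M * N, 0.0f);
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      // Plain GEMM accumulation for one output element.
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) {
        acc += A[i * K + k] * B[k * N + j];
      }
      // Epilogue: row-wise scale (per row i) times column-wise scale (per column j).
      D[i * N + j] = alpha_col[i] * alpha_row[j] * acc;
    }
  }
  return D;
}
```

Comparing a GPU kernel's output against a reference of this kind on small shapes is a simple way to confirm that the two scale vectors are applied along the intended dimensions.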

[BUG][Inductor-EVT] Python EVT tracer generates incorrect code when assigning accumulator to output D

Hi @mlazos, I think that's expected for sm90. "C" and "D" are hardcoded in the epilogue, and "D" should always take the output of EVT. This design enables smem reuse...

[BUG][Inductor-EVT] Python EVT tracer generates incorrect code when assigning accumulator to output D

Hi @mlazos, there are no restrictions on C as far as I remember. Btw, the 4.1 release adds verification that D is the final result of the tree. Here is...

[FEA][Inductor-EVT] tanh, sigmoid, exp, gelu are not supported in python evt tracer

Hi! The issue should be solved by 4.1. You can find the unit tests here: https://github.com/NVIDIA/cutlass/blob/main/test/python/cutlass/evt/evt_compute_sm80_90.py#L121