apuaaChen
I ran into a similar issue with transformer_base. The evaluation accuracy curve looks a bit odd. The highest accuracy reaches 0.3359 at step 2.5k, then it drops to <...
Hi! It can be supported by adding a new node. To get the row number, you can follow the example here: https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/threadblock/fusion/visitor_store.hpp#L749-L752 The coord at line 749 is...
Hi @tlrmchlsmth, thanks for the PR! One question: can we use `VisitorScalarBroadcast` to achieve the same goal? It also takes a scalar (e.g. float) and...
@tlrmchlsmth Got it! Let me merge it. Thanks for the explanation.
@ProExpertProg Please push your changes to this branch. I will first merge your updates into our internal repo. Once CI passes, I can get your PR merged. Thanks!
> @apuaaChen were you able to get the PR run on the internal CI? Yes, it passed the internal CI. I’m combining it with a few other fixes right now.
Hi @Hongbosherlock! Thank you for your patience. I have attached the revision that should work ```c++ torch::Tensor matmul_w4a8(const torch::Tensor &A, const torch::Tensor &B, const torch::Tensor &alphaCol, const torch::Tensor &alphaRow) {...
Hi @mlazos, I think that's expected for sm90. "C" and "D" are hardcoded in the epilogue, and "D" should always take the output of EVT. This design enables smem reuse...
Hi @mlazos, there are no restrictions on C as far as I remember. Btw, the 4.1 release adds verification that D is the final result of the tree. Here is...
Hi! The issue should be solved by 4.1. You can find the unit tests here: https://github.com/NVIDIA/cutlass/blob/main/test/python/cutlass/evt/evt_compute_sm80_90.py#L121