cutile-python
Questions about performance.
Hi team! I found that for some benchmarks in /test, the performance of cuTile Python is much worse than that of PyTorch. Is this expected?
Could you point to the specific benchmarks?
I tested cuTile on a single RTX 5090 with three kernels: GEMM, flash attention, and RMSNorm. cuTile GEMM is about 15%-50% slower than CUTLASS in common cases. cuTile flash attention is as fast as PyTorch SDPA (`torch.nn.functional.scaled_dot_product_attention`). cuTile RMSNorm is about 20% faster than PyTorch.
Hi @Jacki1223, please share more details about the specific benchmark results and the test environments you would like us to look into.
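Numbers like "15%-50% slower" are easier to compare when every run uses the same timing method and the same environment report. Below is a minimal, hypothetical harness sketch (it is not part of cuTile or its /test suite). It assumes you pass a zero-argument callable that launches the kernel under test and, for GPU work, `sync=torch.cuda.synchronize` so that each sample covers kernel completion rather than just the asynchronous launch. The names `benchmark`, `relative_change`, and `environment` are illustrative only.

```python
import platform
import statistics
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object],
              sync: Optional[Callable[[], None]] = None,
              warmup: int = 10,
              iters: int = 100) -> float:
    """Return the median wall-clock time of fn() in milliseconds.

    Hypothetical helper: for GPU kernels pass sync=torch.cuda.synchronize so
    each sample waits for the kernel to finish instead of timing only the launch.
    """
    # Warm-up runs absorb one-time costs such as JIT compilation and caching.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e3)
    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(samples)


def relative_change(t_candidate_ms: float, t_baseline_ms: float) -> float:
    """Percent by which the candidate's time exceeds (+) or falls below (-) the baseline's."""
    return (t_candidate_ms / t_baseline_ms - 1.0) * 100.0


def environment() -> dict:
    """Collect basic host details to report alongside the timings."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
```

For example, a cuTile GEMM at 1.5 ms against a CUTLASS baseline at 1.0 ms gives `relative_change(1.5, 1.0) == 50.0`, i.e. 50% slower. Because "X% faster" can be defined either as a time ratio or a throughput ratio, it helps to state which convention a report uses. For a GPU report you would likely also add the GPU model, driver, CUDA version, and library versions (for instance `torch.__version__` and `torch.cuda.get_device_name()`) to the dictionary returned by `environment()`.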