Questions about performance.

Open Jacki1223 opened this issue 1 month ago • 1 comments

Hi team! I found that for some benchmarks in /test, cuTile Python performs much worse than PyTorch. Is this expected?

Jacki1223, Dec 22 '25 03:12

Could you provide more specific benchmarks?

I tested cuTile on a single RTX 5090 with GEMM, flash attention, and RMSNorm. cuTile GEMM is about 15%-50% slower than CUTLASS in common cases, cuTile flash attention is as fast as PyTorch's SDPA, and cuTile RMSNorm is 20% faster than PyTorch.
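
For anyone who wants to report comparable numbers, here is a minimal sketch of how figures like "X% slower" could be measured and computed. It is not taken from the cuTile test suite: `median_time` and `percent_slower` are hypothetical helper names, and the sketch assumes "slower" means a longer median wall-clock time relative to the baseline. For GPU kernels, the timed callable would also need to wait for the device to finish (for example by calling `torch.cuda.synchronize()` inside it), otherwise only the launch overhead is measured.

```python
import statistics
import time


def median_time(fn, warmup=3, repeats=10):
    """Return the median wall-clock time of fn() in seconds.

    Hypothetical helper: warm-up runs are discarded so that one-time
    costs (compilation, caching) do not skew the result.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # for GPU work, fn should synchronize before returning
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_slower(candidate_s, baseline_s):
    """Percent by which the candidate's time exceeds the baseline's.

    Assumption: "slower" is defined on elapsed time, not throughput.
    A negative result means the candidate is faster.
    """
    return (candidate_s / baseline_s - 1.0) * 100.0
```

Reporting the median over several repeats, together with the problem shapes and data types used, makes results easier to compare across machines.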

ZhangZhiPku, Dec 25 '25 11:12

Hi @Jacki1223, please share more details about the specific benchmark results and the test environments you would like us to look into.

haijieg, Jan 08 '26 19:01