Lei Zhang
Hi, everyone. Triton currently does not have a top-k function (https://pytorch.org/docs/stable/generated/torch.topk.html). Is there any plan to implement it soon?
```toml
[experimental]
pipeline_parallel_degree = 2
pipeline_parallel_microbatches = 2
pipeline_parallel_split_points = ["layers.16"]
```
The current fp8 kernel only supports standard matmul (A~[M, K], B~[K, N]). However, MoE is usually implemented as a batched matmul (A~[B, M, K], [B, K, N]) where B is...
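To make the shape difference concrete, here is a minimal pure-Python sketch, not the fp8 kernel itself. It shows that a batched matmul over a leading batch dimension can be expressed as independent 2D matmuls, one per batch entry. The helper names `matmul_2d` and `batched_matmul` are hypothetical and exist only for illustration; whether looping over a 2D kernel is an acceptable workaround in practice is an assumption, since it ignores performance entirely.

```python
def matmul_2d(a, b):
    """Plain [M, K] x [K, N] matmul on nested lists (stand-in for a 2D-only kernel)."""
    k = len(b)
    n = len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(n)] for row in a]


def batched_matmul(a, b):
    """[B, M, K] x [B, K, N] matmul written as B independent 2D matmuls."""
    assert len(a) == len(b), "batch dimensions must match"
    return [matmul_2d(a_i, b_i) for a_i, b_i in zip(a, b)]


# Two batch entries, each a 2x2 matrix pair.
a = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
b = [[[5, 6], [7, 8]], [[9, 10], [11, 12]]]
out = batched_matmul(a, b)
```

Each batch entry is multiplied on its own, so a kernel that handles only the 2D case could in principle be called once per entry. A native batched kernel would avoid the per-entry launch overhead.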
FP8 Linear does not work for me: > - torch == 2.4.0 + cu121 > - torchao == 0.4.0 > - cuda_arch == 8.9 (nvidia L40) ```python import torch import...
Hi there, I am a collaborator on [mle-agent](https://github.com/MLSysOps/MLE-agent), a pairing LLM agent for machine learning engineers and researchers. This awesome-ai-agents repo provides a very comprehensive agent list, and it would...
Closes https://github.com/MLSysOps/MLE-agent/issues/166

#### Before submitting this PR, please make sure you have:

- [x] confirmed all checks still pass OR confirm CI build passes.
- [x] verified that any code...
Closes https://github.com/MLSysOps/MLE-agent/issues/285