Lei Zhang

Results 10 issues of Lei Zhang

Hi, everyone. Triton currently does not have a top-k function (https://pytorch.org/docs/stable/generated/torch.topk.html). Is there any plan to implement one soon?
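For reference, the semantics being requested can be sketched in plain Python. This only illustrates what `torch.topk` returns for a 1-D input with its default settings (the k largest values in descending order, plus their indices). The helper name `topk_1d` is hypothetical, the tie-breaking rule is an assumption, and this is not a Triton kernel.

```python
import heapq


def topk_1d(values, k):
    """Return (top_values, top_indices) for the k largest entries, largest first.

    Hypothetical helper for illustration only. Ties are broken by lowest
    index first, which is an assumption and not a documented guarantee.
    """
    if not 0 <= k <= len(values):
        raise ValueError("k must be between 0 and len(values)")
    # Rank (index, value) pairs by descending value, then ascending index.
    best = heapq.nsmallest(k, enumerate(values), key=lambda p: (-p[1], p[0]))
    return [v for _, v in best], [i for i, _ in best]


top_values, top_indices = topk_1d([1, 5, 3, 5, 2], 3)
```

A GPU implementation would need a different strategy (for example a per-block partial selection followed by a merge), so the sketch above fixes only the expected input/output contract.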

```toml
[experimental]
pipeline_parallel_degree = 2
pipeline_parallel_microbatches = 2
pipeline_parallel_split_points = ["layers.16"]
```

![img_v3_02dt_3762c001-9e4b-4b29-b4dd-8a19f0e33c0g](https://github.com/user-attachments/assets/b648c67e-d87e-46ee-9a64-e67d491f2465)

The current fp8 kernel supports only standard matmul (A~[M, K], B~[K, N]). However, MoE is usually implemented as a batched matmul (A~[B, M, K], B~[B, K, N]), where B is...
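One way to picture the gap: a batched matmul is a set of independent standard matmuls, one per entry of the leading dimension. The sketch below uses plain Python nested lists in place of tensors and loops a 2-D-only routine over that leading dimension. Both function names are hypothetical, nothing here uses fp8 or any real kernel, and it is meant only to show the shapes involved, not how the kernel should be extended.

```python
def matmul_2d(a, b):
    """Standard matmul: a is [M, K], b is [K, N], result is [M, N]."""
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    n = len(b[0]) if b else 0
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(n)] for row in a]


def batched_matmul(a, b):
    """Batched matmul: a is [B, M, K], b is [B, K, N], result is [B, M, N]."""
    if len(a) != len(b):
        raise ValueError("batch dimensions do not match")
    # Each batch entry is an ordinary 2-D matmul, so a 2-D-only routine
    # can be reused by looping over the leading dimension.
    return [matmul_2d(a_i, b_i) for a_i, b_i in zip(a, b)]


a = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]  # shape [2, 2, 2]
b = [[[5, 6], [7, 8]], [[2, 3], [4, 5]]]  # shape [2, 2, 2]
out = batched_matmul(a, b)
```

Looping like this is a fallback rather than a fix: it issues one call per batch entry, which is presumably why native batched support is being requested.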

float8

FP8 Linear does not work for me:

> - torch == 2.4.0 + cu121
> - torchao == 0.4.0
> - cuda_arch == 8.9 (NVIDIA L40)

```python
import torch
import...
```

float8

Hi there, I am a collaborator on [mle-agent](https://github.com/MLSysOps/MLE-agent), a pairing LLM agent for machine learning engineers and researchers. This awesome-ai-agents repo provides a very comprehensive agent list, and it would...

Closes https://github.com/MLSysOps/MLE-agent/issues/166

#### Before submitting this PR, please make sure you have:

- [x] confirmed all checks still pass OR confirm CI build passes.
- [x] verified that any code...

enhancement
size:L

Closes https://github.com/MLSysOps/MLE-agent/issues/285

enhancement
size:M

mle memory --list "/path/to/code" --limit 10

enhancement