Lei Zhang issues

Results: 10 issues by Lei Zhang

implement top-K for triton

Hi, everyone. Triton currently has no top-k function (https://pytorch.org/docs/stable/generated/torch.topk.html). Is there any plan to implement one soon?
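For reference, the behavior being requested can be sketched in plain Python. This is only an illustration of top-k semantics on a 1-D list (returning values and their indices, as `torch.topk` does); the helper name `topk` and its `largest` flag here are illustrative and are not a Triton API.

```python
def topk(values, k, largest=True):
    """Return the k largest (or smallest) entries and their indices.

    Plain-Python sketch of top-k on a 1-D list; not a Triton kernel.
    """
    if not 0 <= k <= len(values):
        raise ValueError("k must be between 0 and len(values)")
    # Sort indices by their value, then keep the first k.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=largest)[:k]
    return [values[i] for i in order], order


vals, idx = topk([1, 5, 3, 4], 2)
print(vals, idx)
```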

[Bug] Loss=-1.0 and GPU memory keeps increasing in pipeline parallel

```toml
[experimental]
pipeline_parallel_degree = 2
pipeline_parallel_microbatches = 2
pipeline_parallel_split_points = ["layers.16"]
```

![img_v3_02dt_3762c001-9e4b-4b29-b4dd-8a19f0e33c0g](https://github.com/user-attachments/assets/b648c67e-d87e-46ee-9a64-e67d491f2465)

[Question] Are there any plans to support fp8 batched matmul (`_scaled_bmm`)?

The current fp8 kernel supports only standard matmul (A~[M, K], B~[K, N]). However, MoE is usually implemented as a batched matmul (A~[B, M, K], [B, K, N]) where B is...
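The shape difference described above can be sketched in plain Python: a batched matmul over [B, M, K] and [B, K, N] inputs is B independent standard matmuls. This is only a shape-level illustration with nested lists; it involves no fp8 scaling and is not the `_scaled_bmm` kernel, and the helper names `matmul` and `bmm` are illustrative.

```python
def matmul(a, b):
    """Standard matmul: a is [M, K], b is [K, N], result is [M, N]."""
    k, n = len(b), len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(n)] for row in a]


def bmm(a, b):
    """Batched matmul: a is [B, M, K], b is [B, K, N], result is [B, M, N]."""
    if len(a) != len(b):
        raise ValueError("batch sizes must match")
    # Each batch entry is multiplied independently of the others.
    return [matmul(a_i, b_i) for a_i, b_i in zip(a, b)]


a = [[[1, 2]], [[3, 4]]]      # shape [2, 1, 2]
b = [[[1], [1]], [[2], [0]]]  # shape [2, 2, 1]
print(bmm(a, b))
```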

float8

[BUG] Float8Linear does not work with torch.inference_mode

FP8 Linear does not work for me: > - torch == 2.4.0 + cu121 > - torchao == 0.4.0 > - cuda_arch == 8.9 (nvidia L40) ```python import torch import...

float8

add MLE-Agent to Open Source

Hi there, I am a collaborator on [mle-agent](https://github.com/MLSysOps/MLE-agent), a pairing LLM agent for machine learning engineers and researchers. This awesome-ai-agents repo provides a very comprehensive agent list, and it would...

[WIP] add gemini model

Closes https://github.com/MLSysOps/MLE-agent/issues/166

#### Before submitting this PR, please make sure you have:

- [x] confirmed all checks still pass OR confirm CI build passes.
- [x] verified that any code...

enhancement

size:L

size:M

[memory] CLI show memory files

mle memory --list "/path/to/code" --limit 10

enhancement
