mlmz comments

Results: 19 comments of mlmz

Will llama3, wizardlm2, and other recent models be tested and published on the leaderboard? Requesting test results for llama3, wizardlm, and other large models released in April-May 2024.

@zhc7 superbench is missing one task set: Digital Card Game

Testing issue

Hello, could you share some logs from the FastAPI server? Judging by the error, the server is not responding.

how to use fp8 for inference on h20?

![Image](https://github.com/user-attachments/assets/89dac707-c44f-4e38-a30a-7a277060659a) See https://docs.sglang.ai/backend/server_arguments.html and try it.

how to use fp8 for inference on h20?

Can it work without this parameter?

how to use fp8 for inference on h20?

I see, this parameter works the way you say. If the checkpoint is already fp8, you should load it without specifying any quantization arguments. ![Image](https://github.com/user-attachments/assets/656089ce-077e-467b-9d1f-05e19597fe4f) See https://docs.sglang.ai/references/quantization.html#online-quantization
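To make the distinction concrete, here is a minimal sketch of the two launch cases. It assumes the parameter being discussed is the `--quantization fp8` flag from the online-quantization docs linked above (the comment itself does not name it), and `build_launch_command` plus the model paths are hypothetical placeholders, not part of SGLang.

```python
def build_launch_command(model_path, checkpoint_is_fp8, port=30000):
    """Build an argv list for launching an SGLang server.

    Assumption: online fp8 quantization is requested with
    `--quantization fp8`, as described in the linked docs.
    This helper is illustrative and not part of SGLang itself.
    """
    cmd = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--port", str(port),
    ]
    if not checkpoint_is_fp8:
        # Higher-precision checkpoint: ask the server to quantize on load.
        cmd += ["--quantization", "fp8"]
    # An fp8 checkpoint is loaded as-is, with no quantization argument.
    return cmd


# Placeholder model paths, for illustration only.
print(" ".join(build_launch_command("org/model-bf16", checkpoint_is_fp8=False)))
print(" ".join(build_launch_command("org/model-fp8", checkpoint_is_fp8=True)))
```

Passing the resulting list to `subprocess.run` would start the server. Check the flag against the server arguments page for your SGLang version before relying on it.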

how to use fp8 for inference on h20?

How would you define "fully utilize"?

[Feature] Proposal for adding PD-Disaggregation Feature to SGLang

Thanks for raising this issue. @ByronHsu is working on PD-Disaggregation. @ByronHsu, could you take a look at this issue? Thanks.

[Bug] deepseekr1 illegal memory access on 28H20

We haven't found the root cause of your problem yet. In the meantime, you can try running some low-concurrency tasks (say 8 concurrent) to warm up, and then increase the concurrency (say...
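The warm-up-then-ramp idea can be sketched on the client side as below. This is only an illustration: `run_with_concurrency`, `warm_up_then_ramp`, and `make_request` are hypothetical names, `make_request` stands in for whatever coroutine sends one request to the server, and the concurrency for the later stages is left to the caller because the comment is cut off before giving a value.

```python
import asyncio


async def run_with_concurrency(make_request, total, concurrency):
    """Run `total` requests with at most `concurrency` in flight at once."""
    sem = asyncio.Semaphore(concurrency)

    async def one(i):
        async with sem:
            return await make_request(i)

    # gather preserves submission order in its result list.
    return await asyncio.gather(*(one(i) for i in range(total)))


async def warm_up_then_ramp(make_request, stages):
    """Run stages one after another.

    `stages` is a list of (concurrency, total_requests) pairs, e.g. a
    low-concurrency warm-up stage followed by heavier stages.
    """
    results = []
    for concurrency, total in stages:
        results.extend(await run_with_concurrency(make_request, total, concurrency))
    return results
```

A call such as `asyncio.run(warm_up_then_ramp(send_one, [(8, 64), (target, n)]))` would warm up at 8 concurrent requests first, where `send_one`, `target`, and `n` are supplied by the caller.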

feat: add thinking_budget (version 2)

> ```
> 12 results - 4 files
>
> sglang • python/sglang/srt/model_executor/model_runner.py:
> ...
>
> sglang • python/sglang/srt/openai_api/adapter.py:
> 557 "min_new_tokens": request.min_tokens,
> 558: "thinking_budget": request.thinking_budget,
> 559...

[Bug]NCCL error if enable the cuda graph

Thank you for raising this issue. @ispobock @zhyncs, could you help look into it? Thanks.