
How to reproduce 150 TPS using FP8 + MTP=0 + BSZ=1 on H200?

Open • ghostplant opened this issue 10 months ago • 1 comment

I'm glad to see the latest TRT-LLM update, which raises R1 throughput on 8x H200 to 150 TPS. Locally, however, I only get 7 TPS. What is the correct command to reproduce 150 TPS?

ghostplant • Mar 26 '25 16:03

@jiahanc Hi Cyrus, I think you are the right person to answer this question. :)

cc @NVGaryJi for visibility as well.

juney-nvidia • Mar 26 '25 16:03

Hi @ghostplant, the 150 TPS figure was measured with MTP = 3, not MTP = 0. We have a PR that documents the reproduction steps on both Hopper and Blackwell: https://github.com/NVIDIA/TensorRT-LLM/pull/3232

jiahanc • Apr 02 '25 16:04