TensorRT-LLM
How to reproduce 150 TPS using FP8 + MTP=0 + BSZ=1 on H200?
So glad to see the great TRT-LLM update that brings R1 on H200x8 up to 150 TPS. However, I only get 7 TPS locally. What is the correct command to reproduce 150 TPS?
@jiahanc Hi Cyrus, I think you are the right person to answer this question? :)
cc @NVGaryJi for vis also.
Hi @ghostplant, the 150 TPS figure is with MTP = 3. We have a PR that documents the reproduction steps on both Hopper and Blackwell: https://github.com/NVIDIA/TensorRT-LLM/pull/3232