TensorRT-LLM: Qwen-14B FP8 inference on H20 results in a divide-by-zero error.

System Info
Device: H20
Driver: 550.90.07

python env:
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
tensorrt 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.12.0.dev2024071600

model
Qwen14B: https://huggingface.co/Qwen/Qwen-14B

expected behavior
Performance and eval data should print normally.

actual behavior
[Screenshot 2024-07-26 15 01 33] [Screenshot 2024-07-26 15 01 42]

Jul 26 '24 07:07 menggeliu1205

pip install nvidia-cublas-cu12==12.3.4.1
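To see whether an environment already has the cuBLAS wheel version named in the command above, a small check along these lines may help. It is a minimal sketch: the helper names (`parse_version`, `cublas_status`, `installed_version`) are made up for illustration, and it only compares version numbers. It does not verify that the upgrade resolves the divide-by-zero error.

```python
from importlib import metadata

# Version taken from the pip command in this reply.
SUGGESTED = "12.3.4.1"


def parse_version(version):
    """Turn '12.1.3.1' into (12, 1, 3, 1); non-numeric parts are skipped."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def cublas_status(installed, suggested=SUGGESTED):
    """Classify an installed version string relative to the suggested one."""
    if installed is None:
        return "missing"
    have, want = parse_version(installed), parse_version(suggested)
    if have < want:
        return "older"
    if have > want:
        return "newer"
    return "match"


def installed_version(package="nvidia-cublas-cu12"):
    """Return the installed version of the wheel, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"nvidia-cublas-cu12: {found} ({cublas_status(found)})")
```

With the environment listed in the report (nvidia-cublas-cu12 12.1.3.1), `cublas_status` would classify the installed wheel as older than the suggested one.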

Jul 29 '24 02:07 qiaoxj07

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

Sep 06 '24 01:09 github-actions[bot]

This issue was closed because it has been stalled for 15 days with no activity.

Sep 21 '24 01:09 github-actions[bot]