Qwen-14B FP8 inference on H20 results in a divide-by-zero error.
System Info
Device: H20
Driver: 550.90.07
Python env:
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
tensorrt 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.12.0.dev2024071600
model
Qwen14B: https://huggingface.co/Qwen/Qwen-14B
expected behavior
The run should complete and print performance and evaluation data normally.
actual behavior
pip install nvidia-cublas-cu12==12.3.4.1
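Before running the `pip install` line above, it may help to confirm which cuBLAS wheel is actually installed. The following is a minimal sketch, not something from the original report: the helper names (`parse_version`, `cublas_needs_upgrade`, `installed_version`) are hypothetical, and it assumes plain dotted numeric versions such as `12.1.3.1`. It only compares version numbers; whether moving to 12.3.4.1 resolves the divide-by-zero error is not confirmed in this thread.

```python
from importlib import metadata


def parse_version(text):
    """Turn a dotted numeric version such as '12.1.3.1' into a tuple of ints.

    Assumption: the version has no suffixes like 'rc1' or '.post1'.
    """
    return tuple(int(part) for part in text.split("."))


def cublas_needs_upgrade(installed, suggested="12.3.4.1"):
    """Return True if the installed version is older than the suggested one."""
    return parse_version(installed) < parse_version(suggested)


def installed_version(package="nvidia-cublas-cu12"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print("nvidia-cublas-cu12 is not installed in this environment")
    elif cublas_needs_upgrade(current):
        print(f"nvidia-cublas-cu12 {current} is older than 12.3.4.1")
    else:
        print(f"nvidia-cublas-cu12 {current} is already at or above 12.3.4.1")
```

With the environment listed in this report (nvidia-cublas-cu12 12.1.3.1), the check would report the package as older than the version in the `pip install` line.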
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This issue was closed because it has been stalled for 15 days with no activity.