junam2
@nv-guomingz Hello. I have the same issue when quantizing the Llama 3.1 70B model to FP8.

```
GPU: H100 * 2
Driver Version: 550.90.07
CUDA: 12.4
Image: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
TensorRT-LLM version: 0.11.0
```

...