junam2 comments

junam2

LLAMA 3.1 8B Quantization failed from BF16 to FP8

@nv-guomingz Hello. I have the same issue when quantizing the Llama 3.1 70B model to FP8.

```
GPU: H100 * 2
Driver Version: 550.90.07
CUDA: 12.4
Image: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
TensorRT-LLM version: 0.11.0
```
...