TensorRT-LLM
Error: FP8 quantization fails with integer divide-by-zero
System Info
- CPU architecture: x86_64
- GPU: NVIDIA H20
- GPU memory: 96GB
- TensorRT-LLM version: 0.11.0.dev2024051400
Who can help?
@Tracin
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Quantize the Mixtral-8x22B-v0.1 model to FP8 with the official quantization example script:
# Quantize HF Mixtral into FP8 and export trtllm checkpoint
python3 ../examples/quantization/quantize.py --model_dir /workspace/TensorRT-LLM/build/Mixtral-8x22B-v0.1/ \
--dtype float16 \
--qformat fp8 \
--kv_cache_dtype fp8 \
--output_dir /workspace/TensorRT-LLM/build/Mixtral-8x22B-v0.1/tllm_checkpoint_8gpu_tp8_fp8 \
--calib_size 512 \
--tp_size 8 > Mixtral-8x22B-v0.1_convert_tp8_fp8.log 2>&1
Expected behavior
Quantization completes without any errors and the FP8 checkpoint is exported.
Actual behavior
[198181d05d5a:1195236] *** Process received signal ***
[198181d05d5a:1195236] Signal: Floating point exception (8)
[198181d05d5a:1195236] Signal code: Integer divide-by-zero (1)
[198181d05d5a:1195236] Failing at address: 0x7f0908cff921
[198181d05d5a:1195236] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f09a20c0520]
[198181d05d5a:1195236] [ 1] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12(+0xaff921)[0x7f0908cff921]
[198181d05d5a:1195236] [ 2] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12(+0x837ee3)[0x7f0908a37ee3]
[198181d05d5a:1195236] [ 3] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12(+0x6b5132)[0x7f09088b5132]
[198181d05d5a:1195236] [ 4] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12(+0x78d897)[0x7f090898d897]
[198181d05d5a:1195236] [ 5] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12(+0x78e8c5)[0x7f090898e8c5]
[198181d05d5a:1195236] [ 6] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12(+0x78f59e)[0x7f090898f59e]
[198181d05d5a:1195236] [ 7] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12(cublasLtTSTMatmulAlgoGetHeuristic+0x516)[0x7f09089def56]
[198181d05d5a:1195236] [ 8] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12(+0x869eba)[0x7f092aa69eba]
[198181d05d5a:1195236] [ 9] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12(+0x86a96c)[0x7f092aa6a96c]
[198181d05d5a:1195236] [10] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12(+0x86cba2)[0x7f092aa6cba2]
[198181d05d5a:1195236] [11] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12(+0x84358f)[0x7f092aa4358f]
[198181d05d5a:1195236] [12] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12(+0xac7ecf)[0x7f092acc7ecf]
[198181d05d5a:1195236] [13] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12(+0xac83d8)[0x7f092acc83d8]
[198181d05d5a:1195236] [14] /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12(cublasGemmEx+0x13d)[0x7f092a5e1c7d]
[198181d05d5a:1195236] [15] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0x333e71f)[0x7f095927871f]
[198181d05d5a:1195236] [16] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0x3367a06)[0x7f09592a1a06]
[198181d05d5a:1195236] [17] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(_ZN2at6native22structured_mm_out_cuda4implERKNS_6TensorES4_S4_+0x4a)[0x7f09592a1f5a]
[198181d05d5a:1195236] [18] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0x30f52d2)[0x7f095902f2d2]
[198181d05d5a:1195236] [19] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cuda.so(+0x30f5390)[0x7f095902f390]
[198181d05d5a:1195236] [20] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(_ZN2at4_ops2mm10redispatchEN3c1014DispatchKeySetERKNS_6TensorES6_+0x6e)[0x7f098ae18fbe]
[198181d05d5a:1195236] [21] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(+0x45ca504)[0x7f098cb1d504]
[198181d05d5a:1195236] [22] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(+0x45cb0e3)[0x7f098cb1e0e3]
[198181d05d5a:1195236] [23] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(_ZN2at4_ops2mm4callERKNS_6TensorES4_+0x15e)[0x7f098ae68c8e]
[198181d05d5a:1195236] [24] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(+0x1cb55d0)[0x7f098a2085d0]
[198181d05d5a:1195236] [25] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(_ZN2at6native6matmulERKNS_6TensorES3_+0x49)[0x7f098a20fd99]
[198181d05d5a:1195236] [26] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(+0x2ece850)[0x7f098b421850]
[198181d05d5a:1195236] [27] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(_ZN2at4_ops6matmul4callERKNS_6TensorES4_+0x15e)[0x7f098af94a2e]
[198181d05d5a:1195236] [28] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(_ZN2at6native6linearERKNS_6TensorES3_RKSt8optionalIS1_E+0x263)[0x7f098a1f72f3]
[198181d05d5a:1195236] [29] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(+0x2ece5e3)[0x7f098b4215e3]
[198181d05d5a:1195236] *** End of error message ***
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
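
For triage, the frames in the trace above that carry a symbol name can be pulled out with a small script. This is only a minimal sketch and not part of TensorRT-LLM: `FRAME_RE` and `named_frames` are hypothetical names introduced here, and the regular expression assumes the exact frame layout shown in this log (`[ N] /path/lib.so(symbol+0xoffset)[0xaddress]`).

```python
import re

# Matches one backtrace frame as it appears in this log, e.g.
#   [host:pid] [ 7] /path/libcublasLt.so.12(cublasLtTSTMatmulAlgoGetHeuristic+0x516)[0x7f09089def56]
# The symbol group is empty for frames that only report an offset, e.g. "(+0x42520)".
FRAME_RE = re.compile(
    r"\[\s*(?P<index>\d+)\]\s+(?P<lib>\S+?)"
    r"\((?P<symbol>[^+)]*)\+0x[0-9a-f]+\)\[0x[0-9a-f]+\]"
)


def named_frames(log_text):
    """Return (frame index, library file name, symbol) for frames that have a symbol."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group("symbol"):
            # Keep only the file name of the shared library, not its full path.
            lib = match.group("lib").rsplit("/", 1)[-1]
            frames.append((int(match.group("index")), lib, match.group("symbol")))
    return frames


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python3 named_frames.py Mixtral-8x22B-v0.1_convert_tp8_fp8.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for index, lib, symbol in named_frames(log_file.read()):
            print(f"[{index:2d}] {lib}: {symbol}")
```

In the trace in this report, the named frames run from `at::native::linear` through `matmul` and `mm` into `cublasGemmEx` and `cublasLtTSTMatmulAlgoGetHeuristic`, and the failing address (0x7f0908cff921) is frame 1 inside libcublasLt.so.12.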
Additional notes
None