TensorRT-LLM
ammo_cuda_ext and ammo_cuda_ext_fp8 build failed
System Info
- CPU: Intel(R) Xeon(R) Platinum 8369B
- GPU: a single NVIDIA A10
- Driver Version: 550.54.14
- CUDA Version: 12.4
- NVCC Version: 12.1.105
- TensorRT-LLM Version: 0.9.0.dev2024022700
- nvidia-ammo Version: 0.7.4
Who can help?
@Tracin
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
python ../quantization/quantize.py \
--model_dir ~/.cache/modelscope/hub/ZhipuAI/chatglm3-6b/ \
--dtype float16 \
--qformat int4_awq \
--output_dir trt_ckpt/chatglm3_6b/int4_awq/1-gpu
Expected behavior
ammo_cuda_ext and ammo_cuda_ext_fp8 build successfully, and quantization runs on the GPU.
actual behavior
Loading extension ammo_cuda_ext...
[NeMo W 2024-03-24 23:19:07 nemo_logging:349] /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/ammo/torch/utils/cpp_extension.py:57: UserWarning: Error building extension 'ammo_cuda_ext': [1/2] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=ammo_cuda_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include -isystem /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/TH -isystem /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/cac/miniconda3/envs/trtllm/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++17 -c /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/ammo/torch/quantization/src/tensor_quant_gpu.cu -o tensor_quant_gpu.cuda.o
FAILED: tensor_quant_gpu.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=ammo_cuda_ext -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include -isystem /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/TH -isystem /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/cac/miniconda3/envs/trtllm/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++17 -c /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/ammo/torch/quantization/src/tensor_quant_gpu.cu -o tensor_quant_gpu.cuda.o
/home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/ammo/torch/quantization/src/tensor_quant_gpu.cu:23: warning: "AT_DISPATCH_CASE_FLOATING_TYPES" redefined
23 | #define AT_DISPATCH_CASE_FLOATING_TYPES(...) \
|
In file included from /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/ATen/ATen.h:11,
from /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/ammo/torch/quantization/src/tensor_quant_gpu.cu:13:
/home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/ATen/Dispatch.h:232: note: this is the location of the previous definition
232 | #define AT_DISPATCH_CASE_FLOATING_TYPES(...) \
|
/home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/ammo/torch/quantization/src/tensor_quant_gpu.cu:23: warning: "AT_DISPATCH_CASE_FLOATING_TYPES" redefined
23 | #define AT_DISPATCH_CASE_FLOATING_TYPES(...) \
|
In file included from /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/ATen/ATen.h:11,
from /home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/ammo/torch/quantization/src/tensor_quant_gpu.cu:13:
/home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/ATen/Dispatch.h:232: note: this is the location of the previous definition
232 | #define AT_DISPATCH_CASE_FLOATING_TYPES(...) \
|
/home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/pybind11/cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/pybind11/cast.h:45:120: error: expected template-name before ‘<’ token
45 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
| ^
/home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/pybind11/cast.h:45:120: error: expected identifier before ‘<’ token
/home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/pybind11/cast.h:45:123: error: expected primary-expression before ‘>’ token
45 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
| ^
/home/cac/miniconda3/envs/trtllm/lib/python3.10/site-packages/torch/include/pybind11/cast.h:45:126: error: expected primary-expression before ‘)’ token
45 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
| ^
ninja: build stopped: subcommand failed.
Unable to load extension ammo_cuda_ext and falling back to CPU version.
warnings.warn(f"{e}\nUnable to load extension {name} and falling back to CPU version.")
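The last two lines come from a load-with-fallback pattern in ammo's `cpp_extension.py`: the build error is caught, surfaced as a warning, and a CPU implementation is used instead, which is why quantization still runs but slowly. A minimal sketch of that pattern (the loader below is hypothetical, not AMMO's actual code):

```python
import warnings

def load_extension(name, loader):
    """Try to build/load a compiled extension; return None (CPU fallback) on failure."""
    try:
        return loader()
    except Exception as e:
        warnings.warn(f"{e}\nUnable to load extension {name} and falling back to CPU version.")
        return None

def failing_loader():
    # Mimics the nvcc build failure shown in the log above.
    raise RuntimeError("Error building extension 'ammo_cuda_ext'")

ext = load_extension("ammo_cuda_ext", failing_loader)
print("CPU fallback" if ext is None else "CUDA extension loaded")  # prints "CPU fallback"
```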
additional notes
Is there a conflict between the software versions listed above?
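One thing worth checking: the system info reports driver-level CUDA 12.4 but an NVCC 12.1.105 toolkit, and the pybind11 `cast.h` errors above are the kind that can appear when the host compiler (gcc) and the CUDA toolkit are incompatible. A small self-contained sketch that just compares the major.minor components of the reported version strings (values hard-coded from the system info above, not queried live):

```python
def major_minor(version: str) -> tuple[int, int]:
    """Extract the (major, minor) components of a dotted version string."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

# Versions copied from the "System Info" section above.
driver_cuda = "12.4"    # reported by nvidia-smi
nvcc_cuda = "12.1.105"  # reported by nvcc --version

if major_minor(driver_cuda) != major_minor(nvcc_cuda):
    print(f"possible mismatch: nvcc {nvcc_cuda} vs driver CUDA {driver_cuda}")
```

A minor version skew like this is often harmless on its own, but it is a reasonable first thing to rule out alongside the gcc version when nvcc fails inside pybind11 headers.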
Hi, did you resolve the issue?
No, I am not working on it anymore.