[no instance of function template] when installing TensorRT 8.6.1.6
Description
I am installing TensorRT by following the README.md instructions. However, when running make, the build fails with the error below:
error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
I suspect this is a type mismatch in the CUB reduction, where the cub::Sum functor no longer accepts cub::KeyValuePair arguments (possibly because of the upgraded CUDA version), but I am not sure.
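For context, CUDA 12.x ships CUB 2.x, where cub::Sum is an alias for cuda::std::plus<> (visible in the error text above), and that functor can only add types with an operator+ it can see; the plugin's operator+ for cub::KeyValuePair is apparently no longer found from inside libcu++. Below is a minimal sketch of the kind of workaround I have in mind, not a confirmed fix: KVSum is a hypothetical name, and the real change would go wherever pairSum / cub::Sum is used (e.g. plugin/common/common.cuh).

#include <cub/cub.cuh>

// Hypothetical drop-in replacement for cub::Sum when reducing KeyValuePair:
// sums keys and values component-wise, so it does not depend on an
// operator+ overload for cub::KeyValuePair being visible to cuda::std::plus.
struct KVSum
{
    template <typename K, typename V>
    __host__ __device__ __forceinline__ cub::KeyValuePair<K, V> operator()(
        cub::KeyValuePair<K, V> const& a, cub::KeyValuePair<K, V> const& b) const
    {
        return cub::KeyValuePair<K, V>(a.key + b.key, a.value + b.value);
    }
};

// Usage sketch: pass KVSum{} in place of the cub::Sum-based pairSum, both for
// the per-thread accumulation and for the block reduction, e.g.
//   threadData = KVSum{}(threadData, kvp<T>(rldval, rldval * val));
//   auto const sums = BlockReduce(tempStorage).Reduce(threadData, KVSum{});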
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: GeForce RTX 3090
NVIDIA Driver Version: 535.129.03
CUDA Version: 12.2
CUDNN Version: 8.9.7
Operating System:
Python Version (if applicable): 3.9.18
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): 2.1.2
Baremetal or Container (if so, version): Baremetal
Relevant Files
The full build log is included below:
(corn) guest@XAI-3:~/Desktop/TensorRT/build$ make -j8
[ 1%] Built target third_party.protobuf
[ 1%] Built target caffe_proto
[ 2%] Built target gen_onnx_proto
[ 2%] Built target gen_onnx_data_proto
[ 2%] Built target gen_onnx_operators_proto
[ 6%] Built target nvcaffeparser
[ 8%] Built target onnx_proto
[ 13%] Built target nvcaffeparser_static
[ 17%] Built target nvonnxparser
[ 19%] Built target nvonnxparser_static
[ 19%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin.dir/embLayerNormPlugin/embLayerNormKernel.cu.o
[ 19%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu.o
[ 19%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin_static.dir/embLayerNormPlugin/embLayerNormKernel.cu.o
[ 20%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelMTron.cu.o
[ 20%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin.dir/skipLayerNormPlugin/skipLayerNormKernel.cu.o
[ 20%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin_static.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu.o
[ 20%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin_static.dir/skipLayerNormPlugin/skipLayerNormKernel.cu.o
[ 20%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin_static.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelMTron.cu.o
/home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu(98): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
^
detected during:
instantiation of "void nvinfer1::plugin::bert::embLayerNormKernelHFace<T,TPB>(int32_t, const int32_t *, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float, TPB=256U]" at line 117
instantiation of "int32_t nvinfer1::plugin::bert::embSkipLayerNormHFace(cudaStream_t, int32_t, int32_t, int32_t, const int32_t *, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float]" at line 121
/home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormKernel.cu(228): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
^
detected during:
instantiation of "void nvinfer1::plugin::bert::embLayerNormKernel<T,TPB>(int, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float, TPB=256U]" at line 247
instantiation of "int32_t nvinfer1::plugin::bert::embSkipLayerNorm(cudaStream_t, int32_t, int32_t, int32_t, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float]" at line 253
/usr/local/cuda-12.2/include/cub/warp/specializations/warp_reduce_shfl.cuh(360): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
output = reduction_op(input, temp);
^
detected during:
instantiation of "_T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(_T, ReductionOp, int, int) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, _T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 388
instantiation of "_T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(_T, ReductionOp, int, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<0>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, _T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 403
instantiation of "void cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(T &, ReductionOp, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<STEP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, STEP=0]" at line 449
instantiation of "T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceImpl(cub::CUB_200200_700_750_800_860_NS::Int2Type<1>, T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 530
instantiation of "T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::Reduce<ALL_LANES_VALID,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ALL_LANES_VALID=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 204 of /usr/local/cuda-12.2/include/cub/block/specializations/block_reduce_warp_reductions.cuh
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce<FULL_TILE,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 354 of /usr/local/cuda-12.2/include/cub/block/block_reduce.cuh
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce(T, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, ALGORITHM=cub::CUB_200200_700_750_800_860_NS::BLOCK_REDUCE_WARP_REDUCTIONS, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 257 of /home/guest/Desktop/TensorRT/plugin/common/common.cuh
instantiation of "void layerNorm<T,R,P,TPB>(const kvp<R> &, int32_t, int32_t, const P *, const P *, T *) [with T=float, R=float, P=float, TPB=256]" at line 233 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormKernel.cu
instantiation of "void nvinfer1::plugin::bert::embLayerNormKernel<T,TPB>(int, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float, TPB=256U]" at line 247 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormKernel.cu
instantiation of "int32_t nvinfer1::plugin::bert::embSkipLayerNorm(cudaStream_t, int32_t, int32_t, int32_t, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float]" at line 253 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormKernel.cu
/usr/local/cuda-12.2/include/cub/warp/specializations/warp_reduce_shfl.cuh(360): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
output = reduction_op(input, temp);
^
detected during:
instantiation of "_T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(_T, ReductionOp, int, int) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, _T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 388
instantiation of "_T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(_T, ReductionOp, int, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<0>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, _T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 403
instantiation of "void cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(T &, ReductionOp, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<STEP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, STEP=0]" at line 449
instantiation of "T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceImpl(cub::CUB_200200_700_750_800_860_NS::Int2Type<1>, T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 530
instantiation of "T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::Reduce<ALL_LANES_VALID,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ALL_LANES_VALID=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 204 of /usr/local/cuda-12.2/include/cub/block/specializations/block_reduce_warp_reductions.cuh
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce<FULL_TILE,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 354 of /usr/local/cuda-12.2/include/cub/block/block_reduce.cuh
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce(T, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, ALGORITHM=cub::CUB_200200_700_750_800_860_NS::BLOCK_REDUCE_WARP_REDUCTIONS, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 257 of /home/guest/Desktop/TensorRT/plugin/common/common.cuh
instantiation of "void layerNorm<T,R,P,TPB>(const kvp<R> &, int32_t, int32_t, const P *, const P *, T *) [with T=float, R=float, P=float, TPB=256]" at line 103 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu
instantiation of "void nvinfer1::plugin::bert::embLayerNormKernelHFace<T,TPB>(int32_t, const int32_t *, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float, TPB=256U]" at line 117 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu
instantiation of "int32_t nvinfer1::plugin::bert::embSkipLayerNormHFace(cudaStream_t, int32_t, int32_t, int32_t, const int32_t *, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float]" at line 121 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu
~~~~~~~~~~~~~~~~~~~~Removed due to maximum length constraints~~~~~~~~~~~~~~~~~~~~
detected during:
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=7]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=6]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=5]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=4]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=3]" at line 121
[ 2 instantiation contexts not shown ]
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp>(ReductionOp, T, int) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 207
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce<FULL_TILE,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 354 of /usr/local/cuda-12.2/include/cub/block/block_reduce.cuh
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce(T, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, ALGORITHM=cub::CUB_200200_700_750_800_860_NS::BLOCK_REDUCE_WARP_REDUCTIONS, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 142 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
instantiation of "void nvinfer1::plugin::bert::skipln_vec<T,TPB,VPT,hasBias>(int32_t, const T *, const T *, T *, const T *, const T *, const T *) [with T=float, TPB=256, VPT=4, hasBias=true]" at line 275 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=true]" at line 295 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(212): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
^
detected during:
instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernel<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=256U, hasBias=true]" at line 281
instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=true]" at line 295
make[2]: *** [plugin/CMakeFiles/nvinfer_plugin.dir/build.make:3090: plugin/CMakeFiles/nvinfer_plugin.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu.o] Error 1
/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(185): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
^
detected during:
instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernelSmall<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=32U, hasBias=false]" at line 265
instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=false]" at line 297
/usr/local/cuda-12.2/include/cub/block/specializations/block_reduce_warp_reductions.cuh(119): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
warp_aggregate = reduction_op(warp_aggregate, addend);
^
detected during:
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=6]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=5]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=4]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=3]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=2]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=1]" at line 156
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp>(ReductionOp, T, int) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 207
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce<FULL_TILE,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 354 of /usr/local/cuda-12.2/include/cub/block/block_reduce.cuh
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce(T, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, ALGORITHM=cub::CUB_200200_700_750_800_860_NS::BLOCK_REDUCE_WARP_REDUCTIONS, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 142 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
instantiation of "void nvinfer1::plugin::bert::skipln_vec<T,TPB,VPT,hasBias>(int32_t, const T *, const T *, T *, const T *, const T *, const T *) [with T=float, TPB=256, VPT=4, hasBias=true]" at line 275 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=true]" at line 295 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
make[2]: *** [plugin/CMakeFiles/nvinfer_plugin_static.dir/build.make:3105: plugin/CMakeFiles/nvinfer_plugin_static.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelMTron.cu.o] Error 1
/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(212): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
^
detected during:
instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernel<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=256U, hasBias=false]" at line 281
instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=false]" at line 297
/usr/local/cuda-12.2/include/cub/block/specializations/block_reduce_warp_reductions.cuh(119): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
warp_aggregate = reduction_op(warp_aggregate, addend);
^
detected during:
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=7]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=6]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=5]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=4]" at line 121
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=3]" at line 121
[ 2 instantiation contexts not shown ]
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp>(ReductionOp, T, int) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 207
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce<FULL_TILE,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 354 of /usr/local/cuda-12.2/include/cub/block/block_reduce.cuh
instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce(T, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, ALGORITHM=cub::CUB_200200_700_750_800_860_NS::BLOCK_REDUCE_WARP_REDUCTIONS, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 142 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
instantiation of "void nvinfer1::plugin::bert::skipln_vec<T,TPB,VPT,hasBias>(int32_t, const T *, const T *, T *, const T *, const T *, const T *) [with T=float, TPB=256, VPT=4, hasBias=true]" at line 275 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=true]" at line 295 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(212): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
^
detected during:
instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernel<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=256U, hasBias=true]" at line 281
instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=true]" at line 295
/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(185): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
^
detected during:
instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernelSmall<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=32U, hasBias=false]" at line 265
instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=false]" at line 297
/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(212): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
object type is: cub::CUB_200200_700_750_800_860_NS::Sum
threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
^
detected during:
instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernel<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=256U, hasBias=false]" at line 281
instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=false]" at line 297
9 errors detected in the compilation of "/home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormKernel.cu".
17 errors detected in the compilation of "/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu".
17 errors detected in the compilation of "/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu".
make[2]: *** [plugin/CMakeFiles/nvinfer_plugin.dir/build.make:3075: plugin/CMakeFiles/nvinfer_plugin.dir/embLayerNormPlugin/embLayerNormKernel.cu.o] Error 1
make[2]: *** [plugin/CMakeFiles/nvinfer_plugin.dir/build.make:3165: plugin/CMakeFiles/nvinfer_plugin.dir/skipLayerNormPlugin/skipLayerNormKernel.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1216: plugin/CMakeFiles/nvinfer_plugin.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make[2]: *** [plugin/CMakeFiles/nvinfer_plugin_static.dir/build.make:3165: plugin/CMakeFiles/nvinfer_plugin_static.dir/skipLayerNormPlugin/skipLayerNormKernel.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1242: plugin/CMakeFiles/nvinfer_plugin_static.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
Steps To Reproduce
I was following https://github.com/NVIDIA/TensorRT#building-tensorrt-oss, specifically the "Example: Linux (x86-64) build with default cuda-12.1" section.
Commands or scripts:
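(Roughly, following the linked README section; these are the README's documented steps rather than my exact shell history:)
cd TensorRT
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out
make -j8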
Have you tried the latest release?: Yes, I am on the latest release.
Can this model run on other frameworks? For example, run an ONNX model with ONNX Runtime (polygraphy run <model.onnx> --onnxrt): I think this is not relevant here.
Thank you!
Looks like an environment issue. I tried building with our official TensorRT container and the build succeeded; maybe you can try that too.
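For reference, the containerized build is documented in the OSS README; it goes roughly like this (the script names and tags come from the repo's docker/ directory, so pick the ones matching your setup):
./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.1
./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.1 --gpus all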
Alternatively, reinstalling CUDA is worth a try.
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks!