
[no instance of function template] when installing TensorRT 8.6.1.6

JisuHann opened this issue 2 years ago • 2 comments

Description

I am installing TensorRT by following the README.md instructions. However, when running make, it fails with the error below:

error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum

I suspect this is caused by a type mismatch in the cub reduction over KeyValuePair (likely due to the upgraded CUDA version), but I am not sure.
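
From what I can tell, in the CUB bundled with CUDA 12.x, cub::Sum became an alias for cuda::std::plus<>, which only compiles when the reduced type provides an operator+. cub::KeyValuePair<float, float> does not define one, which would explain the overload-resolution failure at every pairSum(...) call site in the log. Below is a minimal sketch of the kind of workaround this diagnosis suggests, not the official upstream patch; placing it in plugin/common/common.cuh (where the kvp alias lives) is my assumption:

    // Workaround sketch (hypothetical placement: plugin/common/common.cuh).
    // Give cub::KeyValuePair the operator+ that cuda::std::plus<> now expects.
    // It is declared inside namespace cub so that argument-dependent lookup
    // finds it from within the CUB reduction machinery.
    #include <cub/cub.cuh>

    namespace cub
    {
    template <typename Key, typename Value>
    __host__ __device__ __forceinline__ KeyValuePair<Key, Value> operator+(
        const KeyValuePair<Key, Value>& a, const KeyValuePair<Key, Value>& b)
    {
        // Component-wise sum; in the layer-norm kernels above this roughly
        // accumulates (sum of values, sum of squares) for mean/variance in
        // a single block reduction.
        return KeyValuePair<Key, Value>(a.key + b.key, a.value + b.value);
    }
    } // namespace cub

If such an overload is visible before the plugin kernels are compiled, the BlockReduce/WarpReduce instantiations shown in the log should resolve again.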

Environment

TensorRT Version: 8.6.1.6

NVIDIA GPU: GeForce RTX 3090

NVIDIA Driver Version: 535.129.03

CUDA Version: 12.2

CUDNN Version: 8.9.7

Operating System:

Python Version (if applicable): 3.9.18

TensorFlow Version (if applicable): N/A

PyTorch Version (if applicable): 2.1.2

Baremetal or Container (if so, version): N/A

Relevant Files

The full build log is below:

(corn) guest@XAI-3:~/Desktop/TensorRT/build$ make -j8
[  1%] Built target third_party.protobuf
[  1%] Built target caffe_proto
[  2%] Built target gen_onnx_proto
[  2%] Built target gen_onnx_data_proto
[  2%] Built target gen_onnx_operators_proto
[  6%] Built target nvcaffeparser
[  8%] Built target onnx_proto
[ 13%] Built target nvcaffeparser_static
[ 17%] Built target nvonnxparser
[ 19%] Built target nvonnxparser_static
[ 19%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin.dir/embLayerNormPlugin/embLayerNormKernel.cu.o
[ 19%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu.o
[ 19%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin_static.dir/embLayerNormPlugin/embLayerNormKernel.cu.o
[ 20%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelMTron.cu.o
[ 20%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin.dir/skipLayerNormPlugin/skipLayerNormKernel.cu.o
[ 20%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin_static.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu.o
[ 20%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin_static.dir/skipLayerNormPlugin/skipLayerNormKernel.cu.o
[ 20%] Building CUDA object plugin/CMakeFiles/nvinfer_plugin_static.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelMTron.cu.o
/home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu(98): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
              threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
                           ^
          detected during:
            instantiation of "void nvinfer1::plugin::bert::embLayerNormKernelHFace<T,TPB>(int32_t, const int32_t *, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float, TPB=256U]" at line 117
            instantiation of "int32_t nvinfer1::plugin::bert::embSkipLayerNormHFace(cudaStream_t, int32_t, int32_t, int32_t, const int32_t *, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float]" at line 121

/home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormKernel.cu(228): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
              threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
                           ^
          detected during:
            instantiation of "void nvinfer1::plugin::bert::embLayerNormKernel<T,TPB>(int, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float, TPB=256U]" at line 247
            instantiation of "int32_t nvinfer1::plugin::bert::embSkipLayerNorm(cudaStream_t, int32_t, int32_t, int32_t, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float]" at line 253

/usr/local/cuda-12.2/include/cub/warp/specializations/warp_reduce_shfl.cuh(360): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
              output = reduction_op(input, temp);
                       ^
          detected during:
            instantiation of "_T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(_T, ReductionOp, int, int) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, _T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 388
            instantiation of "_T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(_T, ReductionOp, int, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<0>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, _T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 403
            instantiation of "void cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(T &, ReductionOp, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<STEP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, STEP=0]" at line 449
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceImpl(cub::CUB_200200_700_750_800_860_NS::Int2Type<1>, T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 530
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::Reduce<ALL_LANES_VALID,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ALL_LANES_VALID=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 204 of /usr/local/cuda-12.2/include/cub/block/specializations/block_reduce_warp_reductions.cuh
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce<FULL_TILE,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 354 of /usr/local/cuda-12.2/include/cub/block/block_reduce.cuh
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce(T, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, ALGORITHM=cub::CUB_200200_700_750_800_860_NS::BLOCK_REDUCE_WARP_REDUCTIONS, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 257 of /home/guest/Desktop/TensorRT/plugin/common/common.cuh
            instantiation of "void layerNorm<T,R,P,TPB>(const kvp<R> &, int32_t, int32_t, const P *, const P *, T *) [with T=float, R=float, P=float, TPB=256]" at line 233 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormKernel.cu
            instantiation of "void nvinfer1::plugin::bert::embLayerNormKernel<T,TPB>(int, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float, TPB=256U]" at line 247 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormKernel.cu
            instantiation of "int32_t nvinfer1::plugin::bert::embSkipLayerNorm(cudaStream_t, int32_t, int32_t, int32_t, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float]" at line 253 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormKernel.cu

/usr/local/cuda-12.2/include/cub/warp/specializations/warp_reduce_shfl.cuh(360): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
              output = reduction_op(input, temp);
                       ^
          detected during:
            instantiation of "_T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(_T, ReductionOp, int, int) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, _T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 388
            instantiation of "_T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(_T, ReductionOp, int, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<0>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, _T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 403
            instantiation of "void cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceStep(T &, ReductionOp, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<STEP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, STEP=0]" at line 449
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::ReduceImpl(cub::CUB_200200_700_750_800_860_NS::Int2Type<1>, T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 530
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::WarpReduceShfl<T, LOGICAL_WARP_THREADS, LEGACY_PTX_ARCH>::Reduce<ALL_LANES_VALID,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, LOGICAL_WARP_THREADS=32, LEGACY_PTX_ARCH=0, ALL_LANES_VALID=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 204 of /usr/local/cuda-12.2/include/cub/block/specializations/block_reduce_warp_reductions.cuh
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce<FULL_TILE,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 354 of /usr/local/cuda-12.2/include/cub/block/block_reduce.cuh
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce(T, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, ALGORITHM=cub::CUB_200200_700_750_800_860_NS::BLOCK_REDUCE_WARP_REDUCTIONS, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 257 of /home/guest/Desktop/TensorRT/plugin/common/common.cuh
            instantiation of "void layerNorm<T,R,P,TPB>(const kvp<R> &, int32_t, int32_t, const P *, const P *, T *) [with T=float, R=float, P=float, TPB=256]" at line 103 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu
            instantiation of "void nvinfer1::plugin::bert::embLayerNormKernelHFace<T,TPB>(int32_t, const int32_t *, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float, TPB=256U]" at line 117 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu
            instantiation of "int32_t nvinfer1::plugin::bert::embSkipLayerNormHFace(cudaStream_t, int32_t, int32_t, int32_t, const int32_t *, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, const T *, int32_t, int32_t, T *) [with T=float]" at line 121 of /home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu
 ~~~~~~~~~~~~~~~~~~~~Removed due to maximum length constraints~~~~~~~~~~~~~~~~~~~~
          detected during:
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=7]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=6]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=5]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=4]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=3]" at line 121
            [ 2 instantiation contexts not shown ]
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp>(ReductionOp, T, int) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 207
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce<FULL_TILE,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 354 of /usr/local/cuda-12.2/include/cub/block/block_reduce.cuh
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce(T, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, ALGORITHM=cub::CUB_200200_700_750_800_860_NS::BLOCK_REDUCE_WARP_REDUCTIONS, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 142 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
            instantiation of "void nvinfer1::plugin::bert::skipln_vec<T,TPB,VPT,hasBias>(int32_t, const T *, const T *, T *, const T *, const T *, const T *) [with T=float, TPB=256, VPT=4, hasBias=true]" at line 275 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
            instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=true]" at line 295 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu

/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(212): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
          threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
                       ^
          detected during:
            instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernel<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=256U, hasBias=true]" at line 281
            instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=true]" at line 295

make[2]: *** [plugin/CMakeFiles/nvinfer_plugin.dir/build.make:3090: plugin/CMakeFiles/nvinfer_plugin.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelHFace.cu.o] Error 1
/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(185): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
          threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
                       ^
          detected during:
            instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernelSmall<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=32U, hasBias=false]" at line 265
            instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=false]" at line 297

/usr/local/cuda-12.2/include/cub/block/specializations/block_reduce_warp_reductions.cuh(119): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
              warp_aggregate = reduction_op(warp_aggregate, addend);
                               ^
          detected during:
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=6]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=5]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=4]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=3]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=2]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=1]" at line 156
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp>(ReductionOp, T, int) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 207
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce<FULL_TILE,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 354 of /usr/local/cuda-12.2/include/cub/block/block_reduce.cuh
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce(T, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, ALGORITHM=cub::CUB_200200_700_750_800_860_NS::BLOCK_REDUCE_WARP_REDUCTIONS, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 142 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
            instantiation of "void nvinfer1::plugin::bert::skipln_vec<T,TPB,VPT,hasBias>(int32_t, const T *, const T *, T *, const T *, const T *, const T *) [with T=float, TPB=256, VPT=4, hasBias=true]" at line 275 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
            instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=true]" at line 295 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu

make[2]: *** [plugin/CMakeFiles/nvinfer_plugin_static.dir/build.make:3105: plugin/CMakeFiles/nvinfer_plugin_static.dir/embLayerNormPlugin/embLayerNormVarSeqlenKernelMTron.cu.o] Error 1
/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(212): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
          threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
                       ^
          detected during:
            instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernel<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=256U, hasBias=false]" at line 281
            instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=false]" at line 297

/usr/local/cuda-12.2/include/cub/block/specializations/block_reduce_warp_reductions.cuh(119): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
              warp_aggregate = reduction_op(warp_aggregate, addend);
                               ^
          detected during:
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=7]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=6]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=5]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=4]" at line 121
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp,SUCCESSOR_WARP>(ReductionOp, T, int, cub::CUB_200200_700_750_800_860_NS::Int2Type<SUCCESSOR_WARP>) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum, SUCCESSOR_WARP=3]" at line 121
            [ 2 instantiation contexts not shown ]
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::ApplyWarpAggregates<FULL_TILE,ReductionOp>(ReductionOp, T, int) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 207
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduceWarpReductions<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce<FULL_TILE,ReductionOp>(T, int, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, FULL_TILE=true, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 354 of /usr/local/cuda-12.2/include/cub/block/block_reduce.cuh
            instantiation of "T cub::CUB_200200_700_750_800_860_NS::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::Reduce(T, ReductionOp) [with T=cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, BLOCK_DIM_X=256, ALGORITHM=cub::CUB_200200_700_750_800_860_NS::BLOCK_REDUCE_WARP_REDUCTIONS, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0, ReductionOp=cub::CUB_200200_700_750_800_860_NS::Sum]" at line 142 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
            instantiation of "void nvinfer1::plugin::bert::skipln_vec<T,TPB,VPT,hasBias>(int32_t, const T *, const T *, T *, const T *, const T *, const T *) [with T=float, TPB=256, VPT=4, hasBias=true]" at line 275 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu
            instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=true]" at line 295 of /home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu

/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(212): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
          threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
                       ^
          detected during:
            instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernel<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=256U, hasBias=true]" at line 281
            instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=true]" at line 295

/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(185): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
          threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
                       ^
          detected during:
            instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernelSmall<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=32U, hasBias=false]" at line 265
            instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=false]" at line 297

/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu(212): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>, cub::CUB_200200_700_750_800_860_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200200_700_750_800_860_NS::Sum
          threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
                       ^
          detected during:
            instantiation of "void nvinfer1::plugin::bert::skipLayerNormKernel<T,TPB,hasBias>(int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, TPB=256U, hasBias=false]" at line 281
            instantiation of "int32_t nvinfer1::plugin::bert::computeSkipLayerNorm<T,hasBias>(cudaStream_t, int32_t, int32_t, const T *, const T *, const T *, const T *, T *, const T *) [with T=float, hasBias=false]" at line 297

9 errors detected in the compilation of "/home/guest/Desktop/TensorRT/plugin/embLayerNormPlugin/embLayerNormKernel.cu".
17 errors detected in the compilation of "/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu".
17 errors detected in the compilation of "/home/guest/Desktop/TensorRT/plugin/skipLayerNormPlugin/skipLayerNormKernel.cu".
make[2]: *** [plugin/CMakeFiles/nvinfer_plugin.dir/build.make:3075: plugin/CMakeFiles/nvinfer_plugin.dir/embLayerNormPlugin/embLayerNormKernel.cu.o] Error 1
make[2]: *** [plugin/CMakeFiles/nvinfer_plugin.dir/build.make:3165: plugin/CMakeFiles/nvinfer_plugin.dir/skipLayerNormPlugin/skipLayerNormKernel.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1216: plugin/CMakeFiles/nvinfer_plugin.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make[2]: *** [plugin/CMakeFiles/nvinfer_plugin_static.dir/build.make:3165: plugin/CMakeFiles/nvinfer_plugin_static.dir/skipLayerNormPlugin/skipLayerNormKernel.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1242: plugin/CMakeFiles/nvinfer_plugin_static.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

Steps To Reproduce

I was following https://github.com/NVIDIA/TensorRT#building-tensorrt-oss, specifically "Example: Linux (x86-64) build with default cuda-12.1".

Commands or scripts:

Have you tried the latest release?: Yes, I am on the latest release.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): I think this is not relevant.

Thank you!

JisuHann avatar Jan 23 '24 17:01 JisuHann

Looks like an environment issue. I tried building with our official TensorRT container and the build succeeded; maybe you can try that too.

zerollzeng avatar Jan 24 '24 14:01 zerollzeng

Reinstalling CUDA is also worth a try.

zerollzeng avatar Jan 24 '24 14:01 zerollzeng

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks!

ttyio avatar Mar 26 '24 17:03 ttyio