
ExternalError: CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED.

Open sixsixQAQ opened this issue 2 years ago • 9 comments

Environment: Debian 11, GCC 10.2, CUDA 12.0, cuDNN 8.8

Output:

[INFO] fastdeploy/runtime/runtime.cc(264)::CreatePaddleBackend  Runtime initialized with Backend::PDINFER in Device::GPU.
before
terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'
  what():  

  Compile Traceback (most recent call last):
    File "tools/export.py", line 116, in <module>
      main(args)
    File "tools/export.py", line 94, in main
      paddle.jit.save(net, save_path)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/jit.py", line 631, in wrapper
      func(layer, path, input_spec, **configs)
    File "<decorator-gen-106>", line 2, in save
      
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
      return wrapped_func(*args, **kwargs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
      return func(*args, **kwargs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/jit.py", line 861, in save
      inner_input_spec, with_hook=with_hook)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 528, in concrete_program_specify_input_spec
      *desired_input_spec, with_hook=with_hook)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 436, in get_concrete_program
      concrete_program, partial_program_layer = self._program_cache[cache_key]
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 801, in __getitem__
      self._caches[item_id] = self._build_once(item)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 790, in _build_once
      **cache_key.kwargs)
    File "<decorator-gen-104>", line 2, in from_func_spec
      
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
      return wrapped_func(*args, **kwargs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
      return func(*args, **kwargs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 733, in from_func_spec
      outputs = static_func(*inputs)
    File "/ssd1/home/chenguowei01/github/PaddleSeg/Matting/tools/../ppmatting/models/ppmattingv2.py", line 152, in forward
      paddle.shape(feats_backbone[-1])[-2:])  # 32x
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
      return self._dygraph_call_func(*inputs, **kwargs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
      outputs = self.forward(*inputs, **kwargs)
    File "/ssd1/home/chenguowei01/github/PaddleSeg/Matting/tools/../ppmatting/models/layers/tensor_fusion.py", line 105, in forward
      atten = F.sigmoid(self.conv_atten(atten))
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
      return self._dygraph_call_func(*inputs, **kwargs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
      outputs = self.forward(*inputs, **kwargs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/container.py", line 98, in forward
      input = layer(input)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
      return self._dygraph_call_func(*inputs, **kwargs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
      outputs = self.forward(*inputs, **kwargs)
    File "/ssd1/home/chenguowei01/github/PaddleSeg/paddleseg/models/layers/layer_libs.py", line 109, in forward
      x = self._conv(x)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
      return self._dygraph_call_func(*inputs, **kwargs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
      outputs = self.forward(*inputs, **kwargs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/nn/layer/conv.py", line 678, in forward
      use_cudnn=self._use_cudnn)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/nn/functional/conv.py", line 169, in _conv_nd
      type=op_type, inputs=inputs, outputs=outputs, attrs=attrs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 44, in append_op
      return self.main_program.current_block().append_op(*args, **kwargs)
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3621, in append_op
      attrs=kwargs.get("attrs", None))
    File "/ssd1/home/chenguowei01/anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2635, in __init__
      for frame in traceback.extract_stack():

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::AnalysisPredictor::ZeroCopyRun()
1   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, phi::Place const&)
2   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, phi::Place const&) const
3   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, phi::Place const&, paddle::framework::RuntimeContext*) const
4   void phi::KernelImpl<void (*)(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, paddle::optional<phi::DenseTensor> const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, std::string const&, bool, std::vector<int, std::allocator<int> > const&, int, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocator<phi::DenseTensor*> >), &(void phi::fusion::ConvFusionKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, paddle::optional<phi::DenseTensor> const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, std::string const&, bool, std::vector<int, std::allocator<int> > const&, int, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocator<phi::DenseTensor*> >))>::KernelCallHelper<paddle::optional<phi::DenseTensor> const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, std::string const&, bool, std::vector<int, std::allocator<int> > const&, int, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocator<phi::DenseTensor*> >, phi::TypeTag<int> >::Compute<1, 3, 0, 0, phi::GPUContext const, phi::DenseTensor const, phi::DenseTensor const, phi::DenseTensor const>(phi::KernelContext*, phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&)
5   void phi::fusion::ConvFusionKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, paddle::optional<phi::DenseTensor> const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, std::string const&, bool, std::vector<int, std::allocator<int> > const&, int, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocator<phi::DenseTensor*> >)
6   phi::DnnWorkspaceHandle::RunFunc(std::function<void (void*)> const&, unsigned long)
7   std::_Function_handler<void (void*), phi::fusion::ConvFusionKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, paddle::optional<phi::DenseTensor> const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, std::string const&, bool, std::vector<int, std::allocator<int> > const&, int, phi::DenseTensor*, std::vector<phi::DenseTensor*, std::allocator<phi::DenseTensor*> >)::{lambda(void*)#4}>::_M_invoke(std::_Any_data const&, void*&&)
8   phi::enforce::EnforceNotMet::EnforceNotMet(phi::ErrorSummary const&, char const*, int)
9   phi::enforce::GetCurrentTraceBackString[abi:cxx11](bool)

----------------------
Error Message Summary:
----------------------
ExternalError: CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED. 
  [Hint: Please search for the error code(9) on website (https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnStatus_t) to get Nvidia's official solution and advice about CUDNN Error.] (at /build/Paddle/paddle/phi/kernels/fusion/gpu/conv_fusion_kernel.cu:612)
  [operator < conv2d_fusion > error]
Aborted

The code that triggers the error:

cv::Mat GpuInfer(const std::string &model_dir, const cv::Mat &image,
                 const std::string &background_file) {
  // `sep` is the platform path separator defined elsewhere in the file.
  auto model_file = model_dir + sep + "model.pdmodel";
  auto params_file = model_dir + sep + "model.pdiparams";
  auto config_file = model_dir + sep + "deploy.yaml";

  auto option = fastdeploy::RuntimeOption();
  option.UseGpu();
  option.UsePaddleInferBackend();

  auto model = fastdeploy::vision::matting::PPMatting(model_file, params_file,
                                                      config_file, option);
  cv::Mat vis_im;
  if (!model.Initialized()) {
    std::cerr << "Failed to initialize." << std::endl;
    return vis_im;
  }

  auto im = image;
  fastdeploy::vision::MattingResult res;
  std::cerr << "before" << std::endl;

  // The process aborts inside this call with CUDNN_STATUS_NOT_SUPPORTED.
  if (!model.Predict(&im, &res)) {
    std::cerr << "Failed to predict." << std::endl;
    return vis_im;
  }
  std::cerr << "after" << std::endl;
......
}
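
One way to narrow this down (a minimal sketch, not a confirmed fix): switch the RuntimeOption to a different backend before constructing the model and see whether Predict still aborts. This assumes your FastDeploy build was compiled with the ONNX Runtime backend; the rest of the function stays the same as above.

// Diagnostic variant (assumption: the FastDeploy build includes the ONNX Runtime backend).
// If Predict() succeeds with this option, the failure is specific to the
// Paddle Inference backend's conv2d_fusion cuDNN path on this CUDA/cuDNN combination.
auto option = fastdeploy::RuntimeOption();
option.UseGpu();
option.UseOrtBackend();  // instead of option.UsePaddleInferBackend()
auto model = fastdeploy::vision::matting::PPMatting(model_file, params_file,
                                                    config_file, option);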

NVIDIA's documentation explains CUDNN_STATUS_NOT_SUPPORTED as: "The functionality requested is not presently supported by cuDNN."

sixsixQAQ · Feb 14 '23

CUDA 12 is not supported yet. Try reinstalling with CUDA 11.2~11.6.
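
For anyone comparing environments, a small standalone check (a sketch only; it assumes cuda_runtime.h and cudnn.h are on the include path and that the program links against cudart and cudnn) can print the CUDA runtime, driver, and cuDNN versions the process actually uses:

#include <cstdio>
#include <cuda_runtime.h>
#include <cudnn.h>

int main() {
  int runtime_ver = 0, driver_ver = 0;
  cudaRuntimeGetVersion(&runtime_ver);  // e.g. 12000 for CUDA 12.0, 11060 for 11.6
  cudaDriverGetVersion(&driver_ver);
  // cudnnGetVersion() encodes e.g. cuDNN 8.8.0 as 8800
  std::printf("CUDA runtime: %d, driver: %d, cuDNN: %zu\n",
              runtime_ver, driver_ver, cudnnGetVersion());
  return 0;
}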

jiangjiajun · Feb 15 '23

@jiangjiajun @sixsixQAQ @edwardzhou @ZeyuChen @zh794390558 I reinstalled with CUDA 11.7 and it still fails. How did you all solve this?

EveningLin · Mar 22 '23

Strangely, deploying a segmentation model works fine, but a classification network does not.

guoyunqingyue · Jun 08 '23

I'm using the 2.7 code in the same environment. Training runs fine, and inference with the trained model also works, but after exporting it to an inference model, prediction throws this cuDNN error. What can be done?

intothephone · Oct 10 '23

It is most likely the CUDA version being too high; 11.2 and 10.2 work.

shiyutang · Dec 18 '23

Running OCR on CUDA 11.6 works fine. I packaged an image application that can call OCR directly; see: https://qq742971636.blog.csdn.net/article/details/135109278

xxddccaa · Dec 21 '23

Has anyone solved this? If so, please ping me: [email protected]

WangShengFeng1 · Mar 10 '24

Has anyone solved this? If so, please ping me: [email protected]

It is probably caused by the cuDNN version being too high. I ran into this today as well; after downgrading cuDNN to 8.4 the error disappeared, but a new error appeared: (External) CUDA error(700), an illegal memory access was encountered. Someone in a chat group shared the configuration that works for him: CUDA 11.8 with cuDNN 8.5. I'm about to try it and it looks promising.

zhangjin2233 · Apr 02 '24