onnxruntime-gpu, CUDA options: results differ between runs
Describe the issue
When I use onnxruntime for inference with the CUDA execution provider enabled, the results differ from run to run, starting at about the 4th decimal place. Is this a precision problem caused by copying data between CPU and GPU? I used a for loop to test this conjecture: the results differ when the for loop is placed before the session creation (so the session is re-created in every iteration, as in the snippet below), but NOT when the loop is placed after the session has been created. If I turn off CUDA and run inference on the CPU, the problem does not occur in any case.
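For reference, a minimal sketch (an assumption on my part, not verified) of pinning the cuDNN convolution algorithm search via the OrtCUDAProviderOptions struct: the exhaustive search can select different convolution algorithms for each newly created session, which changes the floating-point accumulation order and could explain small per-session differences.

// Sketch only (assumption): pin the cuDNN conv algorithm search so that every
// session created inside the loop selects the same algorithms.
OrtCUDAProviderOptions cuda_options{};
cuda_options.device_id = DEVICE_ID;
cuda_options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearchDefault;  // instead of Exhaustive
session_options.AppendExecutionProvider_CUDA(cuda_options);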
To reproduce
#include <onnxruntime_cxx_api.h>
#include <opencv2/opencv.hpp>

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "meter_recon");
Ort::SessionOptions session_options;
session_options.SetIntraOpNumThreads(NUM_THREADS);
session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);

#define USE_CUDA
#ifdef USE_CUDA
AF_INFO("USE CUDA, DEVICE_ID={:d}", DEVICE_ID);
// OrtCUDAProviderOptions cuda_options;
// cuda_options.device_id = DEVICE_ID;
// cuda_options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearchExhaustive;
// cuda_options.gpu_mem_limit = std::numeric_limits<size_t>::max();
// cuda_options.arena_extend_strategy = 0;
// cuda_options.do_copy_in_default_stream = true;
// session_options.AppendExecutionProvider_CUDA(cuda_options);
OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, DEVICE_ID);
#endif

// The session is re-created in every iteration; with CUDA enabled the
// run-to-run differences described above appear in this arrangement.
for (int jjjj = 0; jjjj < 10; ++jjjj) {
    Ort::Session session(env, model_path.c_str(), session_options);
    auto height = session.GetInputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape()[2];
    auto width  = session.GetInputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape()[3];
    // Load image
    cv::Mat in_mat = cv::imread(input_image.string());
    ...
}
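For comparison, a minimal sketch of the other arrangement described above, where the differences do not appear: the session is created once and only the inference runs inside the loop (same model_path, preprocessing, and Run call as in the elided part of the snippet).

// Sketch only: create the session once, outside the loop.
Ort::Session session(env, model_path.c_str(), session_options);
auto height = session.GetInputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape()[2];
auto width  = session.GetInputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape()[3];

for (int jjjj = 0; jjjj < 10; ++jjjj) {
    cv::Mat in_mat = cv::imread(input_image.string());
    // ... same preprocessing and session.Run(...) as in the elided code above ...
}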
Urgency
No response
Platform
Linux
OS Version
Docker: Ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.11.1-gpu-linux
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU, CUDA
Execution Provider Library Version
CUDA 11.5, cuDNN 8.3