Replicate PyTorch transforms like transforms.ToTensor() and transforms.Normalize() in OpenCV C++
Issue description
In my development code, I apply torchvision.transforms to my test dataset. I have exported the model to ONNX and I want to replicate these transforms with OpenCV C++. However, when I run inference with PyTorch (on the exported ONNX model), I get a different result than when running inference with OpenCV C++.
PyTorch transforms
test_transforms = transforms.Compose([
    transforms.Resize(pretrained_size),
    transforms.CenterCrop(pretrained_size),
    transforms.ToTensor(),
    transforms.Normalize(mean=pretrained_means,
                         std=pretrained_stds)
])
- Output of Inference with PyTorch:
pytorch image type: torch.float32
detected class: cross-over
confidence: 0.8553193807601929
output probabilities of softmax layer: [[2.9558592e-05 8.5531938e-01 6.2924426e-04 1.1440608e-02 5.4786936e-04
4.9833752e-02 2.7969838e-04 8.1919849e-02]]
My inference with OpenCV C++
py::dict VClassdict;
// Convert input image from the Python code to a cv::Mat
Mat frame = nparray_to_mat(img);
// Color conversion to RGB (left disabled; swapRB below handles the channel order)
Mat Img;
//cvtColor(frame, Img, COLOR_BGR2RGB);
// Resize image to the ResNet input size
cv::resize(frame, Img, Size(ClassinpWidth, ClassinpWidth));
Mat normImg;
Img.convertTo(normImg, CV_32FC3, 1.f / 255);
// Per-image mean/std, computed from the input itself
Scalar mean;
Scalar std;
cv::meanStdDev(normImg, mean, std);
mean[0] *= 255.0;
mean[1] *= 255.0;
mean[2] *= 255.0;
double scale_factor = 0.003921569; // equivalent to 1/255
//Scalar mean = Scalar(117.8865, 128.52, 137.0115); // each channel mean multiplied by 255.0
//Scalar std = Scalar(0.2275, 0.2110, 0.2140);
bool swapRB = true;
bool crop = false;
Mat blob;
cv::dnn::blobFromImage(Img, blob, scale_factor, Size(ClassinpWidth, ClassinpWidth), mean, swapRB, crop);
if (std.val[0] != 0.0 && std.val[1] != 0.0 && std.val[2] != 0.0)
{
    // Divide blob by std
    divide(blob, std, blob);
}
VClassResNet_Net.setInput(blob);
// Predict
Mat prob = VClassResNet_Net.forward();
cout << "output probabilities of softmax layer: " << prob;
cout << endl;
// Extract the prediction with the highest confidence
Point classIdPoint;
double confidence;
minMaxLoc(prob.reshape(1, 1), 0, &confidence, 0, &classIdPoint);
int classId = classIdPoint.x;
// Set up the dict returned to the Python code for detections
VClassdict["ObjectClass"] = VClassResNet_classes[classId].c_str(); // Detected class
VClassdict["Confidence"] = confidence;                             // Confidence level
- Output of OpenCV C++ inference:
detected class: cross-over
confidence: 0.9028045535087585
output probabilities of softmax layer: [2.5416075e-05, 0.90280455, 0.0031091773, 0.0042484328, 0.00012638989, 0.05069441, 9.7391217e-05, 0.038894258]
I've followed the OpenCV documentation: https://docs.opencv.org/4.x/dd/d55/pytorch_cls_c_tutorial_dnn_conversion.html
System Info
- PyTorch version: 1.10.2+cu113
- OpenCV version: 4.5.1
- Python version: 3.9.7
- OS: Windows
@Ahmed-Fayed We don't offer support for OpenCV at TorchVision, so I can't be 100% confident about what the differences are. But here are some thoughts:
- Compare the output of the preprocessing instead of the output of the network, to avoid other non-deterministic issues. That is, check CV2's output against TorchVision's output to trace the differences.
- Ensure you use the same configuration for the transforms, such as the interpolation mode.
- Make sure you are not in some corner case (for example, cropping a smaller image to a larger canvas, which can happen depending on how you resize); different libraries handle these differently. If you want 100% the same result, I recommend JIT-scripting TorchVision's transforms and using them directly in the C++ code. This is the recommended way to do it.
Hope that helps.
Same issue on my end. I am using an ONNX model and the results vary slightly.