
Replicate PyTorch transforms like transforms.ToTensor() and transforms.Normalize() in OpenCV C++


Issue description

In my development code I apply PyTorch transforms (torchvision.transforms) to my test dataset. I have exported the model to ONNX and want to replicate these transforms in OpenCV C++. However, when I make predictions with PyTorch (using the exported ONNX model), I get a different result than when making predictions with OpenCV C++.

PyTorch transforms


test_transforms = transforms.Compose([
    transforms.Resize(pretrained_size),
    transforms.CenterCrop(pretrained_size),
    transforms.ToTensor(),
    transforms.Normalize(mean=pretrained_means,
                         std=pretrained_stds),
])

  • Output of inference with PyTorch:
pytorch image type: torch.float32
detected class: cross-over
confidence: 0.8553193807601929
output probabilities of softmax layer: [[2.9558592e-05 8.5531938e-01 6.2924426e-04 1.1440608e-02 5.4786936e-04
  4.9833752e-02 2.7969838e-04 8.1919849e-02]]

My inference with OpenCV C++

    py::dict VClassdict;

    // Convert the input image (a NumPy array from the Python side) to cv::Mat
    Mat frame = nparray_to_mat(img);

    // Color conversion to RGB (currently commented out)
    Mat Img;
    //cvtColor(frame, Img, COLOR_BGR2RGB);

    // Resize Image to ResNet Input size
    cv::resize(frame, Img, Size(ClassinpWidth, ClassinpWidth));

    // Compute per-image mean/std statistics of the [0, 1] float image
    Mat normImg;
    Img.convertTo(normImg, CV_32FC3, 1.f / 255);
    Scalar mean;
    Scalar std;
    cv::meanStdDev(normImg, mean, std);

    // blobFromImage subtracts the mean before scaling, so bring the mean
    // back to the 0-255 range
    mean[0] *= 255.0;
    mean[1] *= 255.0;
    mean[2] *= 255.0;

    double scale_factor = 0.003921569;  // equivalent to 1/255
    //Scalar mean = Scalar(117.8865, 128.52, 137.0115); // each channel mean multiplied by 255.0
    //Scalar std = Scalar(0.2275, 0.2110, 0.2140);
    bool swapRB = true;
    bool crop = false;

    Mat blob;
    cv::dnn::blobFromImage(Img, blob, scale_factor, Size(ClassinpWidth, ClassinpWidth), mean, swapRB, crop);

    if (std.val[0] != 0.0 && std.val[1] != 0.0 && std.val[2] != 0.0)
    {
        // Divide blob by std.
        divide(blob, std, blob);
    }

    VClassResNet_Net.setInput(blob);

    // predict
    Mat prob = VClassResNet_Net.forward();

    cout << "output probabilities of softmax layer: " << prob;
    cout << endl;

    // extract prediction with highest confidence
    Point classIdPoint;
    double confidence;
    minMaxLoc(prob.reshape(1, 1), 0, &confidence, 0, &classIdPoint);
    int classId = classIdPoint.x;

    // Setup Dict returned to Python code for detections
    VClassdict["ObjectClass"] = VClassResNet_classes[classId].c_str(); // Detected Class
    VClassdict["Confidence"] = confidence; // Confedence Level
  • Output of inference with OpenCV C++:
detected class: cross-over
confidence: 0.9028045535087585
output probabilities of softmax layer: [2.5416075e-05, 0.90280455, 0.0031091773, 0.0042484328, 0.00012638989, 0.05069441, 9.7391217e-05, 0.038894258]

I've followed the OpenCV documentation: https://docs.opencv.org/4.x/dd/d55/pytorch_cls_c_tutorial_dnn_conversion.html
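
For comparison, here is a minimal sketch of replicating ToTensor() and Normalize() by hand in OpenCV, using fixed per-channel statistics instead of cv::meanStdDev. The ImageNet mean/std values and the function name are assumptions, not taken from the code above; substitute the actual pretrained_means / pretrained_stds:

    #include <opencv2/opencv.hpp>
    #include <opencv2/dnn.hpp>

    cv::Mat preprocess(const cv::Mat& frame, int inpSize)
    {
        // Assumed ImageNet statistics, in RGB order; replace with the
        // pretrained_means / pretrained_stds used during training.
        const cv::Scalar mean(0.485, 0.456, 0.406);
        const cv::Scalar stdev(0.229, 0.224, 0.225);

        cv::Mat img;
        cv::cvtColor(frame, img, cv::COLOR_BGR2RGB);       // PyTorch saw RGB input
        cv::resize(img, img, cv::Size(inpSize, inpSize));  // NB: a plain square resize distorts
                                                           // aspect ratio; Resize(int) + CenterCrop
                                                           // in TorchVision does not
        img.convertTo(img, CV_32FC3, 1.0 / 255.0);         // ToTensor(): uint8 [0,255] -> float [0,1]
        cv::subtract(img, mean, img);                      // Normalize(): (x - mean) ...
        cv::divide(img, stdev, img);                       // ... / std, per channel while still HWC
        return cv::dnn::blobFromImage(img);                // HWC float -> NCHW blob, no extra scaling
    }

Doing the subtraction and division while the image is still 3-channel HWC also avoids a pitfall in the code above: cv::divide with a Scalar on the single-channel NCHW blob applies only the first element of the Scalar, not one value per color channel.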

System Info

  • PyTorch version: 1.10.2+cu113
  • OpenCV version: 4.5.1
  • Python version: 3.9.7
  • OS: Windows

Ahmed-Fayed, Jun 20 '22

@Ahmed-Fayed We don't offer support for OpenCV at TorchVision, so I can't be 100% confident about what the differences are. But here are some thoughts:

  1. Focus on checking the output of the preprocessing instead of the output of the network, to avoid other non-deterministic issues. That is, check OpenCV's output against TorchVision's to trace the differences (see the first sketch after this list).
  2. Ensure you use the same configuration for the transforms, such as the interpolation method.
  3. Make sure you are not hitting a corner case (for example, cropping a smaller image onto a larger canvas, which can happen depending on how you resize); different libraries handle these cases differently. If you want 100% the same result, I recommend JIT-scripting TorchVision's transforms and using them directly in the C++ code; this is the recommended way to do it (see the second sketch below).
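
A minimal sketch of point 1: after the blobFromImage call in the code above, print summary statistics of the blob and compare them against the same statistics of TorchVision's preprocessed tensor (tensor.min() / tensor.max() / tensor.mean() on the Python side). The statistics chosen here are just one way to compare; this fragment is meant to slot into the existing function:

    // Inspect the preprocessed NCHW blob rather than the network output.
    double minVal = 0.0, maxVal = 0.0;
    cv::minMaxIdx(blob, &minVal, &maxVal);  // minMaxIdx handles n-dimensional mats
    cv::Scalar avg = cv::mean(blob);
    std::cout << "blob min=" << minVal << " max=" << maxVal
              << " mean=" << avg[0] << std::endl;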
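
And a minimal sketch of point 3, assuming the transforms were scripted and saved from Python beforehand (e.g. torch.jit.script on a torch.nn.Sequential of Resize/CenterCrop/Normalize; the filename "preprocess.pt" is a placeholder). Note that ToTensor() itself is not scriptable, so the scripted module expects a float CHW tensor already scaled to [0, 1]:

    #include <torch/script.h>

    // Apply the scripted TorchVision transforms with libtorch.
    // "preprocess.pt" is a placeholder filename, not from this thread.
    torch::Tensor apply_transforms(const torch::Tensor& img_chw_float01)
    {
        static torch::jit::script::Module transforms =
            torch::jit::load("preprocess.pt");
        return transforms.forward({img_chw_float01}).toTensor();
    }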

Hope that helps.

datumbox, Jun 21 '22

Same issue on my end. I am using an ONNX model and the results vary slightly.

PaVaNTrIpAtHi, May 24 '23