
Replicate PyTorch transforms like transforms.ToTensor() and transforms.Normalize() in OpenCV C++


Issue description

In my development code I apply PyTorch transforms (torchvision.transforms) to my test dataset. I have exported the model to ONNX and want to replicate these transforms in OpenCV C++. However, when I make predictions with PyTorch (using the exported ONNX model), I get a different result than when making predictions with OpenCV C++.

PyTorch transforms


test_transforms = transforms.Compose([
    transforms.Resize(pretrained_size),
    transforms.CenterCrop(pretrained_size),
    transforms.ToTensor(),
    transforms.Normalize(mean=pretrained_means,
                         std=pretrained_stds),
])

  • Output of inference with PyTorch:
pytorch image type: torch.float32
detected class: cross-over
confidence: 0.8553193807601929
output probabilities of softmax layer: [[2.9558592e-05 8.5531938e-01 6.2924426e-04 1.1440608e-02 5.4786936e-04
  4.9833752e-02 2.7969838e-04 8.1919849e-02]]

My inference with OpenCV C++

    py::dict VClassdict;

    // Convert the input image (a NumPy array from the Python side) to cv::Mat
    Mat frame = nparray_to_mat(img);

    // Color conversion to RGB (currently commented out)
    Mat Img;
    //cvtColor(frame, Img, COLOR_BGR2RGB);

    // Resize Image to ResNet Input size
    cv::resize(frame, Img, Size(ClassinpWidth, ClassinpWidth));

    // Compute per-image mean/std statistics of the [0, 1] float image
    Mat normImg;
    Img.convertTo(normImg, CV_32FC3, 1.f / 255);
    Scalar mean;
    Scalar std;
    cv::meanStdDev(normImg, mean, std);

    // blobFromImage subtracts the mean before scaling, so bring the mean
    // back to the 0-255 range
    mean[0] *= 255.0;
    mean[1] *= 255.0;
    mean[2] *= 255.0;

    double scale_factor = 0.003921569;  // equivalent to 1/255
    //Scalar mean = Scalar(117.8865, 128.52, 137.0115); // each channel mean multiplied by 255.0
    //Scalar std = Scalar(0.2275, 0.2110, 0.2140);
    bool swapRB = true;
    bool crop = false;

    Mat blob;
    cv::dnn::blobFromImage(Img, blob, scale_factor, Size(ClassinpWidth, ClassinpWidth), mean, swapRB, crop);

    if (std.val[0] != 0.0 && std.val[1] != 0.0 && std.val[2] != 0.0)
    {
        // Divide blob by std.
        divide(blob, std, blob);
    }

    VClassResNet_Net.setInput(blob);

    // predict
    Mat prob = VClassResNet_Net.forward();

    cout << "output probabilities of softmax layer: " << prob;
    cout << endl;

    // extract prediction with highest confidence
    Point classIdPoint;
    double confidence;
    minMaxLoc(prob.reshape(1, 1), 0, &confidence, 0, &classIdPoint);
    int classId = classIdPoint.x;

    // Setup Dict returned to Python code for detections
    VClassdict["ObjectClass"] = VClassResNet_classes[classId].c_str(); // Detected Class
    VClassdict["Confidence"] = confidence; // Confedence Level
  • Output of inference with OpenCV C++:
detected class: cross-over
confidence: 0.9028045535087585
output probabilities of softmax layer: [2.5416075e-05, 0.90280455, 0.0031091773, 0.0042484328, 0.00012638989, 0.05069441, 9.7391217e-05, 0.038894258]

I've followed the OpenCV documentation: https://docs.opencv.org/4.x/dd/d55/pytorch_cls_c_tutorial_dnn_conversion.html
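
For comparison, here is a minimal sketch of replicating ToTensor() and Normalize() by hand in OpenCV, using fixed per-channel statistics instead of cv::meanStdDev. The ImageNet mean/std values and the function name are assumptions, not taken from the code above; substitute the actual pretrained_means / pretrained_stds:

    #include <opencv2/opencv.hpp>
    #include <opencv2/dnn.hpp>

    cv::Mat preprocess(const cv::Mat& frame, int inpSize)
    {
        // Assumed ImageNet statistics, in RGB order; replace with the
        // pretrained_means / pretrained_stds used during training.
        const cv::Scalar mean(0.485, 0.456, 0.406);
        const cv::Scalar stdev(0.229, 0.224, 0.225);

        cv::Mat img;
        cv::cvtColor(frame, img, cv::COLOR_BGR2RGB);       // PyTorch saw RGB input
        cv::resize(img, img, cv::Size(inpSize, inpSize));  // NB: a plain square resize distorts
                                                           // aspect ratio; Resize(int) + CenterCrop
                                                           // in TorchVision does not
        img.convertTo(img, CV_32FC3, 1.0 / 255.0);         // ToTensor(): uint8 [0,255] -> float [0,1]
        cv::subtract(img, mean, img);                      // Normalize(): (x - mean) ...
        cv::divide(img, stdev, img);                       // ... / std, per channel while still HWC
        return cv::dnn::blobFromImage(img);                // HWC float -> NCHW blob, no extra scaling
    }

Doing the subtraction and division while the image is still 3-channel HWC also avoids a pitfall in the code above: cv::divide with a Scalar on the single-channel NCHW blob applies only the first element of the Scalar, not one value per color channel.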

System Info

  • PyTorch version: 1.10.2+cu113
  • OpenCV version: 4.5.1
  • Python version: 3.9.7
  • OS: Windows

Ahmed-Fayed, Jun 20 '22

@Ahmed-Fayed We don't offer support for OpenCV at TorchVision, so I can't be 100% confident about what the differences are. But here are some thoughts:

  1. Focus on checking the output of the preprocessing instead of the output of the network, to avoid other non-deterministic issues. That is, check OpenCV's output against TorchVision's to trace the differences (see the first sketch after this list).
  2. Ensure you use the same configuration for the transforms, such as the interpolation method.
  3. Make sure you are not hitting a corner case (for example, cropping a smaller image onto a larger canvas, which can happen depending on how you resize); different libraries handle these cases differently. If you want 100% the same result, I recommend JIT-scripting TorchVision's transforms and using them directly in the C++ code; this is the recommended way to do it (see the second sketch below).
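
A minimal sketch of point 1: after the blobFromImage call in the code above, print summary statistics of the blob and compare them against the same statistics of TorchVision's preprocessed tensor (tensor.min() / tensor.max() / tensor.mean() on the Python side). The statistics chosen here are just one way to compare; this fragment is meant to slot into the existing function:

    // Inspect the preprocessed NCHW blob rather than the network output.
    double minVal = 0.0, maxVal = 0.0;
    cv::minMaxIdx(blob, &minVal, &maxVal);  // minMaxIdx handles n-dimensional mats
    cv::Scalar avg = cv::mean(blob);
    std::cout << "blob min=" << minVal << " max=" << maxVal
              << " mean=" << avg[0] << std::endl;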
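
And a minimal sketch of point 3, assuming the transforms were scripted and saved from Python beforehand (e.g. torch.jit.script on a torch.nn.Sequential of Resize/CenterCrop/Normalize; the filename "preprocess.pt" is a placeholder). Note that ToTensor() itself is not scriptable, so the scripted module expects a float CHW tensor already scaled to [0, 1]:

    #include <torch/script.h>

    // Apply the scripted TorchVision transforms with libtorch.
    // "preprocess.pt" is a placeholder filename, not from this thread.
    torch::Tensor apply_transforms(const torch::Tensor& img_chw_float01)
    {
        static torch::jit::script::Module transforms =
            torch::jit::load("preprocess.pt");
        return transforms.forward({img_chw_float01}).toTensor();
    }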

Hope that helps.

datumbox, Jun 21 '22

Same issue on my end. I am using an ONNX model and the results vary slightly.

PaVaNTrIpAtHi, May 24 '23