split_1_def does not seem to work as expected
Hi everyone,
Thanks for this great project!
I'm having an issue using cudaarithm::split_1_def to convert an image from HWC to CHW layout. A C++ example of how to do this can be found here.
To keep it simple, the idea is to use split to extract each channel, one after the other, into a contiguous block of GPU memory. I have verified that this works in C++ on my system.
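For reference, the layout change I'm after is the standard HWC→CHW remapping. Here is a plain-Rust illustration of the index mapping (no OpenCV involved, just to show the intended result; the helper name is made up):

// HWC (interleaved) -> CHW (planar) on the CPU, for illustration only.
// `hwc` has length h * w * c laid out as [row][col][channel];
// the result has the same length laid out as [channel][row][col].
fn hwc_to_chw_reference(hwc: &[u8], h: usize, w: usize, c: usize) -> Vec<u8> {
    let mut chw = vec![0u8; h * w * c];
    for y in 0..h {
        for x in 0..w {
            for ch in 0..c {
                chw[ch * h * w + y * w + x] = hwc[(y * w + x) * c + ch];
            }
        }
    }
    chw
}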
My Rust code to do this is as follows:
pub unsafe fn hwc_to_chw(image_in: &GpuMat, image_out: &mut GpuMat) -> Result<(), Error> {
    let width = image_in.cols() * image_in.rows();
    let mut channels: opencv::core::Vector<GpuMat> = opencv::core::Vector::with_capacity(3);
    let memaddr = image_out.cuda_ptr().unwrap();
    channels.push(GpuMat::new_rows_cols_with_data_def(image_in.rows(), image_in.cols(), CV_8U, memaddr).unwrap());
    channels.push(GpuMat::new_rows_cols_with_data_def(image_in.rows(), image_in.cols(), CV_8U, memaddr.byte_add(width as usize)).unwrap());
    channels.push(GpuMat::new_rows_cols_with_data_def(image_in.rows(), image_in.cols(), CV_8U, memaddr.byte_add((2 * width) as usize)).unwrap());
    opencv::cudaarithm::split_1_def(image_in, &mut channels)?;
    Ok(())
}
Where image_in is 640x640 CV_8UC3 and image_out is 1x409600 CV_8UC3 (this is to make sure the memory is contiguous).
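For context, one way to get such a contiguous image_out is to upload a zero-filled 1x409600 CV_8UC3 Mat; a minimal sketch (the helper name is made up for illustration):

use opencv::core::{GpuMat, Mat, Scalar, CV_8UC3};
use opencv::prelude::*;

// Illustrative helper: (re)allocate `dst` as a single contiguous 1x409600
// CV_8UC3 device buffer by uploading a zero-filled CPU Mat of that shape.
fn prepare_chw_output(dst: &mut GpuMat) -> opencv::Result<()> {
    let zeros = Mat::new_rows_cols_with_default(1, 640 * 640, CV_8UC3, Scalar::all(0.0))?;
    dst.upload(&zeros)?;
    Ok(())
}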
When I dump image_out to a txt file (after downloading it), I get nothing but zeros.
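The check I do is roughly the following (a minimal sketch, not verbatim; it just downloads image_out and looks for any non-zero byte):

use opencv::core::{GpuMat, Mat};
use opencv::prelude::*;

// Download `image_out` and report whether every byte is zero.
// Minimal sketch; error handling reduced to `?` for brevity.
fn output_is_all_zero(image_out: &GpuMat) -> opencv::Result<bool> {
    let mut cpu = Mat::default();
    image_out.download(&mut cpu)?;
    // Raw bytes of the downloaded (continuous) 1x409600 CV_8UC3 Mat.
    let bytes = cpu.data_bytes()?;
    Ok(bytes.iter().all(|&b| b == 0))
}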
However, following the same logic but with Mat instead of GpuMat (and therefore uploading and downloading memory when necessary) is successful. I am also sure my input image is correct (and uploaded to the GPU).
I have tested a couple of other GPU functions that are working as expected.
Any insight would be appreciated.
Thanks, Antoine
My operating system: Manjaro (Arch Linux based), with opencv-cuda installed from the extra repository.
OpenCV version: 4.10.0-9
opencv Rust crate version: 0.92.3
rustc version: 1.78.0
Build logs are attached: build_logs.txt
I'll check it out, but in the meantime can you please share the corresponding working C++ code too?
Thanks @twistedfall .
Sure, here is the C++ code:
void hwc_to_chw(cv::cuda::GpuMat &frame, cv::cuda::GpuMat &chw) {
    size_t width = frame.cols * frame.rows;
    std::vector<cv::cuda::GpuMat> input_channels{
        cv::cuda::GpuMat(frame.rows, frame.cols, CV_8U, &(chw.ptr()[0])),       // R
        cv::cuda::GpuMat(frame.rows, frame.cols, CV_8U, &(chw.ptr()[width])),   // G
        cv::cuda::GpuMat(frame.rows, frame.cols, CV_8U, &(chw.ptr()[width*2]))  // B
    };
    cv::cuda::split(frame, input_channels);
}
And here is my Rust function to do the same thing but on the CPU, which is working:
pub unsafe fn hwc_to_chw_cpu(image_in: &GpuMat, image_out: &mut GpuMat) -> Result<(), Error> {
    let width = image_in.cols() * image_in.rows();
    let mut image_in_cpu = Mat::default();
    image_in.download(&mut image_in_cpu).unwrap();
    let image_out_cpu = Mat::new_rows_cols(1, width, CV_8UC3).unwrap();
    let mut channels: opencv::core::Vector<Mat> = opencv::core::Vector::with_capacity(3);
    let memaddr = image_out_cpu.ptr(0).unwrap();
    channels.push(Mat::new_rows_cols_with_data_unsafe_def(image_in.rows(), image_in.cols(), CV_8U, memaddr as *mut c_void).unwrap());
    channels.push(Mat::new_rows_cols_with_data_unsafe_def(image_in.rows(), image_in.cols(), CV_8U, memaddr.byte_add(width as usize) as *mut c_void).unwrap());
    channels.push(Mat::new_rows_cols_with_data_unsafe_def(image_in.rows(), image_in.cols(), CV_8U, memaddr.byte_add((2 * width) as usize) as *mut c_void).unwrap());
    opencv::core::split(&image_in_cpu, &mut channels).unwrap();
    image_out.upload(&image_out_cpu).unwrap();
    Ok(())
}
Hi @twistedfall, have you had a chance to give this a try?
Not yet, unfortunately, I'll try to find time this week
I'll do more testing, but it seems that updating this crate to the 0.93.3 release fixed the issue.
I ran those functions on a CUDA-enabled laptop and I can confirm that for me both functions (hwc_to_chw and hwc_to_chw_cpu) produce identical image_out with the latest crate version. Judging by your last message, it works for you too, right?
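(For anyone reproducing this: a comparison along these lines is enough; rough sketch with a made-up helper name, assuming both output GpuMats have already been filled by the two functions.)

use opencv::core::{GpuMat, Mat};
use opencv::prelude::*;

// Download both outputs and compare their raw bytes.
fn outputs_match(a_gpu: &GpuMat, b_gpu: &GpuMat) -> opencv::Result<bool> {
    let (mut a, mut b) = (Mat::default(), Mat::default());
    a_gpu.download(&mut a)?;
    b_gpu.download(&mut b)?;
    Ok(a.data_bytes()? == b.data_bytes()?)
}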
Yes, now that I have updated the crate it does work, thank you for checking! Will close this issue.