examples
Feature Question: Switch from normal malloc to cudaMallocManaged for UserPTR in camera examples?
I am currently working on a camera pipeline on a Jetson AGX Xavier and use the OpenCV performance example as a guideline for interacting with my Allied Vision cameras. I needed the Jetson's Unified Memory zero-copy capability, so I experimented with the buffer implementations. I didn't run explicit benchmarks, but my application's runtime appears to have dropped from 7 ms to 2 ms just by using the USERPTR option with buffers allocated via cudaMallocManaged instead of the mmap'ed kernel memory option. An additional benefit is that no copy is needed anymore to make frames available to the GPU. Would this be something you would be interested in adding?