
Observing Host OS panic/crash due to use-after-free error related to CPU OS memory when `gdr_unmap` is not called before `gdr_close`

Open realarnavgoel opened this issue 1 year ago • 0 comments

Impacted platform: All server-side products; first observed on a Grace-Hopper system

Impacted gdrcopy versions: 2.0 and later

Impacted gdrcopy configs: Both persistent and non-persistent modes

Scenarios: An application opens a connection to the driver (`gdr_open`), allocates GPU memory via CUDA, then pins and maps that memory to the CPU (`gdr_pin_buffer`, `gdr_map`) for read/write operations. If the application subsequently closes the connection (`gdr_close`) without first explicitly unmapping the GPU memory (`gdr_unmap`), the result is a use-after-free (UAF) of OS memory, which can cause functional issues in unrelated areas, or even a kernel panic or crash.

Known mitigations: Applications should explicitly call `gdr_unmap` before `gdr_close`.
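One way an application might enforce this ordering is to track what it has opened, pinned, and mapped, and to tear everything down in reverse through a single helper. The sketch below is only an illustration: `session_state`, `teardown_step`, `teardown_fn`, `step_log`, `record_step`, and `session_teardown` are hypothetical names, not part of gdrcopy. The callback stands in for the real calls (`gdr_unmap`, then `gdr_unpin_buffer`, which this issue does not mention but which normally pairs with `gdr_pin_buffer`, then `gdr_close`) so the example compiles without the library or a GPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one gdrcopy session; not part of gdrapi.h. */
typedef struct {
    bool opened;  /* set by the application after gdr_open succeeds */
    bool pinned;  /* set after gdr_pin_buffer succeeds */
    bool mapped;  /* set after gdr_map succeeds */
} session_state;

/* Teardown steps, listed in the order they must run. */
typedef enum { STEP_UNMAP, STEP_UNPIN, STEP_CLOSE } teardown_step;

/* Performs one step and returns 0 on success. A real callback would call
 * gdr_unmap, gdr_unpin_buffer, or gdr_close using handles kept in ctx. */
typedef int (*teardown_fn)(teardown_step step, void *ctx);

/* Stand-in callback for illustration: records the order of the steps. */
#define MAX_STEPS 8
typedef struct {
    teardown_step steps[MAX_STEPS];
    size_t count;
} step_log;

int record_step(teardown_step step, void *ctx)
{
    step_log *log = (step_log *)ctx;
    if (log->count < MAX_STEPS)
        log->steps[log->count++] = step;
    return 0;
}

/* Releases resources in reverse order of acquisition. If a step fails, it
 * stops and returns the error, so the connection is never closed while a
 * mapping is still believed to be live. */
int session_teardown(session_state *s, teardown_fn do_step, void *ctx)
{
    int rc;
    assert(s != NULL && do_step != NULL);

    if (s->mapped) {
        rc = do_step(STEP_UNMAP, ctx);
        if (rc != 0)
            return rc;
        s->mapped = false;
    }
    if (s->pinned) {
        rc = do_step(STEP_UNPIN, ctx);
        if (rc != 0)
            return rc;
        s->pinned = false;
    }
    if (s->opened) {
        rc = do_step(STEP_CLOSE, ctx);
        if (rc != 0)
            return rc;
        s->opened = false;
    }
    return 0;
}
```

Calling `session_teardown` from every exit path, including error paths, keeps the close from being reached with a mapping outstanding. Stopping on a failed step is a design choice of this sketch; an application may prefer to log the failure and handle it differently.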

Fixed gdrcopy version: 2.4.4

realarnavgoel, Jan 10 '25 02:01