libhsakmt: get correct memory info for IPC shared memory
add reigstered shared memory info. otherwise it fails to deattach the IPC shared memory.
Hi Junwei, can you provide more context about who fails to detach IPC memory? Is that an interaction with the ROCr runtime? I don't think there is a dependency inside libhsakmt itself on fmm_get_mem_info for releasing IPC memory.
I'll ask a ROCr engineer to take a look at this pull request as well.
Hi Junwei, can you provide more context about who fails to detach IPC memory? Is that an interaction with the ROCr runtime? I don't think there is a dependency inside libhsakmt itself on fmm_get_mem_info for releasing IPC memory.
I'll ask a ROCr engineer to take a look at this pull request as well.
Hi Felix,
Thanks for your reply. Yes, after I do IPC import, deattach the memory, it has an error in the path, since it misses the case of registered shared memory. The error comes from the returned NULL ptr, HSA IPCDetach() is going to unmap a null ptr. Anyway, the memory will be freed when hsa shuts down. But better to free it in correct way, I think.
Regards, Jerry
Hi Junwei,
Please add a Signed-off-by line to your commit. Then I will merge it into our internal branch after internal code review and testing.
Hi Junwei,
Please add a Signed-off-by line to your commit.
Done.
Then I will merge it into our internal branch after internal code review and testing.
Thanks
Regards
I don't think this is quite right. It will often appear to function but there will be subtle bugs that could lead to data corruption. There are several patches to ROCr's IPC handling currently in internal code review that should correct the fault.
We may need something similar if hsakmt is to support CPU access to imported memory.
Sorry for my late response.
After import the IPC memory, and use it, we will do hsa_amd_ipc_memory_detach() to release the memory.
during the process, from below code, we know that ptr is got from PtrInfo(), then pass to hsaKmtUnmapMemoryToGPU() for unmapping.
while in fmm_get_mem_info() called by PtrInfo(), it doesn't handle SHARED ptr(aka IPC memory).
So the ptr is 0, the memory is not unmapped actually for now.
- https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/core/runtime/runtime.cpp#L1000
PtrInfoBlockData block;
hsa_amd_pointer_info_t info;
info.size = sizeof(info);
if (PtrInfo(ptr, &info, nullptr, nullptr, nullptr, &block) != HSA_STATUS_SUCCESS)
return HSA_STATUS_ERROR_INVALID_ARGUMENT;
ptr = block.base;
}
}
if (hsaKmtUnmapMemoryToGPU(ptr) != HSAKMT_STATUS_SUCCESS)
return HSA_STATUS_ERROR_INVALID_ARGUMENT;
Regards, Junwei zhang
Yes you are correct about the defect. However, the proposed patch leads to other difficulties with regard to both the thunk interface definitions and the handling of small memory allocations.
There is a patch series that fixes this issue but unfortunately just missed the 3.5 branching date. It will be in the 3.6 release. I'll see if we can post it early.
For the issue, it could be fixed in hsa, but in the view of thunk, I think it misses a kind of memory type handling. IMHO, for a lib or sw block, it's better to handle any kind of request from the user(up caller), it cannot assume the actions of callers.
Regards, Junwei Zhang
Sorry for the lateness closing this off, but it's in ROCm 3.6+