HIP with CUDA backend fails to compile with -default-stream=per-thread
Compiling HIP code with the CUDA backend and the option -default-stream per-thread causes a number of functions to be undefined, because their declarations sit behind a guard in cuda_runtime_api.h that detects this option.
One of these is cudaStreamGetCaptureInfo_v2(), which is called from hipStreamGetCaptureInfo_v2() in hip/nvidia_detail/nvidia_hip_runtime_api.h.
Compiling HIP code with this option enabled therefore fails with an error that cudaStreamGetCaptureInfo_v2 is undefined.
The current workaround is to delete the HIP function from the header, but the proper fix is to guard this function the same way its CUDA counterpart is guarded, so that compilation with -default-stream per-thread succeeds.
Here's a patch if y'all AMD folks want to make sure this gets fixed:
diff --git a/include/hip/nvidia_detail/nvidia_hip_runtime_api.h b/include/hip/nvidia_detail/nvidia_hip_runtime_api.h
index 0c492b7c..7dbfedd5 100644
--- a/include/hip/nvidia_detail/nvidia_hip_runtime_api.h
+++ b/include/hip/nvidia_detail/nvidia_hip_runtime_api.h
@@ -3465,7 +3473,7 @@ inline static hipError_t hipStreamGetCaptureInfo(hipStream_t stream,
return hipCUDAErrorTohipError(cudaStreamGetCaptureInfo(stream, pCaptureStatus, pId));
}
-#if CUDA_VERSION >= CUDA_11030
+#if CUDA_VERSION >= CUDA_11030 && defined(__CUDA_API_VERSION_INTERNAL)
inline static hipError_t hipStreamGetCaptureInfo_v2(
hipStream_t stream, hipStreamCaptureStatus* captureStatus_out,
unsigned long long* id_out __dparm(0), hipGraph_t* graph_out __dparm(0),
Note that the guard used here matches the guard the CUDA header uses to decide whether this function prototype is declared.
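The guard-mirroring idea behind the patch can be sketched in plain C++ so it compiles without CUDA or HIP installed. Every name below (SIM_PER_THREAD_DEFAULT_STREAM, SIM_HAS_CAPTURE_INFO_V2, simStreamGetCaptureInfo_v2, wrapStreamGetCaptureInfo_v2, and so on) is a hypothetical stand-in, not a real CUDA or HIP identifier, and the macro is not the exact guard cuda_runtime_api.h uses. The simulated "vendor" declaration disappears when the simulated per-thread option is defined, and the wrapper checks the same condition so that it disappears too, rather than referring to an undeclared function.

```cpp
#include <cassert>

// --- Simulated vendor header (hypothetical stand-in for cuda_runtime_api.h) ---
// Defining SIM_PER_THREAD_DEFAULT_STREAM plays the role of the per-thread
// default-stream option: it hides the _v2 declaration entirely.
inline int simStreamGetCaptureInfo() { return 1; }

#if !defined(SIM_PER_THREAD_DEFAULT_STREAM)
inline int simStreamGetCaptureInfo_v2() { return 2; }
#define SIM_HAS_CAPTURE_INFO_V2 1
#endif

// --- Simulated portability header (hypothetical stand-in for nvidia_hip_runtime_api.h) ---
// Always available, because the vendor function is always declared.
inline int wrapStreamGetCaptureInfo() { return simStreamGetCaptureInfo(); }

// Mirrors the vendor guard: when the vendor _v2 function is hidden, the
// wrapper is hidden too, instead of referring to an undeclared identifier.
#if defined(SIM_HAS_CAPTURE_INFO_V2)
inline int wrapStreamGetCaptureInfo_v2() { return simStreamGetCaptureInfo_v2(); }
constexpr bool kHasCaptureInfoV2 = true;
#else
constexpr bool kHasCaptureInfoV2 = false;
#endif

// Exercises whichever wrappers exist in the current configuration.
inline void checkWrappers() {
    assert(wrapStreamGetCaptureInfo() == 1);
#if defined(SIM_HAS_CAPTURE_INFO_V2)
    assert(wrapStreamGetCaptureInfo_v2() == 2);
#endif
}
```

Compiling the same file with -DSIM_PER_THREAD_DEFAULT_STREAM removes both the simulated _v2 declaration and its wrapper, and the build still succeeds. Without the mirrored guard on the wrapper, that build would fail with an undeclared-identifier error, which is the same shape of failure described in this issue.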
A fix for this has already been raised internally. We will close this issue once it lands in the GitHub develop branch.
@satyanveshd Please confirm whether this is fixed in the latest ROCm 6.1.0 (HIP 6.1). Thanks!
Yes, this is fixed in the latest ROCm 6.1.