oneDNN

common: verbose: asynchronous verbose mode for execution time tracking

Open: avmanerikar opened this issue 9 months ago • 0 comments

Description

This PR proposes a proof of concept (PoC) for an asynchronous verbose mode that tracks kernel execution times accurately, without blocking and with minimal synchronization latency. In the existing verbose mode, retrieving kernel timings adds significant overhead for two reasons: GPU kernel execution must be synchronized before the time can be read, and the time is tracked on the host. The asynchronous mode removes the synchronization overhead by using event callbacks to query execution timings. The prototype is built on the OpenCL GPU API, which provides kernel execution statistics for profiling.
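The callback-based idea can be illustrated with a minimal, standard-library-only C++ sketch. It is a model of the pattern, not the code in this PR: `FakeQueue`, `VerboseLog`, `TimingRecord`, and `submit_with_async_verbose` are hypothetical names, and a worker thread with a sleep stands in for a GPU kernel. The point it shows is that the host submits work and returns immediately, while the timing record is produced later by a completion callback.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// One verbose record: kernel name plus its measured execution time.
struct TimingRecord {
    std::string name;
    double exec_ms;
};

// Thread-safe sink for records; callbacks run on worker threads.
class VerboseLog {
public:
    void add(TimingRecord r) {
        std::lock_guard<std::mutex> lock(mutex_);
        records_.push_back(std::move(r));
    }
    std::vector<TimingRecord> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return records_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<TimingRecord> records_;
};

using CompletionCallback
        = std::function<void(Clock::time_point, Clock::time_point)>;

// Hypothetical stand-in for a device queue (not an OpenCL or oneDNN type).
class FakeQueue {
public:
    // Submits simulated kernel work and returns immediately. The worker
    // records start/end timestamps, then invokes the callback with them.
    void submit(std::chrono::milliseconds work, CompletionCallback on_complete) {
        workers_.emplace_back([work, cb = std::move(on_complete)] {
            const auto start = Clock::now();
            std::this_thread::sleep_for(work); // simulated kernel execution
            const auto end = Clock::now();
            cb(start, end);
        });
    }

    // Single drain point, e.g. at teardown, instead of one wait per kernel.
    void finish() {
        for (auto &w : workers_) w.join();
        workers_.clear();
    }

    ~FakeQueue() { finish(); }

private:
    std::vector<std::thread> workers_;
};

// Asynchronous verbose: the host thread never waits for an individual kernel.
// The log must outlive the queue, since callbacks write to it.
void submit_with_async_verbose(FakeQueue &queue, VerboseLog &log,
        std::string name, std::chrono::milliseconds work) {
    queue.submit(work,
            [&log, name = std::move(name)](
                    Clock::time_point start, Clock::time_point end) {
                assert(end >= start);
                const std::chrono::duration<double, std::milli> elapsed
                        = end - start;
                log.add({name, elapsed.count()});
            });
}
```

A caller would create a `VerboseLog`, then a `FakeQueue`, call `submit_with_async_verbose` once per kernel, and call `finish()` once before reading `snapshot()`. In a real OpenCL backend, the corresponding building blocks would presumably be a queue created with `CL_QUEUE_PROFILING_ENABLE`, `clSetEventCallback` with `CL_COMPLETE`, and `clGetEventProfilingInfo` with `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END`; how the PR wires these together is not shown here.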

The implementation will be added as experimental functionality, enabled at build time with DNNL_EXPERIMENTAL_ASYNC_VERBOSE:

cmake .. -DDNNL_EXPERIMENTAL=ON -DDNNL_EXPERIMENTAL_ASYNC_VERBOSE=ON -DDNNL_EXPERIMENTAL_PROFILING=ON -DDNNL_GPU_RUNTIME=OCL 

Related RFC: [link]

Addresses MFDNN-13603.

Checklist

  • [x] Have you published an RFC for the new feature?
  • [ ] Was the RFC approved?
  • [ ] Have you added relevant tests?

avmanerikar, Apr 09 '25 17:04