AMDGPUnative.jl

Implement decent kernel exceptions

Open · opened by jpsamaroo

Currently, a Julia exception thrown inside a kernel triggers an s_trap 2, which causes the wait handler to hang and communicates no useful information about what caused the exception. This PR replaces that with a mechanism similar to the one CUDAnative uses, but with the goal of providing proper per-kernel exceptions (rather than throwing the exception at the next API call).
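To make the per-kernel goal concrete, here is a host-only Python model. It is not AMDGPUnative code, and every name in it (ExceptionInfo, ExceptionBuffer, launch_and_wait, KernelException, bounds_checked_kernel) is hypothetical. Threads stand in for work-items. Instead of trapping, a failing work-item records what went wrong in a shared buffer and returns, so the wait completes; the host then raises an exception tied to that specific kernel rather than deferring it to a later call.

```python
import threading
from dataclasses import dataclass


@dataclass
class ExceptionInfo:
    # Hypothetical payload a device-side throw might record.
    kernel_id: int
    workitem: int
    message: str


class KernelException(Exception):
    """Raised on the host for the specific kernel that failed."""

    def __init__(self, info: ExceptionInfo):
        super().__init__(f"kernel {info.kernel_id}, work-item {info.workitem}: {info.message}")
        self.info = info


class ExceptionBuffer:
    """Shared buffer that 'device' code appends to instead of trapping."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []

    def record(self, info: ExceptionInfo) -> None:
        with self._lock:
            self._entries.append(info)

    def take(self, kernel_id: int) -> list:
        # Remove and return only the entries that belong to this kernel.
        with self._lock:
            mine = [e for e in self._entries if e.kernel_id == kernel_id]
            self._entries = [e for e in self._entries if e.kernel_id != kernel_id]
        return mine


def launch_and_wait(buf, kernel_id, body, nworkitems):
    # Stand-in for a kernel launch: one host thread per work-item.
    threads = [threading.Thread(target=body, args=(buf, kernel_id, i)) for i in range(nworkitems)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the wait returns, because no work-item traps
    errors = buf.take(kernel_id)
    if errors:
        raise KernelException(errors[0])  # report at this kernel's wait


def bounds_checked_kernel(buf, kernel_id, i):
    data = [10, 20, 30]
    if i >= len(data):
        # Instead of s_trap 2: record what went wrong, then return normally.
        buf.record(ExceptionInfo(kernel_id, i, "BoundsError"))
        return
    _ = data[i]
```

Launching kernel 1 with 3 work-items succeeds; launching kernel 2 with 4 work-items raises a KernelException naming kernel 2 and work-item 3, and kernel 2's entries are removed from the buffer so they are not re-reported later.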

Todo:

  • [x] Pass some exception info through the ring buffer
  • [x] Remove the need for @rocprint calls to fix kernels with exceptions
  • [x] Test execution control intrinsics
  • [x] Test memcpy/memset intrinsics
  • [x] Implement and test free for malloc'd data
  • [ ] ~Actually make the ring buffer a ring buffer (currently it's a list)~
  • [ ] Protect against data races (atomically load/store the kernel ID in ExceptionEntry)
  • [x] De-duplicate exceptions per-kernel
  • [x] Don't overallocate the copy buffer for passing exceptions
  • [ ] Clean up malloc'd data by default when the kernel exits
  • [ ] Document it all!
  • [ ] Test that multi-wavefront kernels error properly
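The data-race and de-duplication items can be illustrated with another hypothetical host-only model; it is not the PR's ExceptionEntry layout. Each writer claims a distinct slot with a fetch-and-add, writes its payload, and only then stores the kernel ID; the reader checks the kernel ID first and skips unpublished slots, then keeps one exception per kernel. The names ExceptionSlots, AtomicInt, and the EMPTY sentinel (which assumes kernel IDs are nonzero) are assumptions, and the slots are filled in order like a list, not as a true ring buffer.

```python
import threading

EMPTY = 0  # hypothetical sentinel: kernel ID 0 marks an unpublished slot


class AtomicInt:
    """Lock-based stand-in for a device atomic integer."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def store(self, value):
        with self._lock:
            self._value = value

    def fetch_add(self, delta):
        with self._lock:
            old = self._value
            self._value += delta
            return old


class ExceptionEntry:
    """One exception record; kernel_id is published last."""

    def __init__(self):
        self.kernel_id = AtomicInt(EMPTY)
        self.message = None


class ExceptionSlots:
    """Fixed-size list of entries, filled in order (not a true ring buffer)."""

    def __init__(self, nslots):
        self.slots = [ExceptionEntry() for _ in range(nslots)]
        self.next = AtomicInt(0)

    def push(self, kernel_id, message):
        """'Device' side: claim a slot, write the payload, then publish the ID."""
        idx = self.next.fetch_add(1)  # each writer gets a distinct slot
        if idx >= len(self.slots):
            return False  # out of slots: drop the report instead of overwriting
        entry = self.slots[idx]
        entry.message = message  # 1. write the payload
        entry.kernel_id.store(kernel_id)  # 2. publish; readers check this first
        return True

    def collect(self):
        """Host side: return the first published message per kernel ID."""
        first = {}
        for entry in self.slots:
            kid = entry.kernel_id.load()
            if kid == EMPTY:
                continue  # unclaimed, or claimed but not yet published
            first.setdefault(kid, entry.message)  # drop duplicates for this kernel
        return first
```

If four wavefronts of kernel 7 each push a BoundsError and kernel 8 pushes one DivideError, collect() returns exactly one message for each of kernels 7 and 8. In Python the GIL already orders these writes, so AtomicInt only mirrors the idea; on a GPU the publishing store would need release ordering and the host's load acquire ordering for the payload to be guaranteed visible.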

jpsamaroo, Jun 01 '20 21:06