DaggerGPU.jl
Merge DaggerGPU streams into task-local stream
Within Datadeps, we try to synchronize with the stream of each incoming array (which is probably the same as the current task-local stream), but we don't necessarily synchronize back to the task-local stream when returning from Datadeps. Synchronizing back is necessary to ensure correctness when mixing Datadeps and non-Datadeps operations. Without it, work queued afterwards on the task-local stream may start before the GPU operations launched within Datadeps have finished.
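As a rough, language-neutral illustration, the toy Python model below treats each stream as an in-order queue and an "event" as a handle to a stream's most recent operation. It is not DaggerGPU.jl or CUDA.jl code: `Stream`, `Op`, `run_datadeps`, and `happens_before` are hypothetical names chosen for this sketch. The model shows that a later operation on the task-local stream is ordered after the Datadeps kernels only if, on exit, the task-local stream waits on an event from every worker stream.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison, so two ops with equal names stay distinct
class Op:
    name: str
    stream: "Stream"
    # Ops on other streams that must finish before this op may start
    waits_on: list = field(default_factory=list)


class Stream:
    """Toy in-order stream: each op starts only after the previous op finishes."""

    def __init__(self, name):
        self.name = name
        self.ops = []

    def launch(self, name, waits_on=None):
        op = Op(name, self, list(waits_on or []))
        self.ops.append(op)
        return op

    def record(self):
        # An "event" here is just a handle to the most recent op (None if idle)
        return self.ops[-1] if self.ops else None

    def wait(self, event):
        # Later work on this stream must wait for `event` (like a stream-wait-event)
        if event is not None:
            self.launch(f"wait:{event.name}", waits_on=[event])


def happens_before(a, b):
    """True if op `a` is guaranteed to complete before op `b` starts."""
    stack, seen = [b], set()
    while stack:
        op = stack.pop()
        if id(op) in seen:
            continue
        seen.add(id(op))
        idx = op.stream.ops.index(op)
        # Direct predecessors: the previous op on the same stream plus explicit waits
        preds = op.stream.ops[idx - 1:idx] + op.waits_on
        for p in preds:
            if p is a:
                return True
            stack.append(p)
    return False


def run_datadeps(task_stream, worker_streams, sync_back):
    """Hypothetical model of a Datadeps region launching one kernel per worker stream."""
    # Entry: each worker stream waits on work already queued on the task-local stream
    entry = task_stream.record()
    for s in worker_streams:
        s.wait(entry)
    kernels = [s.launch(f"kernel@{s.name}") for s in worker_streams]
    if sync_back:
        # Exit: the task-local stream waits on every worker stream's latest event
        for s in worker_streams:
            task_stream.wait(s.record())
    return kernels


for sync_back in (False, True):
    task = Stream("task")
    task.launch("fill!")
    kernels = run_datadeps(task, [Stream("w1"), Stream("w2")], sync_back)
    reader = task.launch("sum")  # a non-Datadeps operation after the region
    print(sync_back, [happens_before(k, reader) for k in kernels])
```

Running this prints `False [False, False]` followed by `True [True, True]`. Synchronizing only on entry orders the kernels after earlier task-local work, but the later `sum` is ordered after the kernels only when the task-local stream also waits on the worker streams on exit.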