DaggerGPU.jl
Allow stream reuse for serial dependencies
Currently we synchronize from the host for each kernel, which is unlikely to be competitive with standard task-based GPU programming. We should find a way to track which stream each previous task ran on, and then reuse that stream for any later task that depends on it, so that the stream's own ordering enforces the dependency instead of a host-side synchronization.
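To make the idea concrete, here is a minimal sketch of the bookkeeping, written in plain Julia with no GPU dependency. Everything in it is hypothetical and not part of DaggerGPU.jl or Dagger.jl: `FakeStream` stands in for a real stream handle, and `StreamTracker` and `assign_stream!` are illustrative names. The assumption is that a stream executes its work in submission order, so a task can inherit a dependency's stream only while that dependency is still the last task queued on it.

```julia
# Hypothetical stand-in for a backend stream handle; not a real DaggerGPU type.
struct FakeStream
    id::Int
end

# Hypothetical bookkeeping: which stream each task was launched on,
# and which task is currently the last one queued on each stream.
mutable struct StreamTracker
    task_stream::Dict{Int,FakeStream}  # task id => stream it was launched on
    tail::Dict{Int,Int}                # stream id => id of last task queued on it
    next_id::Int
end

StreamTracker() = StreamTracker(Dict{Int,FakeStream}(), Dict{Int,Int}(), 1)

function new_stream!(t::StreamTracker)
    s = FakeStream(t.next_id)
    t.next_id += 1
    return s
end

# Choose a stream for `task` given the ids of the tasks it depends on.
# Returns `(stream, waits)`: `waits` lists the other streams that would still
# need a device-side wait (for example via an event) before `task` may run.
function assign_stream!(t::StreamTracker, task::Int, deps::Vector{Int})
    stream = nothing
    for d in deps
        s = get(t.task_stream, d, nothing)
        # Reuse only if `d` is still the tail of its stream; otherwise a
        # sibling already inherited it and reuse would serialize needlessly.
        if s !== nothing && get(t.tail, s.id, nothing) == d
            stream = s
            break
        end
    end
    if stream === nothing
        stream = new_stream!(t)
    end

    # Dependencies on any other stream are not covered by stream ordering.
    waits = FakeStream[]
    for d in deps
        s = get(t.task_stream, d, nothing)
        if s !== nothing && s.id != stream.id && !(s in waits)
            push!(waits, s)
        end
    end

    t.task_stream[task] = stream
    t.tail[stream.id] = task
    return stream, waits
end
```

With this policy a purely serial chain of tasks stays on one stream and needs no host synchronization between kernels. A task that fans out from an already-inherited dependency gets a fresh stream plus one wait, and a task that joins several dependencies reuses one of their streams and waits on the rest.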