oneAPI.jl
Use DMA engine for large memory copies
We currently use a single global queue for everything, but large memory transfers should probably go through a dedicated queue created with FLAG_COPY set, so that the DMA copy engines can be used. We'll probably need to order operations on that queue with respect to the global queue (using events?).
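To make the idea concrete, here is a rough sketch of the queue-selection half, written in plain C because Level Zero is a C API. It is not oneAPI.jl code and does not call Level Zero. The `QG_FLAG_*` constants are local stand-ins for the `ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_COMPUTE` and `ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_COPY` bits that `zeDeviceGetCommandQueueGroupProperties` reports. The function names and the 1 MiB threshold are hypothetical, and the right cut-off would need measuring.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the Level Zero queue-group property bits
 * (assumption: they mirror ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_*). */
#define QG_FLAG_COMPUTE (1u << 0)
#define QG_FLAG_COPY    (1u << 1)

/* Hypothetical cut-off above which a transfer counts as "large". */
#define LARGE_COPY_THRESHOLD ((size_t)1 << 20)

/* Return the index of the first queue group that supports copies but not
 * compute (i.e. a dedicated copy engine), or -1 if there is none. */
int find_copy_only_group(const uint32_t *group_flags, size_t ngroups)
{
    for (size_t i = 0; i < ngroups; i++) {
        uint32_t f = group_flags[i];
        if ((f & QG_FLAG_COPY) && !(f & QG_FLAG_COMPUTE))
            return (int)i;
    }
    return -1;
}

/* Choose the queue group for a transfer of nbytes: small transfers stay on
 * the default (global) group; large ones use a copy-only group if the
 * device exposes one, and fall back to the default group otherwise. */
int select_group(const uint32_t *group_flags, size_t ngroups,
                 int default_group, size_t nbytes)
{
    if (nbytes < LARGE_COPY_THRESHOLD)
        return default_group;
    int copy_group = find_copy_only_group(group_flags, ngroups);
    return copy_group >= 0 ? copy_group : default_group;
}
```

The ordering half is not shown. In Level Zero, `zeCommandListAppendMemoryCopy` accepts a signal event and a list of wait events, so one plausible approach is to have the copy signal an event that later work on the global queue waits on. Whether that is the right mechanism here is still an open question.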