DaggerGPU.jl
GPU integrations for Dagger.jl
Currently we synchronize on the host after each kernel, which is unlikely to provide competitive performance vs. standard task-based GPU programming. We should consider some way to track the streams...
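The stream-tracking idea can be sketched with plain CUDA.jl (this is not DaggerGPU code; `async_scale!` is a hypothetical helper): work is launched on a non-blocking stream, and the host synchronizes once at the point of use instead of after every kernel.

```julia
# Sketch only; requires a CUDA-capable GPU to run.
using CUDA

function async_scale!(y::CuArray, x::CuArray, a)
    s = CuStream(; flags=CUDA.STREAM_NON_BLOCKING)
    stream!(s) do
        y .= a .* x          # kernel is launched asynchronously on `s`
    end
    return s                  # hand the stream back instead of synchronizing here
end

x = CUDA.rand(Float32, 1024)
y = similar(x)
s = async_scale!(y, x, 2f0)
synchronize(s)                # one sync at the point of use, not per kernel
```

Tracking which stream produced each array would let dependent tasks wait on (or chain to) that stream rather than blocking the host.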
We have these for CUDA, but we should also have them for ROCm. The APIs are likely nearly identical, so this should be straightforward.
Currently we only convert between plain, dense array types, and don't handle wrappers like `Adjoint`, which is unfortunate. We should use Adapt more reliably for arbitrary objects so that...
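Adapt already knows how to recurse through LinearAlgebra wrappers such as `Adjoint`, so a backend only needs an `adapt_storage` rule for its array type. A minimal CPU-only sketch with a toy array type (`MyArray` is hypothetical, standing in for `CuArray`/`ROCArray`):

```julia
using Adapt, LinearAlgebra

# Toy array type standing in for a GPU array (illustration only).
struct MyArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(A::MyArray) = size(A.data)
Base.getindex(A::MyArray, i...) = A.data[i...]

# The one rule a backend provides: how to convert raw storage.
Adapt.adapt_storage(::Type{MyArray}, A::Array) = MyArray(A)

A = rand(2, 2)
B = adapt(MyArray, A')        # Adapt recurses through the Adjoint wrapper
@assert B isa Adjoint && parent(B) isa MyArray
```

With a rule like this, `adapt` converts the parent storage while preserving the `Adjoint` wrapper, rather than failing on (or densifying) the wrapped array.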
I tried to create a small benchmark to see if `DaggerGPU.jl` can help me asynchronously move data to and from the GPU. However, it seems that the memory on the...
Closes #28.
Hello, would it be possible to add a usage example? I couldn't find one here, nor in the Dagger.jl docs. For example, let's say I have the following task: ```julia...
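In the absence of documented examples, a usage sketch might look like the following. This is untested and the scope keyword is an assumption based on Dagger.jl's scope system, not confirmed DaggerGPU documentation:

```julia
# Hypothetical sketch; requires a CUDA GPU and assumes Dagger's
# `cuda_gpu` scope keyword is available with DaggerGPU loaded.
using Dagger, DaggerGPU, CUDA

f(x) = sum(abs2, x)

x = rand(Float32, 1024)
# Restrict the task to a CUDA GPU; DaggerGPU is expected to move `x`
# to the device (as a CuArray) before `f` runs.
t = Dagger.@spawn scope=Dagger.scope(cuda_gpu=1) f(x)
fetch(t)
```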
Within Datadeps, we try to synchronize the stream of incoming arrays (which should probably be the same as the current task-local stream), but we don't necessarily synchronize back to the...
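For context, the Datadeps pattern in question looks like this on plain CPU arrays (a minimal sketch of Dagger's `spawn_datadeps` API; with GPU arrays, the stream-synchronization question above applies to the writeback of `Out`/`InOut` arguments):

```julia
using Dagger

A = rand(4, 4)
B = zeros(4, 4)

# Declare data dependencies explicitly; Dagger orders tasks accordingly
# and waits for all of them before the region returns.
Dagger.spawn_datadeps() do
    Dagger.@spawn copyto!(Out(B), In(A))
end
@assert B == A
```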