TensorOperations.jl

Update to cuTENSOR 2.0

Open github-actions[bot] opened this issue 2 years ago • 11 comments

This pull request changes the compat entry for the cuTENSOR package from 1 to 1, 2. This keeps the compat entries for earlier versions.

Note: I have not tested your package with this new compat entry. It is your responsibility to make sure that your package tests pass before you merge this pull request.

github-actions[bot] avatar Jan 19 '24 01:01 github-actions[bot]

@Jutho, could you please push this through? Thanks.

ejmeitz avatar Jan 31 '24 18:01 ejmeitz

I think @lkdvos checked and noticed that we cannot simply do this without having to also update our implementation/package extension, due to breaking changes.

Jutho avatar Jan 31 '24 22:01 Jutho

Sadly, it's quite a bit of work, since cuTENSOR has changed its interface. It's definitely somewhere on my to-do list, but for now I think cuTENSOR v1 works just fine?

lkdvos avatar Feb 01 '24 10:02 lkdvos

I believe cuTENSOR restricts me to GPUArrays 9, which has a double-free memory issue when using multiple threads. I was hoping to update to GPUArrays 10, but I believe this compat entry is holding me back.

If it's a big change, don't worry; my code still runs, albeit with a bunch of errors printing out.

ejmeitz avatar Feb 01 '24 12:02 ejmeitz

Apparently, this issue can cause crashes. To be clear, this only happens when using TensorOperations (and more specifically GPUArrays.jl) from multiple separate threads. In my case, I have one thread per GPU.

ejmeitz avatar Feb 09 '24 22:02 ejmeitz

I started some work on moving to the new interface. I think it should be working for plain CuArrays, but I am still deciding how to implement views/stridedviews, so for now that will have to wait. If you try it out, do let me know if there are any obvious errors.

lkdvos avatar Feb 10 '24 23:02 lkdvos

I also just noticed that cuTENSOR 2 requires Julia 1.8, which I am not too happy about. I think this means we either need to keep two different versions of TensorOperations (one for Julia 1.6-1.7 with cuTENSOR 1 and one for Julia 1.8+ with cuTENSOR 2), or I would have to come up with a way of keeping the old code when Julia is below 1.8. Maybe we can also consider restricting TensorOperations itself to Julia 1.8, but I didn't see the need to do that here just yet.

lkdvos avatar Feb 10 '24 23:02 lkdvos

I'll test it out, thanks for making some changes!

For the record, here is the issue on GPUArrays: https://github.com/JuliaGPU/GPUArrays.jl/issues/503

ejmeitz avatar Feb 11 '24 02:02 ejmeitz

Awaiting the result of https://github.com/JuliaGPU/CUDA.jl/pull/2356 to simplify the implementation further.

lkdvos avatar Apr 29 '24 09:04 lkdvos