Complex number wrapper
NCCL does not support complex numbers directly and does not plan to (see issue). Would we be willing to add a wrapper to NCCL.jl to make using complex numbers more convenient? Alternatively, the wrapper could live in a higher-level package (e.g. Lux.jl, see issue). I am happy to start working on this, but would like some feedback first if possible. My primary motivation is using neural networks with complex-valued weights, and this feature would greatly simplify things.
I'm not a user of NCCL.jl myself, so cc @avik-pal @simonbyrne.
I am honestly okay with either.
I suggested opening this issue because having it here makes it easier for downstream libraries (other than Lux when/if they want to use NCCL).
But if we want to keep this NCCL.jl wrapper simple and provide only the functionality NCCL natively provides, we can implement this in Lux.
Let's wait for @simonbyrne's opinion, since he did most of the work bringing this package back to life.
That seems fine. Note that you don't want to use `reim`/`complex` though; instead, just take advantage of the fact that complex arrays are packed the same way as real arrays, but with twice as many elements.
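To illustrate the packing argument, here is a minimal sketch using only Base Julia on CPU arrays (no NCCL or CUDA involved; the variable names are purely illustrative). It shows that a complex array can be viewed as a real array of twice the length, and that an elementwise sum over that real view matches the complex sum:

```julia
# A complex vector is stored as interleaved (real, imag) pairs.
z = ComplexF32[1 + 2im, 3 + 4im]

# View the same memory as Float32 without copying.
flat = reinterpret(Float32, z)

# Twice as many elements, in re/im order.
println(collect(flat))
println(length(flat) == 2 * length(z))

# An elementwise sum over the real view equals the complex sum,
# which is why a sum reduction over the flattened buffer is valid.
a = ComplexF32[1 + 2im, 3 + 4im]
b = ComplexF32[5 - 1im, 0 + 2im]
summed = reinterpret(ComplexF32, reinterpret(Float32, a) .+ reinterpret(Float32, b))
println(summed == a .+ b)
```

Note that this equivalence holds for sums and for pure data movement; an elementwise product of the real view is not a complex product, so a wrapper would presumably need to reject or special-case such reductions.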
I think the easiest solution:
- Define `ncclDataType_t(::Type{Complex{T}}) where {T} = ncclDataType_t(T)`.
- Instead of using `length(X)` to determine the `count` argument, define a custom function that can take the datatype into account, e.g. `count(X::CuArray{T}) where {T} = length(X)` and `count(X::CuArray{Complex{T}}) where {T} = 2*length(X)`.
Maybe don't tie it to CuArray either, as NCCL should also work with unified memory.
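A rough sketch of how the two suggestions could look when dispatching on `AbstractArray` rather than `CuArray`. Everything here is a stand-in so the example runs without a GPU: `DemoDataType`, `demo_datatype`, and `demo_count` are hypothetical names, not part of NCCL.jl, and the real package would map to its own `ncclDataType_t` values instead.

```julia
# Hypothetical stand-in for the NCCL datatype enum (assumption, not NCCL.jl API).
@enum DemoDataType demoFloat32 demoFloat64

# Map real element types to their datatype tag.
demo_datatype(::Type{Float32}) = demoFloat32
demo_datatype(::Type{Float64}) = demoFloat64

# A complex element type reuses the tag of its underlying real type.
demo_datatype(::Type{Complex{T}}) where {T} = demo_datatype(T)

# Element count to pass to the collective: complex arrays count double.
# Dispatching on AbstractArray keeps this independent of the array wrapper.
demo_count(X::AbstractArray) = length(X)
demo_count(X::AbstractArray{<:Complex}) = 2 * length(X)

println(demo_datatype(ComplexF32) == demo_datatype(Float32))
println(demo_count(zeros(ComplexF64, 3, 2)))
println(demo_count(zeros(Float32, 5)))
```

Using a separate function name also avoids shadowing `Base.count`, which a method literally named `count` would do inside the package module.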
> Maybe don't tie it to `CuArray` either, as NCCL should also work with unified memory.
@simonbyrne Doesn't your implementation above take care of unified memory automatically? My understanding of unified memory is that it's just a subtype of `CuArray`, i.e. something like `CuArray{T, N, CUDA.UnifiedMemory}`.
> My understanding of unified memory is that it's just a subtype of `CuArray`, i.e. something like `CuArray{T, N, CUDA.UnifiedMemory}`
Unified memory can also be exposed as an `Array` (e.g. via `unsafe_wrap`); however, the use case for that is mostly to be able to call into CPU functionality. For GPU-related uses, I would generally expect unified memory to be represented as a `CuArray`.