BlockArrays.jl icon indicating copy to clipboard operation
BlockArrays.jl copied to clipboard

Matrix multiplication error when using BlockArrays and CUDA

Open GVigne opened this issue 3 years ago • 15 comments

Hello, I'm trying to parallelize a code making use of BlockArrays. For this, I am using CUDA, and its array type CuArrays. However, I've noticed a strange behaviour when running the following script:

X = mortar((CUDA.randn(5,1),CUDA.randn(5,1))) Y = mortar((CUDA.randn(5,1),CUDA.randn(5,1))) X'*Y

image Here is an part of the backtrace: image

I have also encoutered the same kind of problem when using the mul! function from LinearAlgebra.

So far I haven't been able to come up with a smart workaround: I can always do CuArray(Array(mortar())) but then of course I loose all benefits of using CUDA because multiple data transfers must be made between the GPU and CPU.

On a side-note, the previous script also trigger the following warning in the REPL, and only in the REPL (doesn't appear when running a script with the julia myscript.jl command). I think it is because the default display method in the Julia REPL tries to access the array element-wise (not GPU compatible) and print every element. It's not really important (not a real error, and other packages have the same problem) but it was surprising at first to have an error when we're really doing nothing more than creating an array.

image

GVigne avatar Jun 16 '22 14:06 GVigne

I don't have access to CUDA.jl but I believe the issue is that GPUArray <: DenseArray, and ArrayLayouts.jl assumes that all DenseArray satisfy the strided array interface.

A simple fix is to not use DenseArray in ArrayLayouts.jl, that is, change

https://github.com/JuliaMatrices/ArrayLayouts.jl/blob/43b2ee20e60f9c8e525fab0851ca9ba6e46924ec/src/memorylayout.jl#L174

to be @inline MemoryLayout(::Type{<:Array}) = DenseColumnMajor().

Can you try making that change and let me know if it fixes the issue?

dlfivefifty avatar Jun 16 '22 22:06 dlfivefifty

Unfortunately, it doesn't seem to work, but for a different reason. When running the script in the REPL, the result is computed accordingly, but the warning "Performing scalar indexing" is issued. image

This is not just a REPL display bug, the computations aren't made on the GPU at all: the script cannot be executed when running it in the terminal. image

GVigne avatar Jun 17 '22 07:06 GVigne

I see. I think the solution is to rewrite

https://github.com/JuliaArrays/BlockArrays.jl/blob/32990fcdf401a18921bbf4b15dc14c599e466c47/src/blocklinalg.jl#L197

to call the 5-arg mul! (which didn't exist when I wrote the code). This would avoid piping through ArrayLayouts.jl. Can you try that?

dlfivefifty avatar Jun 17 '22 09:06 dlfivefifty

Ok, I have a different error now, but I still can't really understand it. The 5-arg mul! works correctly when using only CuArrays. image

At first I thought it came from the Array Y which holds the result, so I spent quite some time investigating: turns out this BlockArray contains matrices and not CuArrays, so I though it could be interfering. I tried to convert it to a BlockArray containing CuArrays, but I didn't manage to make it work either.

GVigne avatar Jun 17 '22 14:06 GVigne

You cut off the full error message and I can't see the types of X and Y

dlfivefifty avatar Jun 17 '22 18:06 dlfivefifty

Sorry, I wasn't clear. I am using the same example as above, that is X = mortar((CUDA.randn(5,1),CUDA.randn(5,1))) Y = mortar((CUDA.randn(5,1),CUDA.randn(5,1)))

GVigne avatar Jun 17 '22 18:06 GVigne

OK the issue is this line:

https://github.com/JuliaArrays/BlockArrays.jl/blob/32990fcdf401a18921bbf4b15dc14c599e466c47/src/blocklinalg.jl#L47

I’m not sure the best way to solve it. Perhaps if we have a function blocktype that returns the the type of the blocks we can instead call similar(BlockArray{T,N,promote_type(blocktype(M.A), blocktype(M.B))}, axes)

dlfivefifty avatar Jun 17 '22 20:06 dlfivefifty

Alright, I think I figured out the overall issue. Replacing muladd! by the 5-argument mul! isn't enough because the array holding the result is not Cuda compatible. Modifying similar at line 47 in blocklinalg.jl as you said still isn't enough: similar(BlockArray{T,N,promote_type(blocktype(M.A), blocktype(M.B))}, axes) calls to this piece of code: https://github.com/JuliaArrays/BlockArrays.jl/blob/32990fcdf401a18921bbf4b15dc14c599e466c47/src/blockarray.jl#L365-L366 This in turn calls: https://github.com/JuliaArrays/BlockArrays.jl/blob/32990fcdf401a18921bbf4b15dc14c599e466c47/src/blockarray.jl#L140-L141 The problem here is that we are creating an Array of Array, but I would like to create an Array of CuArays. Here is what I did to make sure this was really the problem:

  • I used the 5-argument mul!
  • I naively coded the function blocktype by assuming all blocks in M had the same type, so all this function had to do was return typeof(A.blocks[1])
  • I brutally modified https://github.com/JuliaArrays/BlockArrays.jl/blob/32990fcdf401a18921bbf4b15dc14c599e466c47/src/blockarray.jl#L140-L141 to
    @inline BlockArray{T}(::UndefInitializer, baxes::NTuple{N,AbstractUnitRange{Int}}) where {T, N} = initialized_blocks_BlockArray(Array{CuArray{T,N},N}, baxes) This of course, makes the function incompatible with something else than CuArrays: once again, it was just for testing purposes.

Once this was done, I was able to what I wanted to do initially: image

This being said, I don't really know how to solve this issue in a nice way. What would you recommand? I already talked with @antoine-levitt on this issue, maybe he also has some ideas?

GVigne avatar Jun 20 '22 14:06 GVigne

The easy option is to make a new package BlockCuArrays.jl that overloads the right operations. I'm not sure if there's a way of getting it to work otherwise...

dlfivefifty avatar Jun 20 '22 16:06 dlfivefifty