
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

aleemsidra opened this issue on Jul 28, 2023 · 4 comments

Hi! I am trying to use LoRA for my convolution layers: `self.conv = Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)`. I used the LoRA counterpart of `nn.Conv2d` as `lora.Conv2d(n_chans_in, n, self.kernel_size, padding=self.padding, bias=False, r=2, lora_alpha=2)`.

The tensor shapes are: `x.shape = torch.Size([32, 1, 256, 256])`, `self.lora_B.shape = torch.Size([48, 6])`, `self.lora_A.shape = torch.Size([6, 3])`.

The expression `(self.lora_B @ self.lora_A).view(self.conv.weight.shape)` fails with the following error:

/Documents/Domain_Apatation/UDAS/src/LoRA/loralib/layers.py:315, in forward(self, x)
    312 if self.r > 0 and not self.merged:
    313     return self.conv._conv_forward(
--> 314         x, 
    315 
    316         self.conv.weight + (self.lora_B @ self.lora_A).view(self.conv.weight.shape) * self.scaling,
    317         self.conv.bias
    318     )
    319 return self.conv(x)

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

The number of columns in `self.lora_B` is 6 and the number of rows in `self.lora_A` is 6, so the matrix multiplication is valid, yet I still get this error. Can you please help me resolve this bug?
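For reference, a minimal CPU-only sketch of the same computation with random stand-in tensors of the reported shapes (the names and the scaling value are placeholders, not the actual LoRA parameters). The element counts line up (48 × 3 = 16 × 1 × 3 × 3 = 144), so the `view` itself is shape-compatible:

```python
import torch
import torch.nn.functional as F

# Random stand-ins with the shapes reported above (not the real parameters).
x = torch.randn(32, 1, 256, 256)
weight = torch.randn(16, 1, 3, 3)
lora_B = torch.randn(48, 6)
lora_A = torch.randn(6, 3)
scaling = 1.0  # placeholder; in loralib this is derived from lora_alpha / r

# (48, 6) @ (6, 3) -> (48, 3): 144 elements, same as 16 * 1 * 3 * 3.
delta = (lora_B @ lora_A).view(weight.shape) * scaling
out = F.conv2d(x, weight + delta, bias=None, stride=1, padding=1)
print(out.shape)  # torch.Size([32, 16, 256, 256])
```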

aleemsidra avatar Jul 28 '23 13:07 aleemsidra

Can you try this operation on CPU to exclude GPU-related issues?
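A minimal way to run that check, assuming `layer` is the `lora.Conv2d` instance and `x` the input batch (hypothetical names):

```python
# Move the layer and input to CPU and rerun the forward pass;
# if this succeeds, the failure is specific to the GPU path.
layer_cpu = layer.cpu()
out = layer_cpu(x.cpu())
print(out.shape)
```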

edwardjhu avatar Jul 28 '23 23:07 edwardjhu

@edwardjhu I did that as follows:

lora_b = self.lora_B.detach().cpu()
lora_b.shape
(48, 6)
lora_a = self.lora_A.detach().cpu()
lora_a.shape
(6, 3)

Given these dimensions, `lora_b @ lora_a` is a valid matrix multiplication.

self.conv.weight.shape
torch.Size([16, 1, 3, 3])

Then I tested the following, replacing `view` with `reshape`, and it worked:

a = self.conv._conv_forward(
    x.detach().cpu(),
    self.conv.weight.detach().cpu()
    + (self.lora_B.detach().cpu() @ self.lora_A.detach().cpu()).reshape(self.conv.weight.detach().cpu().shape) * self.scaling,
    self.conv.bias,
)
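For context on the `view` vs `reshape` distinction (a generic illustration, not the layer's actual tensors): `view` requires the tensor's memory layout to be compatible with the new shape, while `reshape` silently falls back to copying when it is not. Note that the output of a plain matmul is normally contiguous, so this difference may not be the root cause of the cuBLAS error above.

```python
import torch

t = torch.randn(4, 6).T          # transposed view, shape (6, 4), non-contiguous
print(t.is_contiguous())         # False
print(t.reshape(24).shape)       # works: reshape copies when the layout is incompatible
try:
    t.view(24)                   # raises: view cannot merge dims of this non-contiguous tensor
except RuntimeError as e:
    print("view failed:", e)
```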

I want to understand why this did not work on CUDA, since the inputs are all the same. I would like to run the computation on the GPU.
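One generic debugging step worth noting here (an editorial sketch, not something suggested in the thread): CUDA kernels launch asynchronously, so the cuBLAS error can be reported at a different line than the operation that actually failed. Forcing synchronous launches often makes the traceback point at the real culprit:

```python
# Must be set before the first CUDA call in the process (ideally before
# importing anything that initializes CUDA), so kernel launches become
# synchronous and errors surface at the operation that caused them.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
print(torch.cuda.is_available())  # then rerun the failing forward pass
```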

aleemsidra avatar Jul 31 '23 12:07 aleemsidra

Does reshape resolve the issue on GPU as well?

edwardjhu avatar Aug 05 '23 17:08 edwardjhu

no.

aleemsidra avatar Aug 09 '23 15:08 aleemsidra