RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Hi! I am trying to use LoRA for my convolution layers: self.conv = Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False). I used the LoRA counterpart of nn.Conv2d as lora.Conv2d(n_chans_in, n, self.kernel_size, padding=self.padding, bias=False, r=2, lora_alpha=2).
The tensor shapes are: x.shape = torch.Size([32, 1, 256, 256]), self.lora_B.shape = torch.Size([48, 6]), self.lora_A.shape = torch.Size([6, 3]).
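For reference, a minimal sketch of where these shapes come from, assuming loralib's ConvLoRA parameterization (lora_A of shape (r * kernel_size, in_channels * kernel_size), lora_B of shape (out_channels * kernel_size, r * kernel_size)); the product reshapes cleanly to the conv weight shape:

import torch

# Sketch only: loralib-style ConvLoRA shape arithmetic for this layer.
in_channels, out_channels, k, r = 1, 16, 3, 2
lora_A = torch.zeros(r * k, in_channels * k)     # (6, 3)
lora_B = torch.zeros(out_channels * k, r * k)    # (48, 6)
delta = lora_B @ lora_A                          # (48, 3) -> 144 elements
print(delta.view(out_channels, in_channels, k, k).shape)  # torch.Size([16, 1, 3, 3])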
The expression (self.lora_B @ self.lora_A).view(self.conv.weight.shape) runs into the following issue:
/Documents/Domain_Apatation/UDAS/src/LoRA/loralib/layers.py:315, in forward(self, x)
312 if self.r > 0 and not self.merged:
313 return self.conv._conv_forward(
--> 314 x,
315
316 self.conv.weight + (self.lora_B @ self.lora_A).view(self.conv.weight.shape) * self.scaling,
317 self.conv.bias
318 )
319 return self.conv(x)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
self.lora_B has 6 columns and self.lora_A has 6 rows, which makes the matrix multiplication valid. But I still face this issue. Can you please help me resolve this bug?
Can you try this operation on CPU to exclude GPU-related issues?
@edwardjhu I did that as:
lora_b = self.lora_B.detach().cpu()
lora_b.shape
(48, 6)
lora_a = self.lora_A.detach().cpu()
lora_a.shape
(6, 3)
Given these dimensions, lora_b @ lora_a is a compatible matrix multiplication.
self.conv.weight.shape
torch.Size([16, 1, 3, 3])
Then I tested the following, replacing view with reshape, and it worked:
a = self.conv._conv_forward(x.detach().cpu(), self.conv.weight.detach().cpu() + (self.lora_B.detach().cpu() @ self.lora_A.detach().cpu()).reshape(self.conv.weight.detach().cpu().shape) * self.scaling, self.conv.bias)
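For what it is worth, the general difference (not necessarily the root cause here) is that .view requires the tensor's memory to already be contiguous, while .reshape falls back to a copy when it is not:

import torch

# Illustration only: .view needs contiguous memory, .reshape copies if needed.
t = torch.randn(6, 48).t()    # transposed -> non-contiguous (48, 6) tensor
print(t.is_contiguous())      # False
print(t.reshape(-1).shape)    # torch.Size([288])
try:
    t.view(-1)
except RuntimeError as e:
    print(e)                  # view fails here; PyTorch suggests using .reshape()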
I want to understand why this did not work on CUDA, since the inputs are all the same. I would like to run my computation on the GPU.
Does reshape resolve the issue on GPU as well?
no.
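One way to narrow this down (a debugging sketch, not something verified in this thread) is to run just the offending matmul on the GPU with synchronous kernel launches, so the stack trace points at the kernel that actually fails rather than an earlier asynchronous one:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before any CUDA work so errors are reported synchronously

import torch

# Stand-alone repro of just the GEMM + view from the traceback (shapes taken from the post).
lora_B = torch.randn(48, 6, device="cuda")
lora_A = torch.randn(6, 3, device="cuda")
delta = (lora_B @ lora_A).view(16, 1, 3, 3)
torch.cuda.synchronize()                  # force any deferred CUDA error to surface here
print(delta.shape)                        # torch.Size([16, 1, 3, 3]) if the GEMM itself is fine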