CLAM main.py fails

Open eladzis opened this issue 2 years ago • 3 comments

Hello, I was trying to run your model and everything went fine until I ran main.py. This is the error: RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

The error is raised at this line:

File "model_clam.py", line 151, in forward
    A, h = self.attention_net(h)  # NxK   

I looked at "h":

    torch.Size([26780, 1024])
    tensor([[ nan, nan, nan, ..., nan, nan, nan],
            [ nan, nan, nan, ..., nan, nan, nan],
            [ nan, nan, nan, ..., nan, nan, nan],
            ...,
            [1.5065, 3.6833, 1.6349, ..., 0.4466, 0.9938, 0.3411],
            [2.1708, 2.9039, 2.8781, ..., 0.0000, 0.0000, 0.5808],
            [0.8070, 0.0293, 2.0469, ..., 0.0000, 0.0000, 0.5300]], device='cuda:0')

From these results I concluded that the error occurred because something in the embedding process went wrong and produced NaN values.
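
One way to confirm that diagnosis is to count the rows of the feature tensor that contain NaN before it reaches the attention network. The following is a minimal sketch, assuming `h` is a 2-D tensor with one row per patch as in the dump above; `find_nan_rows` is a hypothetical helper (not part of CLAM), and the small tensor is a toy stand-in for the real features.

```python
import torch


def find_nan_rows(h: torch.Tensor) -> torch.Tensor:
    """Return the indices of rows that contain at least one NaN.

    Hypothetical helper for illustration; not part of the CLAM code base.
    """
    # isnan gives an elementwise boolean mask; any(dim=1) collapses it per row.
    return torch.isnan(h).any(dim=1).nonzero(as_tuple=True)[0]


# Toy stand-in for the real [26780, 1024] feature tensor.
h = torch.tensor([
    [float("nan"), float("nan")],
    [1.5065, 3.6833],
    [2.1708, 2.9039],
])

bad_rows = find_nan_rows(h)
print(f"{bad_rows.numel()} of {h.shape[0]} rows contain NaN")
```

Running the same check on the saved feature files, rather than inside `forward`, would show whether the NaN values are already present after feature extraction.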

I would appreciate your help understanding what went wrong.

Thanks!

eladzis avatar Sep 04 '23 08:09 eladzis

I am hitting the same error and need help too.

ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

yuanzhang7 avatar Dec 08 '23 06:12 yuanzhang7

I solved my error by updating torch to a newer version instead of using the one specified in the yaml file.
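
For anyone comparing environments, the installed versions can be printed as sketched below. This uses only standard torch attributes; which versions actually resolve the error is not stated in this thread, so the output is only a starting point for comparison.

```python
import torch

# Version of the installed torch package.
print("torch:", torch.__version__)
# CUDA version torch was built against (None on CPU-only builds).
print("CUDA build:", torch.version.cuda)
# Whether a usable GPU and driver are visible to torch.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```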

eladzis avatar Dec 10 '23 06:12 eladzis

Could you please share your torch version and relevant settings?

yuanzhang7 avatar Dec 10 '23 08:12 yuanzhang7

hey, i just bumped torch and other libraries to recent versions. Hopefully that solves torch-related issues but please let me know if there are further issues.

fedshyvana avatar Apr 06 '24 23:04 fedshyvana