
Skipping ERROR caught in nll = model.optimize_parameters(current_step): svd_cuda: the updating process of SBDSDC did not converge (error: 23)

Open flybiubiu opened this issue 4 years ago • 3 comments

Thanks to the author! Training at x4 works fine, but when I train at x8 I get: Skipping ERROR caught in nll = model.optimize_parameters(current_step): svd_cuda: the updating process of SBDSDC did not converge (error: 23)

    Python 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    >>> print(torch.__version__)
    1.7.1+cu110
    >>> print(torch.version.cuda)
    11.0
    >>> print(torch.backends.cudnn.version())
    8005

My GPU is a 3090. I ran the setup code and found that the CUDA version was not compatible, so I reinstalled with: pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio===0.7.2

This happens at about 10000 iterations.

flybiubiu, Feb 19 '21 10:02

I ran into the same problem. Once this error occurs, all subsequent data fails with the same error.

RedRAINXXXX, Feb 20 '21 01:02
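If every step after the first failure also fails, skipping steps indefinitely only hides the fact that training has broken down. Below is a minimal sketch of a loop that gives up after several consecutive failures so the run can be restarted from an earlier checkpoint. It assumes the training loop wraps the step in a try/except (as the "Skipping ERROR caught" message suggests) and that the failure surfaces as a `RuntimeError`. The names `run_steps`, `step_fn`, and `max_consecutive_failures` are hypothetical and not taken from the SRFlow code.

```python
def run_steps(step_fn, num_steps, max_consecutive_failures=5):
    """Call step_fn(step) for each step and stop early if it keeps failing.

    Returns the step at which training was halted, or None if the loop
    ran to completion.
    """
    failures = 0
    for step in range(num_steps):
        try:
            step_fn(step)
            failures = 0  # a successful step resets the counter
        except RuntimeError as err:
            failures += 1
            print(f"Skipping step {step}: {err}")
            if failures >= max_consecutive_failures:
                return step
    return None


def fake_step(step):
    # Stand-in for model.optimize_parameters(step); fails from step 3 onwards.
    if step >= 3:
        raise RuntimeError("svd did not converge")


halted_at = run_steps(fake_step, num_steps=20, max_consecutive_failures=5)
print("halted at step:", halted_at)
```

In a real run, `step_fn` would call the actual optimization step. When the loop halts, reload the last good checkpoint before resuming.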

Hi, I encountered the same problem as you. Have you solved the problem? @flybiubiu, @RedRAINXXXX

JingzheLyp, Mar 17 '21 02:03

> Hi, I encountered the same problem as you. Have you solved the problem? @flybiubiu, @RedRAINXXXX

The learning rate may be too high. You can try a warm-up schedule, or simply lower the learning rate.

RedRAINXXXX, Mar 17 '21 02:03
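The warm-up suggestion above could look roughly like this: a linear ramp that scales the learning rate from near zero up to its full value over the first steps. This is only a sketch. The function name `warmup_factor`, the 1000-step warm-up length, and the base learning rate are illustrative assumptions, not values from the SRFlow configuration.

```python
def warmup_factor(step, warmup_steps=1000):
    """Multiplier for the base learning rate.

    Ramps linearly up to 1.0 over the first `warmup_steps` steps,
    then stays at 1.0.
    """
    if warmup_steps <= 0:
        return 1.0  # warm-up disabled
    return min(1.0, (step + 1) / warmup_steps)


base_lr = 5e-4  # hypothetical base learning rate, not taken from the thread
for step in (0, 499, 999, 5000):
    print(step, base_lr * warmup_factor(step))
```

In PyTorch, a function like this can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's initial learning rate by the returned factor each time `scheduler.step()` is called. Lowering the base learning rate is the simpler alternative if warm-up alone does not help.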