Problem when training the model with BiSRNet as the Net

Open Gchang9 opened this issue 3 years ago • 6 comments

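Hello, training works without problems when I use SSCDl as the Net. When I use BiSRNet as the Net, a problem occurs, even though the parameters, the data, and the data loading all follow your code. During training the following is shown: WARNING:Nan or Inf found in Input tensor, and both train_seg loss and bn_loss are nan.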

Gchang9 avatar May 22 '22 14:05 Gchang9

It seems the problem does not occur when Tensorboard is turned off. Why is that?

Gchang9 avatar May 22 '22 14:05 Gchang9

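With Tensorboard turned off the warning is no longer reported, but after a few epochs of training, train_seg loss and bn_loss still both become nan. Could you tell me what is going on?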
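Since the nan values persist with Tensorboard turned off, one way to narrow this down is to check each loss for non-finite values before logging or backpropagating. The following is a minimal sketch, assuming PyTorch; the helper `first_non_finite` and the dictionary keys are hypothetical names, not taken from the repository.

```python
import torch


def first_non_finite(losses):
    """Return the name of the first loss that is NaN or Inf, or None if all are finite."""
    for name, value in losses.items():
        if not torch.isfinite(value).all():
            return name
    return None


# Hypothetical loss values standing in for one training step.
good = {"seg_loss": torch.tensor(0.7), "bn_loss": torch.tensor(0.2)}
bad = {"seg_loss": torch.tensor(float("nan")), "bn_loss": torch.tensor(0.2)}

print(first_non_finite(good))
print(first_non_finite(bad))
```

Calling such a check in the training loop and printing the batch index and batch size when it fires can show whether the first nan always coincides with a particular batch.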

Gchang9 avatar May 23 '22 05:05 Gchang9

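Hello, I split nearly 3000 groups of images into training, test, and validation sets, so the number of training samples is not divisible by the batch size and the last batch is incomplete. After setting drop_last to True, training now runs normally. I have not yet found a solution for the Tensorboard issue, though.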
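A minimal sketch of what drop_last changes, assuming a standard PyTorch `DataLoader`; the 10-sample dataset and the batch size of 4 are made-up values chosen so that a remainder exists, and `batch_sizes` is a hypothetical helper.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 10 samples, so a batch size of 4 leaves a remainder of 2.
dataset = TensorDataset(torch.randn(10, 3), torch.zeros(10, dtype=torch.long))


def batch_sizes(drop_last):
    """Collect the size of every batch produced in one pass over the loader."""
    loader = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=drop_last)
    return [x.shape[0] for x, _ in loader]


sizes_keep = batch_sizes(drop_last=False)  # the last batch is smaller than the rest
sizes_drop = batch_sizes(drop_last=True)   # the incomplete last batch is discarded

print(sizes_keep, sizes_drop)
```

With drop_last=True the leftover samples are skipped in that epoch; because shuffle=True, a different subset is skipped each epoch.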

Gchang9 avatar May 23 '22 09:05 Gchang9

Hi. Is the batch size too small? The BiSRNet is a bit hard to train but the loss shouldn't be crazy. I got accuracy improvements while freezing the other model parts and training only the SR modules. You can also try that.
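A minimal sketch of freezing everything except selected modules, assuming PyTorch. `TinyNet`, its attribute names, and the `freeze_all_but` helper are hypothetical stand-ins; the actual parameter names of the SR modules in BiSRNet would need to be looked up in the model definition.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in network; only the attribute names matter here."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, 3, padding=1)
        self.sr_module = nn.Conv2d(8, 8, 1)  # placeholder for the SR modules
        self.classifier = nn.Conv2d(8, 2, 1)


def freeze_all_but(model, keyword):
    """Enable gradients only for parameters whose name contains `keyword`."""
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
    return [n for n, p in model.named_parameters() if p.requires_grad]


model = TinyNet()
trainable = freeze_all_but(model, "sr_module")

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.01)
print(trainable)
```

Note that setting requires_grad to False does not stop BatchNorm layers from updating their running statistics in train mode; calling .eval() on the frozen submodules handles that if it is wanted.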

DingLei14 avatar May 23 '22 12:05 DingLei14

Thank you. The loss was nan because my number of training samples is not divisible by the batch size; after setting drop_last to True, training runs normally. I have one more minor question. I noticed that both the 'Models' folder and the main folder contain a file named bisrnet.py. The one in the main folder defines the head layer, while the one in 'Models' does not. What is the purpose of each? Thank you again!

Gchang9 avatar May 23 '22 12:05 Gchang9