MPRNet
Mixed Precision Training
Hi! Thank you for your wonderful work and great code!
Have you tried mixed precision training? I use mixed precision when training the deblurring model on the GoPro dataset, but the loss value becomes NaN after dozens of epochs.
Could you help me find out what the problem is?

This is my modified code. Unfortunately, the batch size is only 4 (there is not enough memory for more).
if opt.TRAINING.FP16:
    from torch.cuda.amp import GradScaler
    scaler = GradScaler()
else:
    scaler = None
for epoch in range(start_epoch, opt.OPTIM.NUM_EPOCHS + 1):
    epoch_start_time = time.time()
    total_loss = 0
    val_loss = 0
    train_id = 1
    model_restoration.train()
    for i, data in enumerate(tqdm(train_loader), 0):
        for param in model_restoration.parameters():
            param.grad = None
        with torch.no_grad():
            target = data[0].cuda()
            input_ = data[1].cuda()
        # scaler is not None
        if opt.TRAINING.FP16:
            from torch.cuda.amp import autocast
            with autocast():
                restored = model_restoration(input_)
                # Compute loss at each stage
                loss_char = np.sum([criterion_char(restored[j], target) for j in range(len(restored))])
                loss_edge = np.sum([criterion_edge(restored[j], target) for j in range(len(restored))])
                loss = (loss_char) + (0.05 * loss_edge)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
        else:
            restored = model_restoration(input_)
            # Compute loss at each stage
            loss_char = np.sum([criterion_char(restored[j], target) for j in range(len(restored))])
            loss_edge = np.sum([criterion_edge(restored[j], target) for j in range(len(restored))])
            loss = (loss_char) + (0.05 * loss_edge)
            loss.backward()
            optimizer.step()
        total_loss += loss.item()
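For anyone trying to narrow down where the NaN first appears, the following is a minimal sketch of a more defensive mixed-precision step, not a confirmed fix for this issue. It keeps the forward pass under `autocast`, computes the loss in float32, skips the batch if the loss is already non-finite, and unscales the gradients before clipping them. The helper name `amp_train_step`, the `max_grad_norm` value, the toy `nn.Conv2d` model, and `nn.L1Loss` as a stand-in for `criterion_char` / `criterion_edge` are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.cuda.amp import GradScaler, autocast


def amp_train_step(model, optimizer, scaler, input_, target, criterion, max_grad_norm=1.0):
    """Run one training step; return the loss as a float, or None if the batch was skipped."""
    optimizer.zero_grad(set_to_none=True)

    # Forward pass in mixed precision (a no-op when the scaler is disabled).
    with autocast(enabled=scaler.is_enabled()):
        restored = model(input_)

    # Accept either a single output or a list of per-stage outputs.
    outputs = restored if isinstance(restored, (list, tuple)) else [restored]

    # Compute the loss in float32, outside autocast, summing over stages.
    loss = sum(criterion(out.float(), target.float()) for out in outputs)

    # If the forward pass already produced inf/NaN, skip this batch entirely.
    if not torch.isfinite(loss):
        return None

    scaler.scale(loss).backward()
    # Unscale first so the clipping threshold applies to the true gradients.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    # scaler.step skips optimizer.step() if the gradients contain inf/NaN.
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


# Toy usage; falls back to full precision on machines without CUDA.
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"
toy_model = nn.Conv2d(3, 3, kernel_size=3, padding=1).to(device)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
toy_scaler = GradScaler(enabled=use_amp)
x = torch.rand(2, 3, 8, 8, device=device)
y = torch.rand(2, 3, 8, 8, device=device)
print(amp_train_step(toy_model, toy_optimizer, toy_scaler, x, y, nn.L1Loss()))
```

Counting how often the helper returns `None` can help tell whether the NaN originates in the forward pass (the loss itself is non-finite) or only shows up later in the gradients.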
Hi! Thank you for your wonderful work and great code! Is the FP16 hyperparameter a Boolean value that should be set to true? I do not currently see this hyperparameter in the training.yml file. Thank you for your reply.
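Judging only from the snippet above, `opt.TRAINING.FP16` is used as a plain truthy flag in `if opt.TRAINING.FP16:`, and it appears to be a key added as part of the modification rather than one already present in training.yml (this is an inference from the thread, not something confirmed in it). A minimal sketch of reading such an optional flag with a default of `False`, assuming the config object supports attribute-style access; the helper name `fp16_enabled` and the `SimpleNamespace` stand-ins are hypothetical:

```python
from types import SimpleNamespace


def fp16_enabled(opt):
    """Return True only if opt.TRAINING.FP16 exists and is truthy."""
    training = getattr(opt, "TRAINING", None)
    # A missing key falls back to False, so configs without FP16 still work.
    return bool(getattr(training, "FP16", False))


# Hypothetical stand-ins for a loaded config, with and without the flag.
opt_without_flag = SimpleNamespace(TRAINING=SimpleNamespace())
opt_with_flag = SimpleNamespace(TRAINING=SimpleNamespace(FP16=True))

print(fp16_enabled(opt_without_flag))  # False
print(fp16_enabled(opt_with_flag))     # True
```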