
Mixed Precision Training

Open zsxzs opened this issue 3 years ago • 2 comments

Hi! Thank you for your wonderful work and great code! Have you tried mixed precision training? I use mixed precision when training the deblurring model on the GoPro dataset, but the loss value becomes NaN after a few dozen epochs. Could you help me figure out what the problem is?

zsxzs · May 30 '22

This is the modified code; unfortunately the batch size is only 4 (there is not enough memory for more).

if opt.TRAINING.FP16:
    # Create the GradScaler once, before the training loop starts
    from torch.cuda.amp import GradScaler, autocast
    scaler = GradScaler()
else:
    scaler = None

for epoch in range(start_epoch, opt.OPTIM.NUM_EPOCHS + 1):
    epoch_start_time = time.time()
    total_loss = 0
    val_loss = 0
    train_id = 1
    model_restoration.train()
    for i, data in enumerate(tqdm(train_loader), 0):

        # Zero the gradients without allocating new tensors
        for param in model_restoration.parameters():
            param.grad = None

        with torch.no_grad():
            target = data[0].cuda()
            input_ = data[1].cuda()

        # scaler is not None in this branch
        if opt.TRAINING.FP16:
            # Run the forward pass and the losses under autocast
            with autocast():
                restored = model_restoration(input_)

                # Compute loss at each stage
                loss_char = sum(criterion_char(restored[j], target) for j in range(len(restored)))
                loss_edge = sum(criterion_edge(restored[j], target) for j in range(len(restored)))
                loss = loss_char + (0.05 * loss_edge)

            # Scale the loss, backpropagate, then step and update through the scaler
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

        else:
            restored = model_restoration(input_)

            # Compute loss at each stage
            loss_char = sum(criterion_char(restored[j], target) for j in range(len(restored)))
            loss_edge = sum(criterion_edge(restored[j], target) for j in range(len(restored)))
            loss = loss_char + (0.05 * loss_edge)

            loss.backward()
            optimizer.step()

        total_loss += loss.item()
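
For reference (this is not from the repo's train.py): PyTorch's documented recipe for gradient clipping under AMP is to unscale the gradients before calling clip_grad_norm_. Whether clipping helps with this particular NaN is uncertain, but against the variables above the end of the FP16 branch would look roughly like this (max_norm=1.0 is only an illustrative value, not one used by this repo):

            scaler.scale(loss).backward()
            # Unscale so the clip threshold applies to the true gradient values
            scaler.unscale_(optimizer)
            torch.nn.utils.clip_grad_norm_(model_restoration.parameters(), max_norm=1.0)
            scaler.step(optimizer)   # the step is skipped automatically if any gradient is inf/NaN
            scaler.update()

Note that GradScaler.step() already skips the optimizer step when it finds inf/NaN gradients, so a loss that stays NaN often points to the forward pass or the loss computation itself overflowing in float16 rather than to a bad parameter update.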

zsxzs · May 30 '22

Hi! Thank you for your wonderful work and great code! Is the FP16 hyperparameter a Boolean that should be set to true? I don't see this parameter in the training.yml file at the moment. Thank you for your reply.
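
Presumably it is a flag added by hand and read as opt.TRAINING.FP16 in the code above; a hypothetical addition to training.yml (under the TRAINING section the code already references) would look like:

TRAINING:
  FP16: true    # hypothetical key, not in the released training.yml; enables the AMP branch above

Depending on how the config is parsed, a matching default may also need to be registered in the config code.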

drifterss · Mar 27 '23