
Mixed Precision Training

Open zsxzs opened this issue 3 years ago • 2 comments

Hi! Thank you for your wonderful work and great code! Have you tried mixed precision training? I use mixed precision when training the deblurring model on the GoPro dataset, but the loss value becomes NaN after a few dozen epochs. Could you help me figure out what the problem is?

zsxzs · May 30 '22

This is the modified code; unfortunately the batch size is only 4 (there is not enough memory for more).

if opt.TRAINING.FP16:
    # Create the GradScaler once, before the training loop starts
    from torch.cuda.amp import GradScaler, autocast
    scaler = GradScaler()
else:
    scaler = None

for epoch in range(start_epoch, opt.OPTIM.NUM_EPOCHS + 1):
    epoch_start_time = time.time()
    total_loss = 0
    val_loss = 0
    train_id = 1
    model_restoration.train()
    for i, data in enumerate(tqdm(train_loader), 0):

        # Zero the gradients without allocating new tensors
        for param in model_restoration.parameters():
            param.grad = None

        with torch.no_grad():
            target = data[0].cuda()
            input_ = data[1].cuda()

        # scaler is not None in this branch
        if opt.TRAINING.FP16:
            # Run the forward pass and the losses under autocast
            with autocast():
                restored = model_restoration(input_)

                # Compute loss at each stage
                loss_char = sum(criterion_char(restored[j], target) for j in range(len(restored)))
                loss_edge = sum(criterion_edge(restored[j], target) for j in range(len(restored)))
                loss = loss_char + (0.05 * loss_edge)

            # Scale the loss, backpropagate, then step and update through the scaler
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

        else:
            restored = model_restoration(input_)

            # Compute loss at each stage
            loss_char = sum(criterion_char(restored[j], target) for j in range(len(restored)))
            loss_edge = sum(criterion_edge(restored[j], target) for j in range(len(restored)))
            loss = loss_char + (0.05 * loss_edge)

            loss.backward()
            optimizer.step()

        total_loss += loss.item()
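
For reference (this is not from the repo's train.py): PyTorch's documented recipe for gradient clipping under AMP is to unscale the gradients before calling clip_grad_norm_. Whether clipping helps with this particular NaN is uncertain, but against the variables above the end of the FP16 branch would look roughly like this (max_norm=1.0 is only an illustrative value, not one used by this repo):

            scaler.scale(loss).backward()
            # Unscale so the clip threshold applies to the true gradient values
            scaler.unscale_(optimizer)
            torch.nn.utils.clip_grad_norm_(model_restoration.parameters(), max_norm=1.0)
            scaler.step(optimizer)   # the step is skipped automatically if any gradient is inf/NaN
            scaler.update()

Note that GradScaler.step() already skips the optimizer step when it finds inf/NaN gradients, so a loss that stays NaN often points to the forward pass or the loss computation itself overflowing in float16 rather than to a bad parameter update.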

zsxzs · May 30 '22

Hi! Thank you for your wonderful work and great code! Is the FP16 hyperparameter a Boolean that should be set to true? I don't see this parameter in the training.yml file at the moment. Thank you for your reply.
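
Presumably it is a flag added by hand and read as opt.TRAINING.FP16 in the code above; a hypothetical addition to training.yml (under the TRAINING section the code already references) would look like:

TRAINING:
  FP16: true    # hypothetical key, not in the released training.yml; enables the AMP branch above

Depending on how the config is parsed, a matching default may also need to be registered in the config code.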

drifterss · Mar 27 '23