Memory Replay increased GPU memory consumption after the first experience
Hi,
I am not sure if this is expected behavior: when I run an experiment with the Memory Replay plugin only, the GPU memory usage increases after the first experience (in my case, quite significantly) according to nvidia-smi.
training on first experience (experience 0) memory usage:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.129      Driver Version: 410.129      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:00:1E.0 Off |                   0* |
| N/A   38C    P0   139W / 150W |   3984MiB /  7618MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
training on second experience (experience 1) memory usage:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.129      Driver Version: 410.129      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:00:1E.0 Off |                   0* |
| N/A   52C    P0   135W / 150W |   7284MiB /  7618MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
This is for the setting with:
replay = ReplayPlugin(mem_size=250)
ResNet-50 network,
SGD optimizer,
Cross entropy loss,
batch size 32,
256 by 256 pixels input image size,
2 experiences (experience 0 and experience 1)
The training strategy and loop are as in the Avalanche examples:
cl_strategy = SupervisedTemplate(
    net, optimizer, criterion,
    plugins=[replay], device=device,
    train_mb_size=batch_size, train_epochs=1,
    eval_mb_size=batch_size, evaluator=eval_plugin)
for i, experience in enumerate(generic_scenario.train_stream, 0):
    print(i)
    print("Start of experience: ", experience.current_experience)
    cl_strategy.train(experience)  # train on the i-th experience
    print('Training completed')
According to my understanding, memory replay methods are based on sampling a subset of previous data (of a size defined by mem_size) and adding those samples to the current training batches, which shouldn't increase the occupied GPU memory compared to the first experience.
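To make my assumption explicit, here is a minimal torch-only sketch of how I imagine a replay step working (placeholder tensors standing in for my real data; this is not Avalanche's actual implementation): replayed samples fill part of the minibatch, so its total size, and hence the activation memory, stays the same.

import torch

# placeholder data, shapes roughly as in my setup
memory_x = torch.rand(250, 3, 256, 256)      # fixed-size buffer, bounded by mem_size
memory_y = torch.randint(0, 2, (250,))
current_x = torch.rand(32, 3, 256, 256)      # one minibatch from the current experience
current_y = torch.randint(0, 2, (32,))

k = current_x.size(0) // 2                   # half of the slots come from memory
idx = torch.randperm(memory_x.size(0))[:k]
mixed_x = torch.cat((current_x[:k], memory_x[idx]))   # still 32 samples in total
mixed_y = torch.cat((current_y[:k], memory_y[idx]))
# the forward/backward pass would then run on a batch of the same size as before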
So is this GPU memory usage increase the expected behavior when using the ReplayPlugin?
Thanks, Woj
Do you have a script to reproduce this problem? What happens in the following steps? Does the memory keep growing continuously?
The code is attached below.
There are 2 experiences in the data stream scenario (generic_scenario = dataset_benchmark([trainset1, trainset2], [testset1, testset2])):
- during the first experience the GPU memory usage is ~4 GB (see the nvidia-smi output labelled "training on first experience (experience 0) memory usage" in my comment above),
- during the second experience it is ~7 GB (see the output labelled "training on second experience (experience 1) memory usage").
The memory does not keep growing beyond that; while running the code below I just check the GPU usage manually with nvidia-smi.
import torch
from torch.utils.data import TensorDataset
import torch.nn as nn
import torch.optim as optim
from torchvision import models
from avalanche.training.templates import SupervisedTemplate
from avalanche.training.plugins import ReplayPlugin
from avalanche.evaluation.metrics import forgetting_metrics, accuracy_metrics,\
loss_metrics, confusion_matrix_metrics
from avalanche.models import SimpleMLP
import numpy as np
from avalanche.logging import InteractiveLogger
from avalanche.training.plugins import EvaluationPlugin
from avalanche.benchmarks.generators import dataset_benchmark
N1 = 500
torch.manual_seed(0)
np.random.seed(0)
torch.backends.cudnn.deterministic = True
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# train sets - experience 0 and 1
x_data1 = torch.rand(N1,3,320,320)+2
y_data1 = torch.ones(N1).long()
x_data1_neg = torch.rand(N1,3,320,320)
y_data1_neg = torch.zeros(N1).long()
x_data1 = torch.cat((x_data1,x_data1_neg))
y_data1 = torch.cat((y_data1,y_data1_neg))
x_data2 = torch.rand(N1,3,320,320)+4
y_data2 = torch.ones(N1).long()
x_data2_neg = torch.rand(N1,3,320,320)+0.5
y_data2_neg = torch.zeros(N1).long()
x_data2 = torch.cat((x_data2,x_data2_neg))
y_data2 = torch.cat((y_data2,y_data2_neg))
trainset1 = TensorDataset(x_data1, y_data1)
trainset2 = TensorDataset(x_data2, y_data2)
# test sets - experience 0 and 1
x_data1 = torch.rand(N1,3,320,320)+2
y_data1 = torch.ones(N1).long()
x_data1_neg = torch.rand(N1,3,320,320)
y_data1_neg = torch.zeros(N1).long()
x_data1 = torch.cat((x_data1,x_data1_neg))
y_data1 = torch.cat((y_data1,y_data1_neg))
x_data2 = torch.rand(N1,3,320,320)+4
y_data2 = torch.ones(N1).long()
x_data2_neg = torch.rand(N1,3,320,320)+0.5
y_data2_neg = torch.zeros(N1).long()
x_data2 = torch.cat((x_data2,x_data2_neg))
y_data2 = torch.cat((y_data2,y_data2_neg))
testset1 = TensorDataset(x_data1, y_data1)
testset2 = TensorDataset(x_data2, y_data2)
generic_scenario = dataset_benchmark([trainset1, trainset2],
[testset1, testset2])
eval_plugin = EvaluationPlugin(
accuracy_metrics(minibatch=True, epoch=True, experience=True, stream=True),
loss_metrics(minibatch=True, epoch=True, experience=True, stream=True),
forgetting_metrics(experience=True, stream=True),
confusion_matrix_metrics(num_classes=2, save_image=False, stream=True),
loggers=[InteractiveLogger()],
strict_checks=False
)
class ResNet_2C(nn.Module):
    def __init__(self, model):
        super(ResNet_2C, self).__init__()
        # ResNet-50 backbone without the final average pooling and fc layers
        self.feature_extractor = nn.Sequential(*list(model.children())[0:8])
        # 2-class classification head on top of the 2048-dim ResNet-50 features
        self.classification_layer = nn.Sequential(
            nn.Linear(2048, 2),
        )
        self.avg_pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        x = self.feature_extractor(x)
        x = self.avg_pool(x)
        x = x.view(-1, 2048)
        x = self.classification_layer(x)
        return x
batch_size=16
model = models.resnet50(pretrained=True)
net = ResNet_2C(model).to(device)
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()
replay = ReplayPlugin(mem_size=250)
cl_strategy = SupervisedTemplate(
    net, optimizer, criterion,
    plugins=[replay], device=device,
    train_mb_size=batch_size, train_epochs=1,
    eval_mb_size=batch_size, evaluator=eval_plugin)
for i, experience in enumerate(generic_scenario.train_stream, 0):
    print(i)
    print("Start of experience: ", experience.current_experience)
    cl_strategy.train(experience)  # train on the i-th experience
    print('Training completed')
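For reference, the same check can also be done from inside the script with PyTorch's own memory counters instead of watching nvidia-smi by hand. Below is a minimal sketch (report_gpu_memory is just a helper name I am making up here; these counters only track tensors managed by PyTorch, so nvidia-smi will show a somewhat higher total that also includes the CUDA context and cache):

import torch  # already imported above, repeated here for completeness

def report_gpu_memory(tag, device):
    # Peak memory tracked by PyTorch's caching allocator since the last reset;
    # nvidia-smi reports more because it also counts the CUDA context and cache.
    alloc_mib = torch.cuda.max_memory_allocated(device) / 1024 ** 2
    reserved_mib = torch.cuda.max_memory_reserved(device) / 1024 ** 2
    print(f"[{tag}] max allocated: {alloc_mib:.0f} MiB, max reserved: {reserved_mib:.0f} MiB")
    torch.cuda.reset_peak_memory_stats(device)

# called after each cl_strategy.train(experience) in the loop above, e.g.:
# report_gpu_memory(f"experience {i}", device)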
Hi @matkowski-voy!
The batch size for the replay dataloader is equal to 2 x batch_size by default, because it concatenates two batches of the same size: one from the dataloader of the current experience and one from the memory. This could be the reason for the increase in memory usage. You can change the default setting with the following parameters when initializing the replay plugin:
ReplayPlugin(mem_size=50, batch_size=bsize//2, batch_size_mem=bsize//2)
where bsize is the original batch size.
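Applied to the script above, that would look roughly like this (a sketch keeping your original mem_size=250 and batch_size=16; only the plugin construction changes, the strategy stays the same):

batch_size = 16  # original mini-batch size from the script above

# Split the effective batch between new and replayed data so that the replay
# dataloader still yields minibatches of batch_size samples in total.
replay = ReplayPlugin(
    mem_size=250,                    # same buffer size as before
    batch_size=batch_size // 2,      # samples drawn from the current experience
    batch_size_mem=batch_size // 2,  # samples drawn from the replay memory
)

cl_strategy = SupervisedTemplate(
    net, optimizer, criterion,
    plugins=[replay], device=device,
    train_mb_size=batch_size, train_epochs=1,
    eval_mb_size=batch_size, evaluator=eval_plugin)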
Can you give it a try and see if it solves the issue?
Hi @HamedHemati ,
yes, that solves the issue.
Thanks a lot! :)