memory leak in a loop
Hi,
I've run into a memory leak; here is my usage.
The code below works fine with the default PyTorch DataLoader: CPU memory usage stays stable.
If I replace the PyTorch DataLoader with the BackgroundGenerator, CPU memory usage grows over time.
for epoch in range(epochs):
    train0 = load(file_list[0])
    for fidx in range(1, len(file_list) - 1):
        if fidx < len(file_list) - 1:
            # AsyncLoading is a custom threading class that loads the next file
            # on another thread (a rough sketch of it is shown after this block).
            background = AsyncLoading(file_list[fidx], fidx)
            background.start()
        if fidx % 2 == 1:
            train_dataloader = data.DataLoader(dataset=Dataset(train0),
                                               batch_size=48,
                                               shuffle=True,
                                               drop_last=True,
                                               collate_fn=collate_fn)
        else:
            train_dataloader = data.DataLoader(dataset=Dataset(train1),
                                               batch_size=48,
                                               shuffle=True,
                                               drop_last=True,
                                               collate_fn=collate_fn)
        ####
        # Do the training here.
        ####
        if fidx < len(file_list) - 1:
            background.join()
            if fidx % 2 == 0:
                train0 = background.get_result().copy()
            else:
                train1 = background.get_result().copy()
        # Clean up the dataset here and call gc.collect().
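AsyncLoading itself is not included in the report; based on the comment above, a rough sketch of the assumed behavior (the names and details are guesses, not the actual class) would be:

    import threading

    class AsyncLoading(threading.Thread):
        """Assumed shape of the helper: runs load() on a worker thread."""

        def __init__(self, path, fidx):
            super().__init__()
            self.path = path
            self.fidx = fidx
            self.result = None

        def run(self):
            # load() is the same file loader used for file_list[0] above.
            self.result = load(self.path)

        def get_result(self):
            return self.result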
My new dataloader:
from torch.utils import data
from prefetch_generator import BackgroundGenerator

class DataLoaderX(data.DataLoader):
    def __iter__(self):
        return BackgroundGenerator(super().__iter__(), max_prefetch=2)
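It is used as a drop-in replacement in the loop above; nothing else changes:

    # Same call as before, only the class name differs.
    train_dataloader = DataLoaderX(dataset=Dataset(train0),
                                   batch_size=48,
                                   shuffle=True,
                                   drop_last=True,
                                   collate_fn=collate_fn)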
Memory usage comparison (the only difference between the two runs is the dataloader):
Best Regards,
Hi, thanks for reporting! I will try to look into this, but I can't guarantee I'll get to it quickly; I'm a bit overloaded right now :(
Looking over the code, I think this is caused by the generator thread not being joined after iteration stops, so its resources leak.
A solution might be adding a handler that joins/terminates the thread when GeneratorExit and/or StopIteration is raised.
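For illustration, here is a minimal self-contained sketch of that idea (this is not prefetch_generator's actual code; the class name, stop_event, and sentinel are made up for the example). The worker thread is signalled and joined in the consumer's finally block, which runs both on normal exhaustion and when the iterator is closed early (GeneratorExit), so the thread and its prefetched batches are released:

    import queue
    import threading

    class JoinableBackgroundGenerator:
        """Prefetches items from `generator` on a worker thread and joins it on exit."""

        def __init__(self, generator, max_prefetch=2):
            self.generator = generator
            self.queue = queue.Queue(max_prefetch)
            self.stop_event = threading.Event()
            self._sentinel = object()
            self.thread = threading.Thread(target=self._worker, daemon=True)
            self.thread.start()

        def _worker(self):
            for item in self.generator:
                if self.stop_event.is_set():
                    return
                self.queue.put(item)           # may block while the queue is full
            self.queue.put(self._sentinel)     # mark normal exhaustion

        def __iter__(self):
            try:
                while True:
                    item = self.queue.get()
                    if item is self._sentinel:
                        return
                    yield item
            finally:
                # Runs on normal exhaustion and on GeneratorExit (consumer stopped
                # early): tell the worker to stop, drain the queue so a blocked
                # put() can finish, and join so the thread and its buffered
                # batches are actually freed.
                self.stop_event.set()
                while self.thread.is_alive():
                    while not self.queue.empty():
                        self.queue.get_nowait()
                    self.thread.join(timeout=0.05)

A DataLoaderX built on a wrapper like this would release its worker each time a per-file dataloader is discarded in the loop above, instead of leaving the thread and its queue behind.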