prefetch_generator

memory leak in a loop

Open jdxyw opened this issue 4 years ago • 3 comments

Hi,

I'm running into a memory-leak issue. Here is my usage:

The code below is fine with the default dataloader in PyTorch: memory usage stays stable.
If I use the BackgroundGenerator in place of the PyTorch dataloader, CPU memory usage grows over time.


for epoch in range(0, epochs):
    train0 = load(file_list[0])
    for fidx in range(1, len(file_list) - 1):
        if fidx < len(file_list) - 1:
            # AsyncLoading is a custom threading class that loads a file in another thread.
            background = AsyncLoading(file_list[fidx], fidx)
            background.start()

        if fidx % 2 == 1:
            train_dataloader = data.DataLoader(dataset=Dataset(train0),
                                               batch_size=48,
                                               shuffle=True,
                                               drop_last=True,
                                               collate_fn=collate_fn)
        else:
            train_dataloader = data.DataLoader(dataset=Dataset(train1),
                                               batch_size=48,
                                               shuffle=True,
                                               drop_last=True,
                                               collate_fn=collate_fn)

        ####
        # Do the training here.
        ####

        if fidx < len(file_list) - 1:
            background.join()
            if fidx % 2 == 0:
                train0 = background.get_result().copy()
            else:
                train1 = background.get_result().copy()

        # Clean the dataset here and call gc.collect().

My new dataloader.

class DataLoaderX(data.DataLoader):
    def __iter__(self):
        return BackgroundGenerator(super().__iter__(), max_prefetch=2)

The memory usage comparison. The only difference is the dataloader.

[Two screenshots, taken 2021-04-04 15:22: memory usage over time with the default DataLoader vs. with DataLoaderX]

Best Regards,

jdxyw avatar Apr 04 '21 07:04 jdxyw

Hi, thanks for reporting! I will try to look into this, but I can't guarantee I'll get to it quickly; I'm a bit overloaded right now :(

justheuristic avatar Apr 04 '21 14:04 justheuristic

Looking over the code, I think this is caused by the generator thread not joining after the iteration stops, leaking its resources.

Luxter77 avatar Dec 15 '23 08:12 Luxter77
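To illustrate that failure mode with a toy example (this is not the library's code): each epoch starts a worker thread, and if nothing ever joins it, the threads and everything they reference accumulate. The event-blocked worker below is a hypothetical stand-in for a prefetch thread stuck on a full queue after its consumer stopped iterating.

```python
import threading

def spawn_prefetch_worker():
    # Hypothetical stand-in for a prefetch worker thread: it blocks
    # forever on an event that nobody sets, the way a producer can
    # block on a full queue once its consumer has gone away.
    stuck = threading.Event()
    t = threading.Thread(target=stuck.wait, daemon=True)
    t.start()
    return t

before = threading.active_count()
workers = [spawn_prefetch_worker() for _ in range(5)]  # "5 epochs"
leaked = threading.active_count() - before
print(leaked)  # 5 threads still alive; nothing will ever join them
```

Each stranded thread pins its stack plus whatever objects it references (in the real case, prefetched batches), which matches the steadily growing CPU memory in the screenshots.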

A solution might be to add a handler that joins/terminates the thread when GeneratorExit and/or StopIteration is raised.

Luxter77 avatar Dec 15 '23 08:12 Luxter77
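A minimal sketch of what that fix could look like (hypothetical, not the library's actual API): a prefetching iterator whose worker pushes a sentinel when the source is exhausted, and which exposes a `close()` that signals the worker and joins it, so breaking out of iteration early no longer strands the thread.

```python
import queue
import threading

class JoiningBackgroundGen:
    """Sketch of a prefetching iterator that joins its worker thread
    both on normal exhaustion and on early termination."""

    _SENTINEL = object()

    def __init__(self, gen, max_prefetch=2):
        self._queue = queue.Queue(max_prefetch)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._worker, args=(gen,),
                                        daemon=True)
        self._thread.start()

    def _worker(self, gen):
        try:
            for item in gen:
                # Retry with a timeout so the worker can notice _stop
                # instead of blocking forever on a full queue.
                while not self._stop.is_set():
                    try:
                        self._queue.put(item, timeout=0.1)
                        break
                    except queue.Full:
                        continue
                if self._stop.is_set():
                    return
        finally:
            # Tell the consumer we are done (best effort if the queue
            # is full, which only happens after close() was called).
            try:
                self._queue.put(self._SENTINEL, timeout=0.1)
            except queue.Full:
                pass

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()
        if item is self._SENTINEL:
            self._thread.join()  # normal exhaustion: reclaim the thread
            raise StopIteration
        return item

    def close(self):
        # Early termination (e.g. a `break` in the consumer): signal
        # the worker and join it instead of leaking it.
        self._stop.set()
        self._thread.join()
```

Usage: iterate as usual, but call `close()` if you stop early; a context-manager wrapper or a `GeneratorExit` hook in a real implementation could make that automatic.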