
`Resampler` object still occupies memory after deletion

Open MahmoudAshraf97 opened this issue 1 year ago • 5 comments

Overview

I have a class that accepts audio inputs and resamples them as needed. The resampler parameters differ for each call, so I can't create the resampler as a class member. After the function finishes, the resampler still leaves residual memory behind, even if it is deleted.

Expected behavior

Memory usage should stay constant without having to run the GC manually.

Actual behavior

Memory usage increases unless the GC is run manually, even if the resampler object is deleted.

Investigation

I ran the reproduction code to test three different cases:

Baseline: [memory usage plot]

delete=True, collect=False: [memory usage plot]

collect=True: [memory usage plot]

Reproduction

import psutil
import gc
import av
import matplotlib.pyplot as plt

audio_path = "/mnt/e/Projects/whisper-diarization/096.mp3"
process = psutil.Process()


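def minimal_example(audio_path, delete=False):
    # A fresh resampler per call, since the target parameters differ per call
    resampler = av.audio.resampler.AudioResampler(
        format="s16",
        layout="mono",
        rate=16000,
    )

    with av.open(audio_path, mode="r", metadata_errors="ignore") as container:
        for frame in container.decode(audio=0):
            # resample() returns a list of output frames; they are discarded here
            resampled_frames = resampler.resample(frame)

    if delete:
        # Explicitly drop the only strong reference to the resampler
        del resampler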


def monitor_memory(audio, n=20, collect=False, delete=False):
    gc.collect()
    init_memory_usage = process.memory_info().rss
    memory_usage = []
    for _ in range(n):
        minimal_example(audio, delete=delete)
        if collect:
            gc.collect()
        # Resident set size growth relative to the start, in MB
        memory_usage.append(
            (process.memory_info().rss - init_memory_usage) / 1_000_000
        )

    gc.collect()
    # Plotting the memory usage
    plt.plot(memory_usage)
    plt.title("Memory Usage Over Time")
    plt.xlabel("Iteration")
    plt.ylabel("Memory Usage (MB)")
    plt.show()


if __name__ == "__main__":
    # Baseline run; pass delete=True or collect=True to reproduce the other cases
    monitor_memory(audio_path, n=20)

Versions

  • OS: WSL Ubuntu 22.04
  • PyAV runtime:
PyAV v12.1.0
library configuration: --disable-static --enable-shared --libdir=/tmp/vendor/lib --prefix=/tmp/vendor --disable-alsa --disable-doc --disable-libtheora --disable-mediafoundation --disable-videotoolbox --enable-fontconfig --enable-gmp --enable-gnutls --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libspeex --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libxcb --enable-libxml2 --enable-lzma --enable-zlib --enable-version3 --enable-libx264 --disable-libopenh264 --enable-libx265 --enable-libxvid --enable-gpl
library license: GPL version 3 or later
libavcodec     60. 31.102
libavdevice    60.  3.100
libavfilter     9. 12.100
libavformat    60. 16.100
libavutil      58. 29.100
libswresample   4. 12.100
libswscale      7.  5.100


Additional context

https://github.com/SYSTRAN/faster-whisper/issues/390 https://github.com/SYSTRAN/faster-whisper/pull/856/

MahmoudAshraf97, Jun 19 '24 12:06

This is because av.audio.resampler.AudioResampler uses av.filter.graph.Graph, and a graph inherently involves circular references, which produce objects that reference counting alone cannot free. As a result, while the AudioResampler gets deallocated when it goes out of scope, the Graph does not.
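A minimal sketch of that mechanism in plain CPython, with no PyAV involved. `CyclicGraph` and `CyclicFilter` are hypothetical stand-ins (not PyAV classes) for a graph whose nodes point back at it; a `weakref` probe shows that the cycle outlives its last outside reference until `gc.collect()` runs.

```python
import gc
import weakref


class CyclicGraph:
    """Hypothetical stand-in for a filter graph; not PyAV's av.filter.Graph."""

    def __init__(self):
        self.filters = []


class CyclicFilter:
    """Hypothetical filter node that holds a strong reference to its graph."""

    def __init__(self, graph):
        self.graph = graph          # filter -> graph
        graph.filters.append(self)  # graph -> filter, closing the cycle


def build_and_drop_graph():
    """Builds a graph with one filter, drops all outside references, returns a weakref."""
    graph = CyclicGraph()
    CyclicFilter(graph)
    return weakref.ref(graph)


gc.disable()  # keep the automatic cyclic collector out of the experiment
try:
    probe = build_and_drop_graph()
    # Reference counting alone cannot free the cycle: the graph is still alive.
    alive_before_collect = probe() is not None
    gc.collect()  # an explicit pass finds the unreachable cycle and frees it
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_before_collect, alive_after_collect)  # True False
```

This is the same pattern as the `collect=True` run in the reproduction: memory held by the cycle is only returned once the cyclic collector gets around to it.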

It is not a bug, because it does not affect the program's correctness: the author relied on CPython's cyclic garbage collector to clean up the resources. Eliminating the reference loop would be a performance enhancement.
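One common way to eliminate such a loop, shown here only as a hypothetical sketch and not as the change PyAV actually made, is to make the back-reference weak. `OwningGraph` and `WeakFilter` are illustrative names; with the cycle gone, CPython's reference counting frees the graph as soon as its last strong reference disappears, even with the cyclic collector disabled.

```python
import gc
import weakref


class OwningGraph:
    """Hypothetical graph that owns its filters; not PyAV's av.filter.Graph."""

    def __init__(self):
        self.filters = []


class WeakFilter:
    """Hypothetical filter that refers back to its graph through a weak reference."""

    def __init__(self, graph):
        self._graph_ref = weakref.ref(graph)  # back-reference that adds no refcount
        graph.filters.append(self)            # the graph still owns the filter

    @property
    def graph(self):
        graph = self._graph_ref()
        if graph is None:
            raise ReferenceError("the owning graph has already been freed")
        return graph


gc.disable()  # show that the cyclic collector is not needed here
try:
    graph = OwningGraph()
    filt = WeakFilter(graph)
    probe = weakref.ref(graph)
    assert filt.graph is graph
    del graph  # last strong reference: refcounting frees the graph immediately
    freed_without_gc = probe() is None
    filt_is_orphaned = filt._graph_ref() is None
finally:
    gc.enable()

print(freed_without_gc, filt_is_orphaned)  # True True
```

The trade-off is that a filter can now outlive its graph, so every access through the back-reference must handle the dead-weakref case, which the `graph` property does by raising `ReferenceError`.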

moonsikpark, Jun 24 '24 02:06

@MahmoudAshraf97, please test again using https://github.com/PyAV-Org/PyAV/pull/1439/commits/851ff21b4dd607468bc544c6222980de17fdee01. It will not solve the issue completely, but it could reduce some of the memory footprint.

moonsikpark, Jun 24 '24 16:06

This is the result: [memory usage plot]

I guess the problem still exists. Since repeating the same experiment doesn't reproduce the results exactly, I can't verify whether this is partially solved or not.

MahmoudAshraf97, Jun 25 '24 15:06

Using a large enough number of iterations will help make the results more deterministic.

On my machine (arm64 M2 Pro, Darwin 14.4.1), with delete=False, collect=False, n=200:

[Screenshot: memory usage without the patch (left) and with the patch (right)]

Without the patch (left), the GC periodically cleans up the circularly referenced objects, and there seem to be some objects that the GC cannot recover.

With the patch (right), memory grows steadily and there are no signs of visible GC activity. This could indicate a real leak, though I haven't investigated it further. Overall, the memory footprint appears significantly smaller.
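To make GC activity explicit rather than inferring it from the shape of the curve, CPython's `gc.callbacks` hook can log every collection pass. The sketch below exercises it on throwaway list cycles; `GcLog` and `make_cycle` are illustrative names, not part of PyAV. In the reproduction script, the same kind of logger could be registered around the `monitor_memory` loop and its events overlaid on the memory plot.

```python
import gc


class GcLog:
    """Illustrative recorder for CPython cyclic-GC passes, fed by gc.callbacks."""

    def __init__(self):
        self.events = []  # (generation, collected, uncollectable) per finished pass

    def __call__(self, phase, info):
        # CPython calls each callback with phase "start" and again with "stop";
        # the "collected"/"uncollectable" counts are meaningful at "stop".
        if phase == "stop":
            self.events.append(
                (info["generation"], info["collected"], info["uncollectable"])
            )


def make_cycle():
    """Creates two lists that reference each other, then drops them."""
    a = []
    b = [a]
    a.append(b)


log = GcLog()
gc.callbacks.append(log)
try:
    for _ in range(1000):
        make_cycle()
    gc.collect()  # a final explicit pass so every remaining cycle is collected
finally:
    gc.callbacks.remove(log)

total_collected = sum(collected for _, collected, _ in log.events)
print(f"{len(log.events)} GC passes, {total_collected} objects collected")
```

The number of passes depends on the interpreter version and its GC thresholds, so only the presence of passes and a nonzero collected count should be relied on.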

@MahmoudAshraf97, can you try my patch again with delete=False, collect=False, n=200 and compare it with v12.1.0 or main?

moonsikpark, Jun 25 '24 16:06

@moonsikpark The Graph object probably has a memory leak, and it certainly has its own circular references.

WyattBlue, Jun 25 '24 21:06

I'm going to close this even though the issue is only partly resolved, because I want future contributors to focus only on Graph and not worry about this fluff.

WyattBlue, Jul 25 '24 07:07