`Resampler` object still occupies memory after deletion
Overview
I have a class that accepts audio inputs and resamples them as needed. The resampler parameters differ on each call, so I can't create the resampler once as a class member. After the function finishes, the resampler still leaves residual memory behind, even if it is explicitly deleted.
Expected behavior
Memory usage should stay constant without having to run the GC manually.
Actual behavior
Memory usage increases unless the GC is run manually, even if the resampler object is deleted.
Investigation
I ran the reproduction code to test three different cases:
Baseline:
delete=True, collect=False
collect=True
Reproduction
import gc

import av
import matplotlib.pyplot as plt
import psutil

audio_path = "/mnt/e/Projects/whisper-diarization/096.mp3"
process = psutil.Process()


def minimal_example(audio_path, delete=False):
    resampler = av.audio.resampler.AudioResampler(
        format="s16",
        layout="mono",
        rate=16000,
    )

    with av.open(audio_path, mode="r", metadata_errors="ignore") as container:
        frames = container.decode(audio=0)
        for frame in frames:
            frame = resampler.resample(frame)

    if delete:
        resampler = None
        del resampler


def monitor_memory(audio, n=20, collect=False, delete=False):
    gc.collect()
    init_memory_usage = process.memory_info().rss
    memory_usage = []
    for _ in range(n):
        minimal_example(audio, delete=delete)
        if collect:
            gc.collect()
        # Store memory usage in MB
        memory_usage.append(
            (process.memory_info().rss - init_memory_usage) / 1000000
        )
        print("")

    gc.collect()

    # Plotting the memory usage
    plt.plot(memory_usage)
    plt.title("Memory Usage Over Time")
    plt.xlabel("Iteration")
    plt.ylabel("Memory Usage (MB)")
    plt.show()
Versions
- OS: WSL Ubuntu 22.04
- PyAV runtime:
PyAV v12.1.0
library configuration: --disable-static --enable-shared --libdir=/tmp/vendor/lib --prefix=/tmp/vendor --disable-alsa --disable-doc --disable-libtheora --disable-mediafoundation --disable-videotoolbox --enable-fontconfig --enable-gmp --enable-gnutls --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libspeex --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libxcb --enable-libxml2 --enable-lzma --enable-zlib --enable-version3 --enable-libx264 --disable-libopenh264 --enable-libx265 --enable-libxvid --enable-gpl
library license: GPL version 3 or later
libavcodec 60. 31.102
libavdevice 60. 3.100
libavfilter 9. 12.100
libavformat 60. 16.100
libavutil 58. 29.100
libswresample 4. 12.100
libswscale 7. 5.100
Research
I have done the following:
- [x] Checked the PyAV documentation
- [x] Searched on Google
- [x] Searched on Stack Overflow
- [x] Looked through old GitHub issues
Additional context
https://github.com/SYSTRAN/faster-whisper/issues/390 https://github.com/SYSTRAN/faster-whisper/pull/856/
This is because av.audio.resampler.AudioResampler uses av.filter.graph.Graph, and the graph holds circular references. Objects that are part of a reference cycle cannot be freed by reference counting alone; they have to wait for the cyclic garbage collector. As a result, the AudioResampler gets deallocated when it goes out of scope, but the Graph does not.
It is not a bug, because it does not affect the program's correctness; the author has relied on CPython's cyclic garbage collector to clean up the resources. Eliminating the reference loop would be a performance enhancement.
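To illustrate the mechanism described above, here is a minimal sketch in plain Python. It does not use PyAV at all: `Node` and `cycle_survives_until_collect` are hypothetical names invented for this example, and `Node` only stands in for whatever objects inside the graph end up referencing each other.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object that takes part in a reference cycle."""

    def __init__(self):
        self.other = None


def cycle_survives_until_collect():
    """Return (alive_before_collect, alive_after_collect) for a two-object cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector out of the way while we observe
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a      # a -> b -> a: neither refcount can reach zero
        probe = weakref.ref(a)       # lets us observe `a` without keeping it alive
        del a, b                     # drop our own references, as `del resampler` does

        alive_before = probe() is not None  # reference counting alone did not free it
        gc.collect()                        # the cyclic collector finds the cycle
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


print(cycle_survives_until_collect())
```

On CPython the pair shows the object still alive after `del` and gone after `gc.collect()`, which matches the behaviour reported here: deleting the resampler is not enough while something it owns is still part of a cycle.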
@MahmoudAshraf97, please test again using https://github.com/PyAV-Org/PyAV/pull/1439/commits/851ff21b4dd607468bc544c6222980de17fdee01. It will not solve the issue completely, but it could reduce some of the memory footprint.
This is the result:
I guess the problem still exists. Since repeating the same experiment doesn't guarantee exactly the same result, I can't verify whether it is partially solved or not.
Using a large enough number of iterations will help make the result somewhat deterministic.
On my machine (arm64 M2 Pro, Darwin 14.4.1), with
delete=False, collect=False, n=200;
Without the patch (left), I can see the GC periodically cleaning up the circularly referenced objects, and there seem to be some objects that the GC cannot recover.
After applying the patch (right), memory increases steadily with no visible signs of GC activity. The graph could indicate a real leak, though I haven't investigated it further. Overall, the memory footprint seems to be significantly smaller.
@MahmoudAshraf97, can you try my patch again with delete=False, collect=False, n=200 and compare it with v12.1.0 or main?
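One way to separate "garbage that is only waiting for the cyclic GC" from "a real leak" is to count how many unreachable objects `gc.collect()` reports after a batch of calls. The helper below is only a sketch: `unreachable_after`, `leaves_cycles` and `leaves_nothing` are hypothetical names, and in this thread `fn` would be something like `lambda: minimal_example(audio_path)`. It only sees Python-level objects, so memory held natively by FFmpeg would not show up in the count.

```python
import gc


def unreachable_after(fn, n=10):
    """Call fn() n times with the automatic GC off, then return how many
    unreachable objects a full gc.collect() finds."""
    was_enabled = gc.isenabled()
    gc.collect()   # start from a clean slate
    gc.disable()   # so cyclic garbage accumulates instead of being collected mid-run
    try:
        for _ in range(n):
            fn()
        return gc.collect()  # gc.collect() returns the number of unreachable objects
    finally:
        if was_enabled:
            gc.enable()


def leaves_cycles():
    # Two lists that reference each other: freed only by the cyclic collector.
    a = []
    b = [a]
    a.append(b)


def leaves_nothing():
    # No cycle: freed by reference counting as soon as the function returns.
    data = [0] * 100
    return len(data)


print("cyclic:", unreachable_after(leaves_cycles, n=5))
print("acyclic:", unreachable_after(leaves_nothing, n=5))
```

If the count stays near zero while RSS keeps growing, the growth is probably not coming from Python reference cycles, which would point towards a leak elsewhere.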
@moonsikpark The Graph object probably has a memory leak, and it certainly has its own circular references.
I'm going to close this even if the issue is only partly resolved because I want future contributors to only focus on Graph and not worry about this fluff.