BF: AudioStim.close to not breed ffmpeg processes by moviepy
As initially enjoyed by @caravanuden while observing "Too many open files" crash message while iterating over the list of audio files with a few hundreds of them in a simple code like:
from __future__ import print_function
import sys
import os
import time
from glob import glob
from pliers.stimuli import VideoStim, AudioStim
from pliers.extractors import MelspectrogramExtractor
spect = MelspectrogramExtractor()
PID = os.getpid()
print("PID: %d" % PID)
for i, f in enumerate(sys.argv[1:]): # [:1]*1000):
print("%4d %30s \t#file: %d" % (i, f, len(glob('/proc/%d/fd/*' % PID))), end='\r')
sys.stdout.flush()
audio_stim = AudioStim(filename=f, sampling_rate=44100)
# get spectrogram
spectral_stim = spect.transform(audio_stim)
audio_stim.close()
We had to add the .close() at the end to prevent breeding of ffmpeg zombies. I am not quite certain why gc didn't pick those up -- something might be holding on to those instances, thus not calling any of the __del__s. (in general in python3 it is not reliable to rely on those -- attributes might disappear before actual content of them thus not being able to close them at all). Overall I was not sure about the life cycle of any object there so this is a minimalistic workaround which worked for us
Thanks! One concern though is that without a matching open() method, this seems incomplete. If you close() a clip this way, without destroying the whole stim, then a user can potentially try to pass it through a Transformer again later, and it will break unexpectedly. Could you add an open() method that just wraps _load_clip (maybe with a check to see if it's already loaded, and then the initializer can point to open)? Thanks!
Oh, and we should probably also do the same thing for VideoStim objects. Happy to add it myself if you don't have time.
I'll handle it! I'll simply add open() and close() methods for VideoStims and AudioStims.
Coverage decreased (-12.9%) to 79.657% when pulling 0ba932d37d5af2cb5eb2ae99b774eda95b2a6e76 on yarikoptic:bf-ffmpeg-audiofiles into 8dae7ecdd94c7af41d2b3a60e141ff3a5a7cc07f on tyarkoni:master.
sorry, just got to the normal keyboard again. Sure, you are welcome to pick up and fix it up the best way you see it fit! Few comments though:
-
_load_clipis invoked by the constructor, so thus avoiding open (unless provided outside instance, and in that case probably shouldn't be closed inside) - you could benefit from exposing Stim's as a context manager, defining something like
def __enter__(self):
if not self.clip:
self.open()
return self
def __exit__(self, exc_type, exc_value, traceback):
self.close()
within the base Stim class (didn't look in detail if all of them have .clip and .open etc) so ppl could use constructs such as
with AudioStim('filename') as stim:
stim.dousefulstuff()
and do not need to mess around with open/close dance
- looked "deeper". So -- the whole exhaustion is from the use of
@_memoizewith caches every transform result by default, thus slowly growing that cache of TransformResults which carry.stimso original moviepy instance is held along with its zombified by then ffmpeg etc. Thus another "more proper" way to avoid it for @caravanuden is
from pliers import config
config.set_option('cache_transformers', False)
but I wonder why caching is enabled (not disabled) by default. Also it might be worth make it so it would cache up to a specified number of transformers, with the oldest used last time kicked out first -- would then scale better for larger use-cases i guess
Thanks for the comments, this is helpful!
I agree that implementing __enter__ and __exit__ makes sense.
Re: caching, the reason it's on by default is that many of the Extractor classes cost money to use, so it made sense to me to memoize the results by default--otherwise we're just costing our users (and ourselves) money unnecessarily. The better approach would be to set the defaults on a class-specific basis, or at least, to have a separate caching settings for paid classes. If nothing else, we should probably add a class-level flag indicating whether a service is free or not.
On the topic of caching, we should probably move to disk-based caching--or at least, make that an option. I initially started out using joblib, but there were some issues I couldn't easily resolve that prompted the move to a simpler solution (my vague recollection is it had to do with pickling).
Any update on this, @yarikoptic @caravanuden? If you don't have time to get to it, that's fine--@qmac or I can probably get to it soon. Just let us know, as it does seem like a fairly high priority to fix. Thanks!
This looks pretty stale to me. Okay to close?
It was stale but there were no indication here that it was fixed one way or another (it might have been, I do not follow the development)
don't ever change, @yarikoptic
"NEVER!" said Yarik who closed a few dozen stale bugs in DataLad in the past week ;) (in my defense -- to the best of my knowledge they all were either addressed, superseded, or no longer pertinent ;))
hey, in my defense there's a difference between closing a PR, and closing the issue ;)
Hey - initial description was so rich that I totally missed that it was a PR! Should we convert it to an issue? ;-) or may be there is no issue? Anyways, I guess I was just missing you guys add jumped onto an opportunity for a chat ;-) feel free to re close
Codecov Report
Base: 75.64% // Head: 75.59% // Decreases project coverage by -0.05% :warning:
Coverage data is based on head (
711c99b) compared to base (6e5e23b). Patch coverage: 25.00% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## master #252 +/- ##
==========================================
- Coverage 75.64% 75.59% -0.06%
==========================================
Files 65 65
Lines 3876 3880 +4
==========================================
+ Hits 2932 2933 +1
- Misses 944 947 +3
| Impacted Files | Coverage Δ | |
|---|---|---|
| pliers/stimuli/audio.py | 90.69% <25.00%> (-6.74%) |
:arrow_down: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.