pliers icon indicating copy to clipboard operation
pliers copied to clipboard

BF: AudioStim.close to not breed ffmpeg processes by moviepy

Open yarikoptic opened this issue 8 years ago • 14 comments

As initially enjoyed by @caravanuden while observing "Too many open files" crash message while iterating over the list of audio files with a few hundreds of them in a simple code like:

from __future__ import print_function
import sys
import os
import time
from glob import glob
from pliers.stimuli import VideoStim, AudioStim
from pliers.extractors import MelspectrogramExtractor


spect = MelspectrogramExtractor()
PID = os.getpid()
print("PID: %d" % PID)
for i, f in enumerate(sys.argv[1:]): # [:1]*1000):
    print("%4d %30s \t#file: %d" % (i, f, len(glob('/proc/%d/fd/*' % PID))), end='\r')
    sys.stdout.flush()
    audio_stim = AudioStim(filename=f, sampling_rate=44100)

    # get spectrogram
    spectral_stim = spect.transform(audio_stim)
    audio_stim.close()

We had to add the .close() at the end to prevent breeding of ffmpeg zombies. I am not quite certain why gc didn't pick those up -- something might be holding on to those instances, thus not calling any of the __del__s. (in general in python3 it is not reliable to rely on those -- attributes might disappear before actual content of them thus not being able to close them at all). Overall I was not sure about the life cycle of any object there so this is a minimalistic workaround which worked for us

yarikoptic avatar Feb 05 '18 21:02 yarikoptic

Thanks! One concern though is that without a matching open() method, this seems incomplete. If you close() a clip this way, without destroying the whole stim, then a user can potentially try to pass it through a Transformer again later, and it will break unexpectedly. Could you add an open() method that just wraps _load_clip (maybe with a check to see if it's already loaded, and then the initializer can point to open)? Thanks!

tyarkoni avatar Feb 05 '18 21:02 tyarkoni

Oh, and we should probably also do the same thing for VideoStim objects. Happy to add it myself if you don't have time.

tyarkoni avatar Feb 05 '18 21:02 tyarkoni

I'll handle it! I'll simply add open() and close() methods for VideoStims and AudioStims.

caravanuden avatar Feb 05 '18 22:02 caravanuden

Coverage Status

Coverage decreased (-12.9%) to 79.657% when pulling 0ba932d37d5af2cb5eb2ae99b774eda95b2a6e76 on yarikoptic:bf-ffmpeg-audiofiles into 8dae7ecdd94c7af41d2b3a60e141ff3a5a7cc07f on tyarkoni:master.

coveralls avatar Feb 06 '18 01:02 coveralls

sorry, just got to the normal keyboard again. Sure, you are welcome to pick up and fix it up the best way you see it fit! Few comments though:

  • _load_clip is invoked by the constructor, so thus avoiding open (unless provided outside instance, and in that case probably shouldn't be closed inside)
  • you could benefit from exposing Stim's as a context manager, defining something like
    def __enter__(self):
        if not self.clip:
            self.open()
        return self
    
    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

within the base Stim class (didn't look in detail if all of them have .clip and .open etc) so ppl could use constructs such as

with AudioStim('filename') as stim:
   stim.dousefulstuff()

and do not need to mess around with open/close dance

  • looked "deeper". So -- the whole exhaustion is from the use of @_memoize with caches every transform result by default, thus slowly growing that cache of TransformResults which carry .stim so original moviepy instance is held along with its zombified by then ffmpeg etc. Thus another "more proper" way to avoid it for @caravanuden is
from pliers import config
config.set_option('cache_transformers', False)

but I wonder why caching is enabled (not disabled) by default. Also it might be worth make it so it would cache up to a specified number of transformers, with the oldest used last time kicked out first -- would then scale better for larger use-cases i guess

yarikoptic avatar Feb 06 '18 02:02 yarikoptic

Thanks for the comments, this is helpful!

I agree that implementing __enter__ and __exit__ makes sense.

Re: caching, the reason it's on by default is that many of the Extractor classes cost money to use, so it made sense to me to memoize the results by default--otherwise we're just costing our users (and ourselves) money unnecessarily. The better approach would be to set the defaults on a class-specific basis, or at least, to have a separate caching settings for paid classes. If nothing else, we should probably add a class-level flag indicating whether a service is free or not.

On the topic of caching, we should probably move to disk-based caching--or at least, make that an option. I initially started out using joblib, but there were some issues I couldn't easily resolve that prompted the move to a simpler solution (my vague recollection is it had to do with pickling).

tyarkoni avatar Feb 06 '18 03:02 tyarkoni

Any update on this, @yarikoptic @caravanuden? If you don't have time to get to it, that's fine--@qmac or I can probably get to it soon. Just let us know, as it does seem like a fairly high priority to fix. Thanks!

tyarkoni avatar Apr 02 '18 16:04 tyarkoni

This looks pretty stale to me. Okay to close?

adelavega avatar Nov 12 '20 00:11 adelavega

It was stale but there were no indication here that it was fixed one way or another (it might have been, I do not follow the development)

yarikoptic avatar Nov 12 '20 14:11 yarikoptic

don't ever change, @yarikoptic

tyarkoni avatar Nov 12 '20 20:11 tyarkoni

"NEVER!" said Yarik who closed a few dozen stale bugs in DataLad in the past week ;) (in my defense -- to the best of my knowledge they all were either addressed, superseded, or no longer pertinent ;))

yarikoptic avatar Nov 13 '20 01:11 yarikoptic

hey, in my defense there's a difference between closing a PR, and closing the issue ;)

adelavega avatar Nov 13 '20 03:11 adelavega

Hey - initial description was so rich that I totally missed that it was a PR! Should we convert it to an issue? ;-) or may be there is no issue? Anyways, I guess I was just missing you guys add jumped onto an opportunity for a chat ;-) feel free to re close

yarikoptic avatar Nov 13 '20 05:11 yarikoptic

Codecov Report

Base: 75.64% // Head: 75.59% // Decreases project coverage by -0.05% :warning:

Coverage data is based on head (711c99b) compared to base (6e5e23b). Patch coverage: 25.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #252      +/-   ##
==========================================
- Coverage   75.64%   75.59%   -0.06%     
==========================================
  Files          65       65              
  Lines        3876     3880       +4     
==========================================
+ Hits         2932     2933       +1     
- Misses        944      947       +3     
Impacted Files Coverage Δ
pliers/stimuli/audio.py 90.69% <25.00%> (-6.74%) :arrow_down:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov-commenter avatar Sep 21 '22 17:09 codecov-commenter