
[♻️] Add tqdm for Progress Bars

Open · npham-dev opened this issue 2 years ago · 5 comments

Addresses #168

I'm not entirely sure which loops should be modified, so:

  1. I introduced the tqdm dependency
  2. Replaced custom timing logic with tqdm (in places I could find it)

The notebook runs fine, but the unit tests seem to have failed on the configuration step.
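For reviewers, here's a minimal sketch of the kind of swap in item 2; the file list and `process` function are placeholders rather than actual PyHa code:

```python
from time import sleep
from tqdm import tqdm

file_list = ["clip_1.wav", "clip_2.wav", "clip_3.wav"]  # stand-in for os.listdir(audio_dir)

def process(audio_file):
    sleep(0.1)  # stand-in for loading and labeling a clip

# tqdm replaces the hand-rolled elapsed-time prints with a live counter, rate, and ETA
for audio_file in tqdm(file_list, desc="Processing audio files"):
    process(audio_file)
```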

Example of the progress bar in the notebook (screenshot):

npham-dev · Jan 09 '24 15:01

Nice! I think the most important place for these to go would be on the os.listdir(path) loops inside generate_automated_labels_tweetynet, generate_automated_labels_microfaune, generate_automated_labels_birdnet, etc., since those are fairly large bottlenecks.
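Something like this on each of those loops, for example (the directory path is a placeholder and the loop body is elided):

```python
import os
from tqdm import tqdm

audio_dir = "."  # placeholder; would be the directory passed into generate_automated_labels_*

# the per-file loop in each generate_automated_labels_* function would become:
for audio_file in tqdm(os.listdir(audio_dir)):
    if os.path.isdir(os.path.join(audio_dir, audio_file)):
        continue
    ...  # existing per-clip processing stays the same
```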

JacobGlennAyers · Jan 09 '24 17:01

Here is an example from some unpushed changes I was playing around with on the template-matching branch. Note that the tqdm is on the loop that iterates through the audio files in a directory:

def generate_automated_labels_template_matching(
        audio_dir,
        isolation_parameters,
        manual_id="template",
        normalized_sample_rate=44100):
    """
    Args:
        audio_dir (string)
            - Path to directory with audio files.

        isolation_parameters (dict)
            - Python dictionary that controls the various label creation
              techniques.

        manual_id (string)
            - controls the name of the class written to the pandas dataframe.
            - default: "template"

        normalized_sample_rate (int)
            - Sampling rate that the audio files should all be normalized to.

    Returns:
        Dataframe of automated labels for the audio clips in audio_dir.
    """

    logger = logging.getLogger("Template Matching Autogenerated Labels")
    assert isinstance(audio_dir, str)
    assert isinstance(isolation_parameters, dict)
    assert isinstance(manual_id, str)
    assert isinstance(normalized_sample_rate, int)
    assert normalized_sample_rate > 0
    bandpass = False
    b = None
    a = None
    if "cutoff_freq_low" in isolation_parameters.keys() and "cutoff_freq_high" in isolation_parameters.keys():
        bandpass = True
        assert isinstance(isolation_parameters["cutoff_freq_low"], int)
        assert isinstance(isolation_parameters["cutoff_freq_high"], int)
        assert isolation_parameters["cutoff_freq_low"] > 0 and isolation_parameters["cutoff_freq_high"] > 0
        assert isolation_parameters["cutoff_freq_high"] > isolation_parameters["cutoff_freq_low"]
        assert isolation_parameters["cutoff_freq_high"] <= int(0.5*normalized_sample_rate)

    # initialize annotations dataframe
    annotations = pd.DataFrame()

    # processing the template clip
    try:
        # loading the template signal
        TEMPLATE, SAMPLE_RATE = librosa.load(isolation_parameters["template_path"], sr=normalized_sample_rate, mono=True)
        if bandpass:
            b, a = butter_bandpass(isolation_parameters["cutoff_freq_low"], isolation_parameters["cutoff_freq_high"], SAMPLE_RATE)
            TEMPLATE = filter(TEMPLATE, b, a)

        TEMPLATE_spec = generate_specgram(TEMPLATE, SAMPLE_RATE)
        TEMPLATE_mean = np.mean(TEMPLATE_spec)
        TEMPLATE_spec -= TEMPLATE_mean
        TEMPLATE_std_dev = np.std(TEMPLATE_spec)
        n = TEMPLATE_spec.shape[0] * TEMPLATE_spec.shape[1]

    except KeyboardInterrupt:
        exit("Keyboard Interrupt")
    except BaseException:
        checkVerbose("Failed to load and process template " + isolation_parameters["template_path"], isolation_parameters)
        exit("Can't do template matching without a template")

    # looping through the clips to process
    for audio_file in tqdm(os.listdir(audio_dir)):
        # skip directories
        if os.path.isdir(audio_dir + audio_file):
            continue
        # loading in the audio clip
        try:
            SIGNAL, SAMPLE_RATE = librosa.load(os.path.join(audio_dir, audio_file), sr=normalized_sample_rate, mono=True)
            if bandpass:
                SIGNAL = filter(SIGNAL, b, a)
        except KeyboardInterrupt:
            exit("Keyboard Interrupt")
        except BaseException:
            checkVerbose("Failed to load " + audio_file, isolation_parameters)
            continue

        # generating local score array from clip
        try:
            local_score_arr = template_matching_local_score_arr(SIGNAL, SAMPLE_RATE, TEMPLATE_spec, n, TEMPLATE_std_dev)
        except KeyboardInterrupt:
            exit("Keyboard Interrupt")
        except BaseException:
            checkVerbose("Failed to collect local score array of " + audio_file, isolation_parameters)
            continue

        # passing through isolation technique
        try:
            new_entry = isolate(
                local_score_arr,
                SIGNAL,
                SAMPLE_RATE,
                audio_dir,
                audio_file,
                isolation_parameters,
                manual_id=manual_id,
            )
            if annotations.empty:
                annotations = new_entry
            else:
                annotations = pd.concat([annotations, new_entry])
        except KeyboardInterrupt:
            exit("Keyboard Interrupt")
        except BaseException as e:
            checkVerbose(e, isolation_parameters)
            checkVerbose("Error in isolating bird calls from " + audio_file, isolation_parameters)
            continue

    annotations.reset_index(inplace=True, drop=True)
    return annotations
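For reference, a call would look roughly like the following; the paths are placeholders, and isolation_parameters would also need whatever keys the chosen isolate() technique expects:

```python
isolation_parameters = {
    "template_path": "./template_clip.wav",   # placeholder path to the template recording
    "cutoff_freq_low": 1000,                  # optional bandpass; both keys must be present
    "cutoff_freq_high": 8000,
    # ... plus the keys used by isolate() for the chosen technique
}

automated_df = generate_automated_labels_template_matching(
    "./TEST/",                                # placeholder audio directory
    isolation_parameters,
    manual_id="template",
    normalized_sample_rate=44100,
)
print(automated_df)
```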

JacobGlennAyers · Jan 10 '24 02:01

I imagine a message could look something like: "Performing Template Matching on " + dir_name
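That is, passed through tqdm's desc argument, something like this (with audio_dir standing in for dir_name):

```python
import os
from tqdm import tqdm

audio_dir = "."  # placeholder directory
for audio_file in tqdm(os.listdir(audio_dir),
                       desc="Performing Template Matching on " + audio_dir):
    ...  # per-clip processing
```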

JacobGlennAyers · Jan 10 '24 02:01

I'll also add that in the near future we will be adding two new local score array generation techniques: template matching and a foreground-background segmentation technique. Once those are up, it will be easier to add tqdm to their respective loops all at once.

JacobGlennAyers · Jan 10 '24 06:01

@JacobGlennAyers What's the status of this branch? It seems like you wanted to add something before more development was done on it.

Sean1572 · May 15 '24 19:05