
Titanet-L Augmentation

CreativeSelf0 opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe.

I want to train TitaNet with augmentation. However, I do not know how to prepare the rir_noise_manifest.json file for training TitaNet-L with RIR noise augmentation.

Describe the solution you'd like

A code snippet showing how to add augmentation to TitaNet-L training.

CreativeSelf0, May 07 '24

I used the following snippet from the online augmentation tutorial to download the RIR data and build its manifest:

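import os
# data_dir and NEMO_ROOT are defined earlier in the tutorial notebook
rir_data_path = f'{data_dir}/dataset'
# Download the OpenSLR RIR data and generate its manifest
!python {NEMO_ROOT}/scripts/dataset_processing/get_openslr_rir_data.py --data_root {rir_data_path}
rir_manifest_path = os.path.join(rir_data_path, 'processed', 'rir.json')
# Print the first three manifest entries
!head -n 3 {rir_manifest_path}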
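If you have your own folder of noise or RIR recordings instead of the OpenSLR download, a manifest can be written by hand. The sketch below is a minimal illustration, assuming NeMo's JSON-lines manifest convention (one JSON object per line with `audio_filepath`, `duration`, `offset`, `text`); `build_noise_manifest` is a hypothetical helper, not a NeMo API, and it only handles uncompressed PCM `.wav` files because it relies on the stdlib `wave` module.

```python
import json
import os
import wave


def build_noise_manifest(audio_dir, manifest_path):
    """Hypothetical helper: write one JSON line per .wav file under audio_dir.

    Field names follow NeMo's JSON-lines manifest convention
    (audio_filepath, duration, offset, text); treat them as an assumption.
    """
    entries = []
    for root, dirs, files in os.walk(audio_dir):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            if not name.lower().endswith(".wav"):
                continue
            path = os.path.join(root, name)
            # The stdlib wave module only reads uncompressed PCM WAV files
            with wave.open(path, "rb") as wav:
                duration = wav.getnframes() / wav.getframerate()
            entries.append({
                "audio_filepath": os.path.abspath(path),
                "duration": duration,
                "offset": 0.0,
                "text": "",  # noise/RIR files carry no transcript
            })
    # Write the manifest only after the directory walk has finished
    with open(manifest_path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
    return entries
```

For example, `build_noise_manifest('/data/my_noises', 'rir_noise_manifest.json')` (paths hypothetical) would produce a file that can be passed as `manifest_path` in the augmentor config.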

Then, to enable the augmentation, I applied the following:

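audio_augmentations = dict(
    # Speed perturbation on 30% of samples, rate in [0.95, 1.05]
    speed=dict(
        sr=16000,
        prob=0.3,
        resample_type='kaiser_fast',
        min_speed_rate=0.95,
        max_speed_rate=1.05,
    ),
    # Additive noise from the RIR manifest on 50% of samples, at 0 to 15 dB SNR
    noise=dict(
        manifest_path=rir_manifest_path,
        prob=0.5,
        min_snr_db=0,
        max_snr_db=15,
    ),
)
# Attach the augmentations to the training dataset config
finetune_config.model.train_ds.augmentor = audio_augmentations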
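To make the `min_snr_db`/`max_snr_db` settings concrete: noise perturbation, as we understand it, draws a target SNR from that range and scales the noise so the mix reaches it. The pure-Python sketch below illustrates that standard additive-noise-at-SNR technique; it is not NeMo's implementation, and the function names are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def add_noise_at_snr(signal, noise, snr_db):
    """Mix noise into signal so that the added noise sits snr_db below the signal.

    Assumes len(noise) >= len(signal) and that the noise is not all zeros;
    the noise is truncated to the signal length.
    """
    noise = noise[: len(signal)]
    # SNR_dB = 20 * log10(rms_signal / rms_noise)  =>  rms_noise = rms_signal / 10^(SNR_dB / 20)
    target_noise_rms = rms(signal) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [s + gain * n for s, n in zip(signal, noise)]


def random_noise_perturbation(signal, noise, min_snr_db=0.0, max_snr_db=15.0,
                              prob=0.5, rng=random):
    """With probability `prob`, add noise at an SNR drawn uniformly from the range."""
    if rng.random() >= prob:
        return signal  # leave this sample unperturbed
    snr_db = rng.uniform(min_snr_db, max_snr_db)
    return add_noise_at_snr(signal, noise, snr_db)
```

A lower `min_snr_db` means louder noise, so the `0` to `15` range above covers everything from equal-level noise to fairly quiet background.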

Is this correct? Thanks, @okuchaiev.

CreativeSelf0, May 07 '24

Yes, the code looks fine to me. However, for room impulse responses you should use impulse perturbation, not noise perturbation. An example can be found here: https://github.com/NVIDIA/NeMo/blob/6442bb67275759f5ece6bd5e366966216e050cfe/examples/speaker_tasks/recognition/conf/titanet-small.yaml#L14

nithinraok, May 08 '24
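Following that suggestion, the augmentor dict could route the RIR manifest to an `impulse` entry instead of `noise`. The sketch below assumes the `impulse` key takes `manifest_path` and `prob`, mirroring what we believe the linked titanet-small.yaml uses; please check that file for the exact keys. `make_augmentor` and the separate noise manifest are hypothetical.

```python
def make_augmentor(rir_manifest_path, noise_manifest_path=None, sample_rate=16000):
    """Hypothetical helper: use RIRs for impulse perturbation and, optionally,
    a separate manifest of background noises for noise perturbation."""
    augmentor = {
        "speed": {
            "sr": sample_rate,
            "prob": 0.3,
            "resample_type": "kaiser_fast",
            "min_speed_rate": 0.95,
            "max_speed_rate": 1.05,
        },
        # Convolve speech with a room impulse response (assumed key names)
        "impulse": {
            "manifest_path": rir_manifest_path,
            "prob": 0.5,
        },
    }
    if noise_manifest_path is not None:
        # Additive background noise, kept separate from the impulse responses
        augmentor["noise"] = {
            "manifest_path": noise_manifest_path,
            "prob": 0.5,
            "min_snr_db": 0,
            "max_snr_db": 15,
        }
    return augmentor
```

It would be used the same way as the dict above, e.g. `finetune_config.model.train_ds.augmentor = make_augmentor(rir_manifest_path)`.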

@nithinraok, that's what I thought. However, the TitaNet-Large config uses noise perturbation instead of impulse perturbation, while the paper says impulse perturbation was used. Does that mean the training mistakenly used the RIR corpus for noise perturbation instead of impulse perturbation? https://github.com/NVIDIA/NeMo/blob/6442bb67275759f5ece6bd5e366966216e050cfe/examples/speaker_tasks/recognition/conf/titanet-large.yaml#L14-L26

The paper statement:

[screenshot of the paper statement, not preserved]

(Just realized you are the first author. x.x) Thank you, @nithinraok.

CreativeSelf0, May 08 '24

I don't remember the details exactly, but as far as I remember, the RIR corpus also contains noise samples in addition to impulse responses. I did not add an impulse section to this config file, but one was added to the titanet-small config.

nithinraok, May 08 '24
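Since the corpus mixes noises and impulse responses, one option is to split a combined manifest into an impulse manifest and a noise manifest. The sketch below relies on a path heuristic that is purely an assumption about the OpenSLR RIR corpus layout (a `pointsource_noises` folder, and noise files with `noise` in the file name); inspect your own `rir.json` before trusting it. `split_rir_manifest` and `write_manifest` are hypothetical helpers.

```python
import json


def split_rir_manifest(lines):
    """Hypothetical helper: split JSON manifest lines into (impulse, noise) lists.

    Heuristic (an assumption about the corpus layout): entries whose path
    contains 'pointsource_noises', or whose file name contains 'noise',
    are treated as noise; everything else as an impulse response.
    """
    impulse, noise = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        path = entry["audio_filepath"]
        name = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
        if "pointsource_noises" in path or "noise" in name:
            noise.append(entry)
        else:
            impulse.append(entry)
    return impulse, noise


def write_manifest(entries, path):
    """Write entries back out in JSON-lines form."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
```

The two resulting files could then be passed as the `impulse` and `noise` manifest paths respectively.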

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot], Jun 08 '24

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot], Jun 15 '24