
Which configuration file and recipe should be used to train a CTC/Attention architecture with MFCC features?

Open mukherjeesougata-eros opened this issue 2 years ago • 3 comments

Describe your question

I want to train a CTC/Attention-based acoustic model using MFCC features for an ASR task. Which config file and recipe should I use for that?

mukherjeesougata-eros avatar Oct 29 '23 10:10 mukherjeesougata-eros

@sw005320 Could you kindly answer this?

mukherjeesougata-eros avatar Oct 29 '23 15:10 mukherjeesougata-eros

A gentle reminder for the same.

mukherjeesougata-eros avatar Oct 30 '23 08:10 mukherjeesougata-eros

The bash code only supports fbank and raw inputs. In that case, you can implement a frontend for raw inputs that computes MFCCs using torchaudio (https://pytorch.org/audio/main/generated/torchaudio.transforms.MFCC.html), and add it as an option to the ASR task: https://github.com/espnet/espnet/blob/b1335e7b1206363a170fe61e0735faf9727b392e/espnet2/tasks/asr.py#L90

Fhrozen avatar Oct 30 '23 09:10 Fhrozen