Which configuration file and recipe should be used to train a CTC/Attention architecture with MFCC features?
Describe your question
I want to train a CTC/Attention-based acoustic model on MFCC features for an ASR task. Which config file and recipe should I use for that?
@sw005320 Could you please answer this?
A gentle reminder for the same.
The bash code only supports fbank and raw inputs. Given that, you can implement a frontend that takes raw inputs and computes MFCCs using torchaudio (https://pytorch.org/audio/main/generated/torchaudio.transforms.MFCC.html), and then add it as a frontend option to the ASR task: https://github.com/espnet/espnet/blob/b1335e7b1206363a170fe61e0735faf9727b392e/espnet2/tasks/asr.py#L90