xuankai@cmu
Hi @YoshikiMas, thanks for noticing the problem. Yeah, I agree with the suggested change in stage 5. For the `lm_train_text`, we can simply use `data/${train_set}/text`, the same as what is...
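For reference, a minimal sketch of how that could be passed on the command line (the `--lm_train_text` option follows the usual `asr.sh` convention; all other recipe options are omitted):

```bash
# Hedged sketch: reuse the ASR training transcripts as the LM training text.
# ${train_set} is whatever the recipe's run.sh defines.
./asr.sh \
    --lm_train_text "data/${train_set}/text" \
    "$@"
```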
Hi @zqwang7 , for (1), you are not supposed to pass the normalize flag, because it is the collect-stats stage. I assume error (2) happens because no stats were generated...
@zqwang7 [line 66](https://github.com/espnet/espnet/blob/2aa734d65013a0b33a6f8cb59b22159a87360eb8/egs2/chime4/asr1/conf/tuning/train_asr_transformer_wavlm_lr1e-3_specaug_accum1_preenc128_warmup20k.yaml#L66) uses `extract_feats_in_collect_stats`.
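For context, a hedged sketch of that option as it appears in a training config (the value below is illustrative; check line 66 of the linked file for the actual setting):

```yaml
# When false, frontend features are not extracted during the
# collect-stats stage, which changes which stats files get written there.
extract_feats_in_collect_stats: false
```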
Hi @minamo817 , the number of outputs is not checked during training, so you can successfully complete training (stage 6). But in stage 7, during inference, it is...
> We could do all of this within a single module, but I figured breaking it up would be easier to read/navigate. Do you have any recommendations? I got it....
@BriansIDP You need to follow stages 14 & 16 in `asr.sh`.
@BriansIDP Another suggestion: it may be better to update the Hugging Face repo name. If you look at the other pretrained checkpoints, you will notice the names follow the pattern: contributor_dataset_expname.
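A hedged sketch of what those stages could look like together (the `--skip_upload_hf` and `--hf_repo` option names are my assumption from the template recipe; check the header of `asr.sh` for the exact spelling):

```bash
# Pack the trained model (stage 14) and upload it to Hugging Face (stage 16).
./asr.sh \
    --stage 14 --stop_stage 16 \
    --skip_upload_hf false \
    --hf_repo "espnet/contributor_dataset_expname"  # hypothetical name following the pattern above
```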
Hi @cyaaronk , thanks for the updates. I'm thinking that if it is only for ASR models, maybe you can put it in `egs2/TEMPLATE/asr1/pyscripts/utils`. Then you may modify `asr.sh`...
@Yuanyuan-888 The quick fix is to try an earlier version of Whisper, `20230308`, since Whisper changed its tokenizer API.
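A minimal sketch, assuming Whisper was installed from PyPI as `openai-whisper`:

```bash
# Pin the earlier release whose tokenizer API still matches.
pip install openai-whisper==20230308
```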