BancoLin
```
const char * text = whisper_full_get_token_text(ctx, i, j);
...
printf("%s%s%s%s", speaker.c_str(), k_colors[col].c_str(), text, "\033[0m");
```

The issue stems from the possibility that the token `text` may not adhere to...
In most cases the model works well on Chinese, because English and Chinese share many phonetic properties.
Use ffmpeg to downsample audio files.
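A minimal sketch of how the downsampling could be scripted, assuming `ffmpeg` is on the `PATH`. The helper names `build_downsample_cmd` and `downsample` are hypothetical, and the 16 kHz default is an assumption; `-ar` (output sample rate) and `-y` (overwrite output) are standard ffmpeg options.

```python
import subprocess


def build_downsample_cmd(src, dst, sample_rate=16000):
    """Build the ffmpeg argument list that resamples src to sample_rate Hz."""
    # -y overwrites dst if it exists; -ar sets the output audio sample rate.
    return ["ffmpeg", "-y", "-i", src, "-ar", str(sample_rate), dst]


def downsample(src, dst, sample_rate=16000):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_downsample_cmd(src, dst, sample_rate), check=True)
```

Calling `downsample("input.wav", "output_16k.wav")` would write a resampled copy next to the original; looping over a directory is left to the caller.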
With the default settings, you need a GPU with at least 24 GB of memory.
> RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.

My solution is to downgrade PyTorch to 1.10...
Reporting my results:

- Used ffmpeg to downsample the source data from 48 kHz to 16 kHz.
- Used the pretrained model in the best_ckpt/ folder.
- Used the CPU for evaluation.

In...
@AotYan please see my comment in #246
> [@BancoLin](https://github.com/BancoLin) thank you for your advice. I have another question: in my training script, `sparsify_start`, `sparsify_stop`, `sparsify_interval`, and `sparsify_exponent` remain unchanged, while the epochs are `200` and...
@battlefor The rnnoise training script already covers the 'little model' training steps (model sparsification), but your training data must be large enough to trigger it. If I recall correctly, the little model...
> > the little model training steps start at iteration 2500 and stop at iteration 8000
>
> the `iteration`, do you mean in one epoch or the whole training...
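
To make the "large enough to trigger it" point concrete, here is a rough sketch that estimates whether a run would reach the sparsification window. It is not taken from the rnnoise code: the helper names are hypothetical, one optimizer step per batch is assumed, and it assumes iterations are counted cumulatively over the whole run. If the counter were per epoch instead, the `epochs` factor would be dropped. Only the numbers 2500 and 8000 come from the discussion above.

```python
import math

# Iteration numbers quoted in the discussion; everything else is illustrative.
SPARSIFY_START = 2500
SPARSIFY_STOP = 8000


def total_iterations(num_sequences, batch_size, epochs):
    """Optimizer steps over the whole run, assuming one step per batch."""
    return math.ceil(num_sequences / batch_size) * epochs


def sparsification_status(total_steps, start=SPARSIFY_START, stop=SPARSIFY_STOP):
    """Describe how far into the sparsification window a run would get."""
    if total_steps < start:
        return "never starts"
    if total_steps < stop:
        return "starts but does not finish"
    return "completes"
```

For example, `sparsification_status(total_iterations(1000, 128, 200))` evaluates a run with 1000 training sequences, batch size 128, and 200 epochs; with so little data the run ends before iteration 2500 under these assumptions.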