Leo Huang comments

Results: 11 comments of Leo Huang

Diarization

@ggerganov please help. I did exactly the same thing as @Dmitriuso: `yt-dlp -xv --audio-format wav -o skillsfuture.wav https://www.youtube.com/watch?v=girQacfWjMw&list=PLH2CR4s1lqyjFm4vQPKT0-hE8sh2T10I1`, then `ffmpeg -i skillsfuture.wav -acodec pcm_s16le -ar 16000 sf.wav`, then `./main -m ../whisper-models/ggml-base.en.bin...`
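The download and resampling steps quoted above can be sketched as argument lists in Python. This is only an illustration: the helper names `build_commands` and `to_shell` are hypothetical, and the `./main` invocation is left out because its flags are truncated in the comment. Passing arguments as a list (or quoting the URL) matters because an unquoted `&` in a shell ends the command and runs it in the background, so the `list=...` part of the URL would never reach `yt-dlp`.

```python
import shlex
import subprocess

# URL taken from the comment above; note the '&' that a shell would interpret.
URL = "https://www.youtube.com/watch?v=girQacfWjMw&list=PLH2CR4s1lqyjFm4vQPKT0-hE8sh2T10I1"


def build_commands(url, raw_wav="skillsfuture.wav", out_wav="sf.wav"):
    """Return the yt-dlp and ffmpeg commands from the comment as argument lists."""
    download = ["yt-dlp", "-xv", "--audio-format", "wav", "-o", raw_wav, url]
    # 16-bit PCM at 16 kHz, exactly as in the quoted ffmpeg command.
    convert = ["ffmpeg", "-i", raw_wav, "-acodec", "pcm_s16le", "-ar", "16000", out_wav]
    return download, convert


def to_shell(cmd):
    """Render an argument list as a safely quoted shell command line."""
    return shlex.join(cmd)


def run_all(url):
    """Run both steps; requires yt-dlp and ffmpeg to be installed."""
    for cmd in build_commands(url):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in build_commands(URL):
        print(to_shell(cmd))
```

Running the script only prints the quoted command lines; call `run_all(URL)` to actually execute them.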

Diarization

Is it possible to integrate the ECAPA-TDNN model from [SpeechBrain](https://github.com/speechbrain/speechbrain) into this project, as the following project has done? https://huggingface.co/spaces/vumichien/Whisper_speaker_diarization I tested it with this video, https://www.youtube.com/watch?v=girQacfWjMw&list=PLH2CR4s1lqyjFm4vQPKT0-hE8sh2T10I1, and it works pretty well. But it is...
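To make the idea concrete, here is a minimal sketch of how per-segment speaker embeddings could be turned into speaker labels. It does not call SpeechBrain and is not necessarily what the linked project does: the 3-dimensional vectors are made-up stand-ins for real ECAPA-TDNN embeddings, and the greedy cosine-similarity clustering, the `assign_speakers` name, and the `0.8` threshold are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings, threshold=0.8):
    """Greedy clustering: join the most similar known speaker, else start a new one.

    `threshold` is a hypothetical value; real embeddings would need tuning.
    """
    centroids = []  # running sum of the embeddings assigned to each speaker
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
            labels.append(best)
    return labels


# Made-up toy embeddings, one per transcribed segment.
toy_embeddings = [
    [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.9, 0.2, 0.0],
    [0.1, 0.9, 0.0],
]
labels = assign_speakers(toy_embeddings)
print(labels)  # [0, 1, 0, 1]
```

Each label could then be attached to the matching transcript segment to produce "speaker 0 / speaker 1" style output.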

Crash on iPhone when Using CoreML

Thanks @bjnortier for the quick reply. I previously used the code from commit 09e90680072d8ecdf02eaf21c393218385d2c616, and it works perfectly on the same iPhone. Does this mean there is much more memory usage since...

Crash on iPhone when Using CoreML

"When you load a CoreML model it is optimised on the device" - is the optimized model saved to local storage, or is it kept in memory? If the answer is the latter...

How can I continue to download from the disconnection point?

> Hello, the download failed due to the disconnection of the network connection in the process of downloading audio data. How can I continue to download from the disconnection point?...
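The thread does not say which tool or server is involved, so the following is only a general sketch of one way to resume an interrupted HTTP download: ask the server for the bytes that are still missing with a `Range` header. It assumes the server supports byte-range requests; `build_resume_request` and `resume_download` are hypothetical helper names.

```python
import os
import urllib.request


def build_resume_request(url, partial_path):
    """Build a request that asks only for the bytes not yet on disk."""
    already = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if already > 0:
        # "bytes=N-" means: send everything from byte offset N to the end.
        request.add_header("Range", f"bytes={already}-")
    return request, already


def resume_download(url, partial_path, chunk_size=1 << 16):
    """Append the remaining bytes to partial_path (assumes range support)."""
    request, already = build_resume_request(url, partial_path)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the range was honoured; a plain 200
        # means the server sent the whole file, so start over from scratch.
        mode = "ab" if already > 0 and response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Command-line downloaders offer the same behaviour, for example `curl -C -` or `wget -c`.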

Knowledge distillation support for Nemo ASR models

I'm also interested in this topic. Any update?

Inconsistent result when use different embedding config

- ... it might change in the future
  Does this mean `pyannote/embedding` will be optimized, or that there will be a better model than `speechbrain/spkrec-ecapa-voxceleb`?
- ... you need to optimize this...

Duplicate words generated

I just tried this commit, https://github.com/ggerganov/whisper.cpp/commit/f19e23fbd108ec3ac458c7a19b31c930719e7a94, which was mentioned in https://github.com/ggerganov/whisper.cpp/issues/612. I got the same result: [00:00:44.000 --> 00:00:50.000] This is not certain because Kepler 442B's atmosphere and surface...
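When comparing commits for this kind of problem, a small script can flag repetition automatically. The sketch below parses lines in the `[start --> end] text` format shown above and reports segments whose text is identical to the previous segment. It assumes the duplication shows up as consecutive identical segments, which may not match every case in the issue; `find_repeats` is a hypothetical helper and the sample lines are placeholders, not real transcript output.

```python
import re

# Matches lines such as "[00:00:44.000 --> 00:00:50.000] some text".
LINE_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)"
)


def find_repeats(lines):
    """Return (start, end, text) for segments that repeat the previous text."""
    repeats = []
    prev_text = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a timestamped segment
        start, end, text = match.groups()
        text = text.strip()
        if text and text == prev_text:
            repeats.append((start, end, text))
        prev_text = text
    return repeats


# Placeholder lines for illustration only.
sample = [
    "[00:00:00.000 --> 00:00:02.000] placeholder text A",
    "[00:00:02.000 --> 00:00:04.000] placeholder text A",
    "[00:00:04.000 --> 00:00:06.000] placeholder text B",
]
for start, end, text in find_repeats(sample):
    print(f"repeated segment at {start}: {text}")
```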

Duplicate words generated

I just tested with [openai whisper](https://github.com/openai/whisper). It does not have this issue. `$> whisper --model base wrongResultWithWhisper.wav`

Duplicate words generated

@ggerganov Are these settings correct?

```
params.n_max_text_ctx = 64;
params.temperature_inc = 0.1f;
params.beam_search.beam_size = 5;
params.entropy_thold = 2.8f;
```

`params` is a `whisper_full_params`. The other settings are as follows: `params.print_realtime =...`