Desh Raj
In its current form, DOVER-Lap does not elegantly handle the case when the input hypotheses are mixed-type. By "mixed-type", we mean that some of them have overlapping segments and...
I'm running into this weird issue, and I'm not sure why adding manifests should create a lazy iterator? ```python In [5]: m = prepare_ali_meeting('/export/c01/corpora6/AliMeeting') Preparing Train: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 209/209 [02:56 0...
Currently, when creating a multi-channel `Recording` manually by specifying sources as different single-channel files (as is done in the AMI setup, for example), we do not strictly enforce the condition that...
The following works well:
```
lhotse split -s 16 data/librispeech/raw_cuts_dev-clean.jsonl.gz data/librispeech/tmp
```
But the following does not:
```
gunzip -c data/librispeech/raw_cuts_dev-clean.jsonl.gz | lhotse split -s 16 - data/librispeech/tmp
```
Here...
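For context, the `-` argument in the piped command follows the common CLI convention of "read from standard input". The sketch below shows one generic way a tool might honor that convention when reading JSONL. It is not Lhotse's implementation: `open_manifest` and `read_jsonl` are hypothetical helper names, and the sample records are made up for illustration.

```python
import gzip
import io
import json
import sys
from typing import IO, Iterator, Optional


def open_manifest(path: str, stdin: Optional[IO[str]] = None) -> IO[str]:
    """Hypothetical helper: return a text stream for `path`, treating "-" as stdin."""
    if path == "-":
        # Piped input is assumed to be already decompressed (e.g. by `gunzip -c`).
        return stdin if stdin is not None else sys.stdin
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def read_jsonl(path: str, stdin: Optional[IO[str]] = None) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line from a file path or from stdin."""
    stream = open_manifest(path, stdin)
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Simulate piped input with an in-memory stream instead of the real stdin.
fake_stdin = io.StringIO('{"id": "cut-1"}\n{"id": "cut-2"}\n')
cuts = list(read_jsonl("-", stdin=fake_stdin))
print([c["id"] for c in cuts])
```

Because a pipe cannot be rewound or inspected by file extension, a reader like this has to assume a format for stdin (here, plain-text JSONL).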
I trained a Conformer CTC model with Icefall and was trying to use it for decoding, but I am getting the following error: **[F] /export/c07/draj/mini_scale_2022/k2/k2/csrc/ragged.cu:116:k2::Array1& k2::RaggedShape::RowIds(int32_t) Check failed: axis <...
Is there a recommended recipe I can refer to for fine-tuning one of the pretrained models (say, the Gigaspeech model) on my own data?
In terms of which VAD to apply, you can use e.g. SileroVAD: https://github.com/snakers4/silero-vad/wiki/Examples-and-Dependencies#examples Actually a workflow/integration into Lhotse would be nice if somebody is willing to contribute that. _Originally posted...
Newer torchaudio versions with the ffmpeg backend can read SPH files. This doesn't seem to be any faster than sph2pipe, but at least it does not require installing additional tools.
This workflow shows how we can use SpeechBrain x-vectors + sklearn agglomerative clustering to perform a crude speaker diarization. This can be used on top of the whisper workflow to...
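The clustering half of such a pipeline can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual code: random vectors stand in for the SpeechBrain x-vectors (the extraction step is omitted), `cluster_speakers` is a hypothetical helper name, and the number of speakers is assumed to be known. It uses scikit-learn's default Ward linkage on length-normalized embeddings, which may differ from the settings the workflow uses.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def cluster_speakers(embeddings: np.ndarray, num_speakers: int) -> np.ndarray:
    """Hypothetical helper: assign a speaker label to each segment embedding."""
    # Length-normalize so that Euclidean distance tracks cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / np.clip(norms, 1e-12, None)
    clustering = AgglomerativeClustering(n_clusters=num_speakers)
    return clustering.fit_predict(normalized)


# Synthetic stand-ins for per-segment x-vectors from two speakers.
rng = np.random.default_rng(0)
spk_a = rng.normal(loc=[1.0, 0.0, 0.0, 0.0], scale=0.05, size=(5, 4))
spk_b = rng.normal(loc=[0.0, 1.0, 0.0, 0.0], scale=0.05, size=(5, 4))

labels = cluster_speakers(np.vstack([spk_a, spk_b]), num_speakers=2)
print(labels)
```

Each label would then be attached to the time span of the segment its embedding came from, giving a rough "who spoke when" output.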
It seems that currently, we never enforce `hop` to be positive. I am not sure what the intended result would be for zero or negative hop values.
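To make the concern concrete, here is a small sketch of a sliding-window helper that rejects non-positive hops up front. It is an illustration only and not Lhotse's code: `sliding_windows` and its parameters are hypothetical. In a loop like this one, a hop of zero would never advance past the first window, and a negative hop would move backwards, so neither has an obvious meaning.

```python
from typing import List, Tuple


def sliding_windows(duration: float, window: float, hop: float) -> List[Tuple[float, float]]:
    """Hypothetical helper: return (start, end) windows covering [0, duration)."""
    if window <= 0:
        raise ValueError(f"window must be positive, got {window}")
    if hop <= 0:
        # Without this check, the loop below would never terminate for hop == 0.
        raise ValueError(f"hop must be positive, got {hop}")

    windows = []
    index = 0
    while index * hop < duration:
        start = index * hop
        # The last window is truncated at the end of the recording.
        windows.append((start, min(start + window, duration)))
        index += 1
    return windows


print(sliding_windows(duration=10.0, window=4.0, hop=3.0))
```

Raising early keeps the failure close to the bad argument instead of surfacing later as a hang or an empty result.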