Patrick Li
@dit-j We are having the same issue as you during packaging. Our suspicion is that the problem lies with the build path to our index.html: dev ? 'http://localhost:8080?start' : `file://${__dirname}/../../dist/index.html?start'`...
Results from eval on long audio contexts. Note: ASR is measured by WER, and translation is measured by BLEU. combine-n means we're concatenating n samples into one. https://wandb.ai/fixie/ultravox/runs/0ws1m9us/overview English to Chinese: eval/covost2_long_audio-asr-combine-5-en_zh-CN.2k-asr: 0.23348530515869995...
> Interesting. Do you have a sample output dataset I could take a look at?

Yeah, take a look here for the original dataset: https://huggingface.co/datasets/fixie-ai/covost2_long_audio and here for the output results:...
Yeah, I ran some evals for the duplicate, and what ended up happening was that it would only transcribe the first utterance, so the WER was really high (around 0.82).
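For intuition on why transcribing only the first utterance gives a WER in that range, here is a minimal sketch. It assumes five equal-length utterances concatenated into one reference (the count and the toy sentences are assumptions for illustration, not taken from the actual eval), and `wer` is a hypothetical helper written here, not the project's metric code.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Toy data (assumed): five 4-word utterances combined into one reference.
utterances = [
    "the cat sat down",
    "we went back home",
    "it rained all day",
    "she read a book",
    "they played some music",
]
reference = " ".join(utterances)
hypothesis = utterances[0]  # model output covers only the first utterance

score = wer(reference, hypothesis)
print(f"WER when only the first of 5 utterances is transcribed: {score:.2f}")
```

Every word after the first utterance counts as a deletion, so with n equal-length utterances the WER lands near (n - 1) / n, plus whatever errors occur inside the first utterance.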
TODO: verify num_epochs # of steps matches what we expect.
> TODO: verify num_epochs # of steps matches what we expect.

Set to 1 epoch with 3000 total samples, a batch size of 24, and 8 GPUs. We expect 16 steps...
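The 16-step expectation can be sanity-checked with a quick calculation. This is a sketch that assumes the batch size of 24 is per GPU, there is no gradient accumulation, and the final partial batch is kept; `expected_steps` is a hypothetical helper, not part of the training code.

```python
import math


def expected_steps(num_samples: int, per_device_batch_size: int,
                   num_devices: int, num_epochs: int = 1,
                   drop_last: bool = False) -> int:
    """Estimate optimizer steps, assuming one step per global batch."""
    global_batch = per_device_batch_size * num_devices  # 24 * 8 = 192
    batches_per_epoch = num_samples / global_batch       # 3000 / 192 = 15.625
    if drop_last:
        steps_per_epoch = math.floor(batches_per_epoch)
    else:
        steps_per_epoch = math.ceil(batches_per_epoch)
    return steps_per_epoch * num_epochs


steps = expected_steps(num_samples=3000, per_device_batch_size=24, num_devices=8)
print(steps)
```

If the trainer drops the last incomplete batch instead, the same numbers give 15 steps, so an off-by-one against the expectation would point at that setting.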
With this change, we can no longer use non-generic datasets (with or without epochs), because the multiplier requires a length associated with the dataset, which only generic datasets provide...
Fix: https://github.com/fixie-ai/ultravox/pull/90 You need to have an estimate of the dataset size ahead of time for this to work.
Adding Slovak now :)
Slovak (sk) has been added: https://huggingface.co/datasets/fixie-ai/common_voice_17_0/tree/main