atila

23 issues by atila

Thanks for releasing the code! I have been reviewing how the Gumbel-Softmax[1] trick was used and both the paper and the code suggest that the "relevance scores are interpreted as...

It would be great if `brew install whisperkit` just worked and the WhisperKit CLI target on macOS could become an out-of-the-box real-time transcription utility.

enhancement
triaged

It would be great if the first text token's logprob could be used to discard a transcription draft as failed and start over. Starting over could mean either falling back...

good first issue
feature
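
A minimal sketch of the idea, assuming a draft exposes the log-probability of its first text token. The `Draft` type, the `first_accepted_draft` helper, and the `-1.0` threshold are hypothetical and not part of WhisperKit; what each fallback attempt does is left abstract, so the sketch simply takes an ordered list of decoding attempts.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Draft:
    """Hypothetical container for one transcription attempt."""
    text: str
    first_token_logprob: float  # log-probability of the first text token


def first_accepted_draft(
    attempts: Sequence[Callable[[], Draft]],
    threshold: float = -1.0,  # assumed cutoff; would need tuning on real data
) -> Optional[Draft]:
    """Run attempts in order and return the first draft that passes the check.

    A draft is discarded as failed when its first text token's logprob
    falls below `threshold`; the next attempt is then tried.
    """
    for attempt in attempts:
        draft = attempt()
        if draft.first_token_logprob >= threshold:
            return draft
    return None  # every attempt was rejected


if __name__ == "__main__":
    attempts = [
        lambda: Draft("low-confidence draft", -3.0),
        lambda: Draft("confident draft", -0.2),
    ]
    print(first_accepted_draft(attempts))
```

Checking only the first token keeps the rejection cheap: a failed draft can be abandoned before the rest of the sequence is decoded.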

It would be great if certain patterns in the newly added word timestamps (#38) can be leveraged to reduce the incidence rate of hallucinations. This change will require comprehensive...

enhancement
good first issue

Implement tests to transcribe long audio files (at least several minutes worth) and measure the memory and latency over time. This is to guard against memory leaks or slowdowns potentially...

enhancement
help wanted

The goal is to leverage the high-quality word-level timestamps added in #38 as anchors to reliably seek the audio buffer forward at a higher frequency compared to current behavior: -...

enhancement
triaged

After specifying a minimum OS version of macOS 13 and iOS 16, there is still a large matrix of possible model-device configurations for deployment: Devices have varying capabilities across: - **Available RAM:**...

- [Eager Streaming Mode](https://x.com/argmaxinc/status/1774809790595932658) relies on confirming the currently predicted text tokens with at least 1 redundant historical prediction. - Whisper is susceptible to outputting tokens that trivially differ (e.g....
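
One way to picture confirmation against a single redundant historical prediction is as prefix agreement between two consecutive predictions. This is an assumption made for illustration, not the Eager Streaming Mode implementation; `confirmed_prefix` and the integer token IDs are hypothetical.

```python
from typing import List, Sequence


def confirmed_prefix(previous: Sequence[int], current: Sequence[int]) -> List[int]:
    """Return the leading tokens on which both predictions agree.

    Tokens are compared for exact equality, so two tokens that differ
    only trivially still end the confirmed prefix at that position.
    """
    confirmed: List[int] = []
    for old_token, new_token in zip(previous, current):
        if old_token != new_token:
            break
        confirmed.append(new_token)
    return confirmed


if __name__ == "__main__":
    previous = [50, 51, 52, 60]
    current = [50, 51, 53, 60, 61]
    print(confirmed_prefix(previous, current))
```

Because the comparison is exact, a single trivially different token stalls confirmation for everything after it, which is the failure mode the issue describes.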

We should update the WhisperAX UI element that reads `Language` to `Source Language` so it is clear that Whisper doesn't support any target language other than English at the moment.

enhancement
triaged

Moving away from hard-coded model names for device support mapping, we should write the supported device list in config.json during model generation in whisperkittools and read that in WhisperKit to...