Paul Wilson
@thisandagain Is this issue still open? @nsantini Did you still want to create a PR with this feature? Let me know if it would be helpful for me to pitch in.
I think this is what you are looking for: https://old.datahub.io/dataset/twitter-sentiment-analysis It may also be interesting to examine how effectively this works against longer texts; one example is the [Cornell Movie Review...
This appears to be related to closed issues [#471](https://github.com/ggerganov/whisper.cpp/issues/471), [#477](https://github.com/ggerganov/whisper.cpp/issues/477), [#508](https://github.com/ggerganov/whisper.cpp/issues/508), [#612](https://github.com/ggerganov/whisper.cpp/issues/612), [#719](https://github.com/ggerganov/whisper.cpp/issues/719), [#731](https://github.com/ggerganov/whisper.cpp/issues/731) and an attempted [fix](https://github.com/ggerganov/whisper.cpp/commit/f19e23fbd108ec3ac458c7a19b31c930719e7a94) released in v1.3.0. Here are excerpts of the duplication seen built after...
In response to https://github.com/ggerganov/whisper.cpp/issues/508#issuecomment-1435907929 I experimented with raising the entropy threshold (to 2.8 and 3.5). It does avoid specific duplication but does not solve all cases, and I'm not sure...
In reference to the audio file used to highlight the issue in https://github.com/ggerganov/whisper.cpp/issues/896#issuecomment-1562283987 @jordibruin I see this audio file performs reasonably well in MacWhisper. Did you face this issue and...
@ggerganov Appreciate the detailed response as those settings did resolve the issue.
@leohuang2013 Do you have an audio file you can share and steps to reproduce?