Paul Wilson comments

Results: 7 comments by Paul Wilson

Allow token processing "middleware"

@thisandagain Is this issue still open? @nsantini Did you still want to create a PR with this feature? Let me know if it would be helpful for me to pitch in.

Add Twitter validation dataset

I think this is what you are looking for: https://old.datahub.io/dataset/twitter-sentiment-analysis It may also be interesting to examine how effectively this works against longer texts; one example is the [Cornell Movie Review...

Duplicate words generated

This appears to be related to closed issues [#471](https://github.com/ggerganov/whisper.cpp/issues/471), [#477](https://github.com/ggerganov/whisper.cpp/issues/477), [#508](https://github.com/ggerganov/whisper.cpp/issues/508), [#612](https://github.com/ggerganov/whisper.cpp/issues/612), [#719](https://github.com/ggerganov/whisper.cpp/issues/719), and [#731](https://github.com/ggerganov/whisper.cpp/issues/731), and an attempted [fix](https://github.com/ggerganov/whisper.cpp/commit/f19e23fbd108ec3ac458c7a19b31c930719e7a94) released in v1.3.0. Here are excerpts of the duplication seen built after...

Duplicate words generated

In response to https://github.com/ggerganov/whisper.cpp/issues/508#issuecomment-1435907929 I experimented with raising the entropy threshold (to 2.8 and 3.5). It does avoid specific duplication but does not solve all cases, and I'm not sure...
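
To illustrate why raising an entropy threshold can suppress some repetition, here is a rough sketch of an entropy check over recent tokens. It is not whisper.cpp's code: the function names `token_entropy` and `looks_repetitive`, the 32-token window, and the 2.4 default are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def token_entropy(tokens, window=32):
    """Shannon entropy (in nats) of token frequencies in the last `window` tokens.

    Hypothetical helper: the window size is an assumption, not a documented value.
    """
    recent = tokens[-window:]
    if not recent:
        return 0.0
    counts = Counter(recent)
    total = len(recent)
    # Repeated tokens concentrate probability mass, which lowers the entropy.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def looks_repetitive(tokens, threshold=2.4, window=32):
    """Flag a sequence whose recent-token entropy falls below `threshold`.

    Hypothetical helper: the default threshold is an assumed value.
    """
    return token_entropy(tokens, window) < threshold


# Example: a two-token loop has low entropy and is flagged,
# while a window of distinct tokens is not (at the assumed default).
looping = [101, 102] * 16
varied = list(range(32))
flag_looping = looks_repetitive(looping)
flag_varied = looks_repetitive(varied)
```

In this sketch, a higher threshold flags more sequences as repetitive. Note that a window of 32 distinct tokens tops out at ln 32 (about 3.47), so under these assumptions a threshold of 3.5 would flag every window, which may be one reason a very high value is not a complete fix.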

Duplicate words generated

In reference to the audio file used to highlight the issue in https://github.com/ggerganov/whisper.cpp/issues/896#issuecomment-1562283987: @jordibruin, I see this audio file performs reasonably well in MacWhisper. Did you face this issue and...

Duplicate words generated

@ggerganov Appreciate the detailed response as those settings did resolve the issue.

Duplicate words generated

@leohuang2013 Do you have an audio file you can share and steps to reproduce?