
No gaps between subs?

Open gab-luz opened this issue 3 years ago • 5 comments

I used whisper.cpp to process a whole TV series. It detects about 99% of the spoken words, but the subtitles have no gaps/intervals between them, and I don't want to fix those gaps by hand. Is there any way to fix that? I'm afraid I can't upload the subs here, but there are no gaps.

gab-luz • Nov 27 '22 03:11

Maybe try copying a small portion of the subtitles here so we can see exactly what the problem is.

Topping1 • Nov 27 '22 03:11

OK. I found a Green Hornet episode and managed to transcribe its subs. But since this is copyrighted material, can I post a small portion of the subs just to show the problem?

gab-luz • Nov 30 '22 22:11

You can just demonstrate what you mean by "gaps" - you can use some random text - it does not have to make sense. It is not clear what you mean by "gaps between the subtitles".

ggerganov • Dec 01 '22 17:12

I have this issue too. I believe OP is noticing subtitles showing up before they are spoken, so there are no gaps even when there are pauses in the audio.

1
00:00:00,000 --> 00:00:10,480
This is sentence one.

2
00:00:10,480 --> 00:00:31,000
This is sentence two.

3
00:00:31,000 --> 00:00:41,120
This is sentence three.

4
00:00:41,120 --> 00:00:44,120
This is sentence four.

In segment 2 there is no speaking until about 29 seconds in. Segment 3 is similar: no speaking until 38 seconds in. Looking through the .srt, there are very few segments that don't start at the previous segment's end time.

troy236 • Dec 01 '22 19:12
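
The observation above (almost every segment starts exactly where the previous one ends) can be checked mechanically. The following is a minimal sketch, not part of whisper.cpp: the helper names `to_ms`, `cue_times`, and `gap_report` are made up for illustration, and it only assumes that timing lines use the usual SRT `HH:MM:SS,mmm --> HH:MM:SS,mmm` form.

```python
import re

# Matches an SRT timing line such as "00:00:10,480 --> 00:00:31,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(h, m, s, ms):
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def cue_times(srt_text):
    """Return (start_ms, end_ms) for every timing line found in the text."""
    times = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        times.append((to_ms(*fields[:4]), to_ms(*fields[4:])))
    return times


def gap_report(times):
    """Gap in ms between each cue's end and the next cue's start."""
    return [nxt[0] - cur[1] for cur, nxt in zip(times, times[1:])]


# The four-segment example from the comment above.
sample = """1
00:00:00,000 --> 00:00:10,480
This is sentence one.

2
00:00:10,480 --> 00:00:31,000
This is sentence two.

3
00:00:31,000 --> 00:00:41,120
This is sentence three.

4
00:00:41,120 --> 00:00:44,120
This is sentence four.
"""

print(gap_report(cue_times(sample)))  # [0, 0, 0]
```

A gap of 0 ms between every pair of cues confirms that the segments are back to back, even though the audio reportedly contains long pauses.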

As you can see, no gaps most of the time:

1
00:00:00,000 --> 00:00:05,000
Behold, I show you a mystery.

2
00:00:05,000 --> 00:00:09,000
We shall not all sleep, but we shall all be changed.

3
00:00:09,000 --> 00:00:14,000
In a moment, in the twinkling of an eye, at the last trumpet.

4
00:00:14,000 --> 00:00:20,000
And now, let us bow our heads in a moment of silent prayer.

5
00:00:20,000 --> 00:00:26,000
Bannister's the one with the mourning band on his arm.

6
00:00:26,000 --> 00:00:30,270
I wonder what he meant about seeing him right after the funeral or else it might be

7
00:00:30,270 --> 00:00:31,000
too late.

8
00:00:31,000 --> 00:00:33,000
We'll soon know.

9
00:00:55,000 --> 00:00:57,000
Let me through here.

10
00:00:57,000 --> 00:00:59,000
What is it, Sergeant?

11
00:00:59,000 --> 00:01:00,000
He's been shot!

12
00:01:00,000 --> 00:01:02,000
Shot?

13
00:01:02,000 --> 00:01:08,000
Another challenge for the Green Hornet.

14
00:01:08,000 --> 00:01:12,000
His aide, Kato, and their rolling arsenal, the Black Beauty.

15
00:01:12,000 --> 00:01:17,000
On police records, a wanted criminal, the Green Hornet is really Britt Reid,

16
00:01:17,000 --> 00:01:20,000
owner-publisher of the Daily Sentinel.

17
00:01:20,000 --> 00:01:25,000
His dual identity known only to his secretary and to the district attorney.

18
00:01:25,000 --> 00:01:31,560
And now, to protect the rights and lives of decent citizens, rides the Green Hornet

19
00:01:31,560 --> 00:01:32,000
.

20
00:01:32,000 --> 00:02:01,000
[Music]

21
00:02:01,000 --> 00:02:30,000
[Music]

22
00:02:30,000 --> 00:02:33,000
Thanks, Miss Case.

gab-luz • Dec 02 '22 07:12

This is a limitation of the model (see for example: https://github.com/openai/whisper/discussions/375). You can try the -ml 1 option to see if it helps, but it is an experimental approach, so it might not work reliably.

ggerganov • Dec 02 '22 18:12
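
If the -ml 1 run does produce short, word-level cues, one possible post-processing step is to regroup them into subtitle lines and start a new line wherever there is a pause. The sketch below is only an illustration under assumptions the thread does not confirm: that the word-level cues carry real pauses between words, and that a 500 ms threshold is sensible. The function name `regroup`, the parameter `min_gap_ms`, and the word timings are all invented for the example.

```python
def regroup(words, min_gap_ms=500):
    """Merge word-level cues into lines, splitting where a pause occurs.

    words: list of (start_ms, end_ms, text) tuples in playback order.
    A word is appended to the current line when it starts less than
    min_gap_ms after that line ends; otherwise it opens a new line.
    """
    cues = []
    for start, end, text in words:
        if cues and start - cues[-1][1] < min_gap_ms:
            prev_start, _, prev_text = cues[-1]
            cues[-1] = (prev_start, end, prev_text + " " + text)
        else:
            cues.append((start, end, text))
    return cues


# Invented word timings, loosely modelled on the report that speech in
# segments 2 and 3 only begins around 29 s and 38 s.
words = [
    (29000, 29400, "This"),
    (29400, 29700, "is"),
    (29700, 30400, "sentence"),
    (30400, 31000, "two."),
    (38000, 38500, "This"),
    (38500, 38900, "is"),
    (38900, 40000, "sentence"),
    (40000, 41120, "three."),
]

for start, end, text in regroup(words):
    print(start, end, text)
```

With these made-up timings the result is two cues, one from 29000 to 31000 ms and one from 38000 to 41120 ms, leaving the silent stretch between them without a subtitle. Whether real -ml 1 output behaves this way would need to be checked against the audio.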