hls.js How can I show word by word captions/subtitles?

What do you want to do with Hls.js?

I need to show word by word real time captions as they are spoken.

I may add 0 or N words at a time depending on how many words are spoken in a 1 second interval

eg.

I
I need
I need to
I need to show
... etc

I want the words to stay in place on screen as they build up, before they are replaced by new words, which are again added one by one.

As the sentence is being built up, I can improve the transcription and update the displayed words with better results (transcribing words as they are spoken is less accurate than transcribing once more audio is available).

eg.

"I need to show sword"

might become this after collecting more of the sentence

"I need to show word by word"

What would be the best way to do this?

Would I create WebVTT files that overlap or is there a better way to show words on screen and update them in realtime?

eg.

WebVTT with one word
WebVTT with two words and the prior word repeated
etc

What have you tried so far?

No response

Oct 16 '23 15:10 DamienDeepgram

Hi @DamienDeepgram, it's probably not currently supported, but what you describe is the WebVTT timestamp tag. https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#timestamp_tag

00:16.500 --> 00:18.500
When the moon <00:17.500>hits your eye

00:00:18.500 --> 00:00:20.500
Like a <00:19.000>big-a <00:19.500>pizza <00:20.000>pie

00:00:20.500 --> 00:00:21.500
That's <00:00:21.000>amore
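As a rough illustration of what the timestamp tag encodes, the sketch below splits a cue payload at each tag and records when each piece of text becomes active. The names `TimedSegment`, `parseTimestamp` and `splitTimestampTags` are hypothetical helpers written for this example; they are not part of hls.js or vtt.js, and the regular expression only covers the simple `mm:ss.ttt` / `hh:mm:ss.ttt` forms shown above.

```typescript
// Hypothetical helpers for illustration only; not an hls.js or vtt.js API.
interface TimedSegment {
  start: number; // seconds at which this piece of text becomes active
  text: string;
}

// Convert "mm:ss.ttt" or "hh:mm:ss.ttt" into seconds.
function parseTimestamp(ts: string): number {
  return ts
    .split(":")
    .map(Number)
    .reduce((acc, part) => acc * 60 + part, 0);
}

// Split a cue payload at each <timestamp> tag.
// Text before the first tag is active from the cue's own start time.
function splitTimestampTags(cueStart: number, payload: string): TimedSegment[] {
  const tag = /<((?:\d+:)?\d{2}:\d{2}\.\d{3})>/g;
  const segments: TimedSegment[] = [];
  let start = cueStart;
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = tag.exec(payload)) !== null) {
    const text = payload.slice(last, match.index);
    if (text) segments.push({ start, text });
    start = parseTimestamp(match[1]);
    last = match.index + match[0].length;
  }
  const tail = payload.slice(last);
  if (tail) segments.push({ start, text: tail });
  return segments;
}

// Example: the first cue above, which starts at 16.5 seconds.
const segments = splitTimestampTags(16.5, "When the moon <00:17.500>hits your eye");
console.log(segments);
```

Whether a player actually reveals or styles the later segments at those times depends on its renderer; this only shows the timing information the tags carry.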

Oct 16 '23 16:10 mtoczko

To achieve this today you need to create multiple cues, each repeating the earlier words. For example:

00:16.500 --> 00:18.500
When the moon

00:17.500 --> 00:18.500
When the moon hits

00:17.750 --> 00:18.500
When the moon hits your

00:18.000 --> 00:18.500
When the moon hits your eye
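A minimal sketch of generating cues like these from word timings follows. It assumes each word arrives with a start time in seconds; the `Word` shape and the `formatTimestamp` and `buildCumulativeCues` names are made up for this example and are not part of hls.js. Unlike the hand-written cues above, this sketch ends each cue where the next one begins, on the assumption that a renderer may otherwise show several overlapping cues at once. Words that share a start time are folded into a single cue, which covers the case of several words arriving in the same interval.

```typescript
// Hypothetical input shape: one entry per recognised word.
interface Word {
  text: string;
  start: number; // seconds
}

// Format seconds as a WebVTT timestamp, hh:mm:ss.ttt.
function formatTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms % 1000, 3)}`;
}

// Build one cue per step of the sentence, each repeating the earlier words.
// lineEnd is the time at which the finished line should disappear.
function buildCumulativeCues(words: Word[], lineEnd: number): string {
  const blocks: string[] = ["WEBVTT"];
  words.forEach((word, i) => {
    const end = i + 1 < words.length ? words[i + 1].start : lineEnd;
    // Skip zero-length cues: the next cue already contains this word.
    if (end <= word.start) return;
    const text = words
      .slice(0, i + 1)
      .map((w) => w.text)
      .join(" ");
    blocks.push(`${formatTimestamp(word.start)} --> ${formatTimestamp(end)}\n${text}`);
  });
  return blocks.join("\n\n") + "\n";
}

// Example usage with made-up timings.
const vtt = buildCumulativeCues(
  [
    { text: "When", start: 16.5 },
    { text: "the", start: 17.0 },
    { text: "moon", start: 17.5 },
  ],
  18.5
);
console.log(vtt);
```

When a better transcription arrives, the same function can be re-run on the corrected word list to regenerate the cues for that line; how the regenerated cues reach the player is outside the scope of this sketch.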

It's probably not currently supported, but you described the timestamp tag. https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#timestamp_tag

I don't know what support looks like for the timestamp tag in WebVTT on devices and browsers, or in VTT.js. If someone was interested enough in adding support to HLS.js, we would entertain taking a contribution. It would need to include sample assets to show that the desire extends to authoring content with these tags.

That being said, the suggested use-case involves being able to style painted-on / future text (karaoke), which would more than likely require support in browsers and in how they render cues. https://www.w3.org/wiki/VTT_Concepts#Timestamp_Tags_.28Karaoke_Style_and_Paint_On_Caption_Text.29

Oct 25 '23 13:10 robwalch