How can I show word by word captions/subtitles?
What do you want to do with Hls.js?
I need to show word by word real time captions as they are spoken.
I may add 0 to N words at a time, depending on how many words are spoken in each 1-second interval.
eg.
- I
- I need
- I need to
- I need to show
- ... etc
I want the words to stay in place on screen as they build up, before they are replaced by new words, which are again added one by one.
As the sentence is being built up, I can improve the transcription and update the displayed words with better results (showing words as they are spoken is less accurate than transcribing once more audio is available).
eg.
"I need to show sword"
might become this after collecting more of the sentence
"I need to show word by word"
What would be the best way to do this?
Would I create WebVTT files that overlap or is there a better way to show words on screen and update them in realtime?
eg.
- WebVTT with one word
- WebVTT with two words and the prior word repeated
- etc
What have you tried so far?
No response
Hi @DamienDeepgram It's probably not currently supported, but you described the timestamp tag. https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#timestamp_tag
00:16.500 --> 00:18.500
When the moon <00:17.500>hits your eye
00:00:18.500 --> 00:00:20.500
Like a <00:19.000>big-a <00:19.500>pizza <00:20.000>pie
00:00:20.500 --> 00:00:21.500
That's <00:00:21.000>amore
To achieve this today you need to create multiple cues, each ending where the next one begins so that only one is on screen at a time. Ex:
00:16.500 --> 00:17.500
When the moon
00:17.500 --> 00:17.750
When the moon hits
00:17.750 --> 00:18.000
When the moon hits your
00:18.000 --> 00:18.500
When the moon hits your eye
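If the source already carries timestamp tags, the multiple-cue form can be generated rather than written by hand. The following is only a rough sketch of that expansion, not anything Hls.js or VTT.js provides: `Cue`, `parseTimestamp`, `formatTimestamp`, `expandTimestampTags` and `toWebVTT` are hypothetical names, and it assumes the payload contains plain text plus timestamp tags only (no `<c>`, `<v>` or other markup).

```typescript
// Hypothetical helper type; not part of Hls.js or VTT.js.
interface Cue {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Parse "mm:ss.ttt" or "hh:mm:ss.ttt" into seconds.
function parseTimestamp(ts: string): number {
  return ts
    .split(":")
    .map(Number)
    .reduce((total, part) => total * 60 + part, 0);
}

// Format seconds as "hh:mm:ss.ttt".
function formatTimestamp(seconds: number): string {
  const pad = (n: number, width: number): string => {
    let s = String(n);
    while (s.length < width) s = "0" + s;
    return s;
  };
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms % 1000, 3)}`;
}

// Turn one cue with timestamp tags into cumulative, non-overlapping cues:
// each cue shows all text revealed so far and ends where the next one starts.
function expandTimestampTags(start: number, end: number, payload: string): Cue[] {
  const tag = /<((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})>/g;
  const cues: Cue[] = [];
  let shown = ""; // text revealed so far
  let cursor = 0; // position in payload after the last tag
  let cueStart = start;
  let match: RegExpExecArray | null;
  while ((match = tag.exec(payload)) !== null) {
    shown += payload.slice(cursor, match.index);
    cursor = match.index + match[0].length;
    const next = parseTimestamp(match[1]);
    if (next > cueStart) {
      // Skip empty text (a tag at the very start of the payload).
      if (shown.trim() !== "") {
        cues.push({ start: cueStart, end: next, text: shown.trim() });
      }
      cueStart = next;
    }
  }
  shown += payload.slice(cursor);
  cues.push({ start: cueStart, end, text: shown.trim() });
  return cues;
}

// Serialize cues as a WebVTT document.
function toWebVTT(cues: Cue[]): string {
  const body = cues
    .map((c) => `${formatTimestamp(c.start)} --> ${formatTimestamp(c.end)}\n${c.text}`)
    .join("\n\n");
  return `WEBVTT\n\n${body}\n`;
}

console.log(toWebVTT(expandTimestampTags(16.5, 18.5, "When the moon <00:17.500>hits your eye")));
```

For the first cue of the example above this yields two cues, "When the moon" from 16.5 to 17.5 and "When the moon hits your eye" from 17.5 to 18.5. Repeating the earlier words in every cue is what keeps them in place on screen while new words are added.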
I don't know what support for the timestamp tag in WebVTT looks like across devices and browsers, or in VTT.js. If someone were interested enough in adding support to HLS.js, we would entertain taking a contribution. It would need to include sample assets to show that the desire extends to authoring content with these tags.
That being said, the suggested use case involves being able to style painted-on / future text (karaoke), which would more than likely require support in browsers and in how they render cues. https://www.w3.org/wiki/VTT_Concepts#Timestamp_Tags_.28Karaoke_Style_and_Paint_On_Caption_Text.29
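For the live case in the question, where earlier words get revised as more audio arrives, another option is to skip WebVTT files and keep a single cue up to date through the browser's standard TextTrack API (`addCue` / `removeCue`). The sketch below is an assumption-heavy illustration and not an Hls.js feature: `createLiveCaption`, `CueLike`, `TrackLike` and the 5 second hold time are made up for the example, and how this interacts with Hls.js's own subtitle tracks is untested.

```typescript
// Minimal structural types covering the parts of the standard
// TextTrack / VTTCue browser APIs that this sketch relies on.
interface CueLike {
  startTime: number;
  endTime: number;
  text: string;
}
interface TrackLike {
  addCue(cue: CueLike): void;
  removeCue(cue: CueLike): void;
}
type CueFactory = (start: number, end: number, text: string) => CueLike;

// Hypothetical helper: keeps exactly one "live" cue on the track and
// replaces it whenever the transcript grows or is revised.
function createLiveCaption(track: TrackLike, makeCue: CueFactory, holdSeconds: number = 5) {
  let current: CueLike | null = null;
  return {
    // `now` is the current media time in seconds; `text` is the full
    // caption to display (empty string clears the caption).
    update(now: number, text: string): void {
      if (current !== null) {
        track.removeCue(current);
        current = null;
      }
      if (text !== "") {
        // holdSeconds is an assumed timeout after which stale text disappears.
        current = makeCue(now, now + holdSeconds, text);
        track.addCue(current);
      }
    },
  };
}

// Browser usage (assumes a <video> element in `video`):
//   const track = video.addTextTrack("captions", "Live", "en");
//   track.mode = "showing";
//   const live = createLiveCaption(track, (s, e, t) => new VTTCue(s, e, t));
//   live.update(video.currentTime, "I need to show sword");
//   live.update(video.currentTime, "I need to show word by word");
```

Replacing the whole cue on every update means a revised transcription ("sword" becoming "word by word") needs no special handling, at the cost of re-rendering the cue each time.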