VibeVoice
Timestamps in output
On the project page, the demos show timestamps at the sentence level. What is the API configuration needed to include these timestamps in the output?
Currently, there is no such configuration. The timestamps shown on the project page are not generated directly by the API. Instead, they are derived from the audio through a two-step process:
- Run ASR on the generated audio to obtain transcriptions with timestamps.
- Align the transcriptions with the ground-truth text using a dynamic programming algorithm to assign timestamps at the sentence level.

This is a great suggestion, and we’ll discuss it further as part of our future plans.
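
For anyone who wants to reproduce this outside the API, here is a minimal sketch of the alignment step. It is not the project's actual code: the thread does not say which dynamic programming algorithm is used, so this assumes a plain longest-common-subsequence alignment over normalized words. The function names (`normalize`, `align_words`, `sentence_timestamps`) are hypothetical, and the ASR output is assumed to already be a list of `(word, start_sec, end_sec)` tuples.

```python
import re


def normalize(word):
    """Lowercase and strip punctuation so 'Hello,' matches 'hello'."""
    return re.sub(r"[^\w']+", "", word.lower())


def align_words(ref, hyp):
    """Return (ref_idx, hyp_idx) pairs of matched words using an LCS DP table.

    Assumption: LCS is used here for illustration only; the original
    pipeline's algorithm is not specified.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = length of the LCS of ref[i:] and hyp[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if ref[i] == hyp[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if ref[i] == hyp[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def sentence_timestamps(sentences, asr_words):
    """Assign (start, end) to each ground-truth sentence.

    sentences: list of ground-truth sentence strings.
    asr_words: list of (word, start_sec, end_sec) from any ASR system.
    Sentences with no matched word get (None, None).
    """
    ref, owner = [], []  # owner[k] = index of the sentence that ref[k] came from
    for s_idx, sentence in enumerate(sentences):
        for token in sentence.split():
            token = normalize(token)
            if token:
                ref.append(token)
                owner.append(s_idx)
    hyp = [normalize(word) for word, _, _ in asr_words]

    spans = [None] * len(sentences)
    for r_idx, h_idx in align_words(ref, hyp):
        _, start, end = asr_words[h_idx]
        s_idx = owner[r_idx]
        if spans[s_idx] is None:
            spans[s_idx] = [start, end]  # first matched word sets the start
        else:
            spans[s_idx][1] = end  # later matches extend the end
    return [(s, *(span or (None, None))) for s, span in zip(sentences, spans)]


sentences = ["Hello there.", "How are you today?"]
asr_words = [
    ("hello", 0.0, 0.4), ("their", 0.4, 0.8),  # ASR misheard "there"
    ("how", 1.0, 1.2), ("are", 1.2, 1.4), ("you", 1.4, 1.6), ("today", 1.6, 2.1),
]
result = sentence_timestamps(sentences, asr_words)
```

Because only matched words contribute, a sentence boundary can be slightly early or late when the ASR mishears the first or last word of a sentence, as with "their" in the example.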
@YaoyaoChang Thanks for the clarification. May I ask which ASR model you used? Any preference?
Whisper