VibeVoice

Timestamps in output

Open RockStone opened this issue 5 months ago • 4 comments

On the project page, the demos show timestamps at the sentence level. What is the API configuration needed to include these timestamps in the output?

RockStone avatar Sep 03 '25 19:09 RockStone

Currently, no: the API has no configuration option for this. The timestamps shown on the project page are not generated directly by the API. Instead, they are derived from the generated audio through a two-step process:

  1. Run ASR on the generated audio to obtain transcriptions with timestamps.
  2. Align the transcriptions with the ground-truth text using a dynamic programming algorithm to assign timestamps at the sentence level.

This is a great suggestion, and we’ll discuss it further as part of our future plans.
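For anyone who wants to try this themselves, the alignment step might look roughly like the sketch below. It is not the project's actual code: the function names, the `(word, start, end)` input shape, and the choice of word-level edit distance as the dynamic programming algorithm are all assumptions made for illustration.

```python
import re


def normalize(word):
    """Lowercase a word and drop punctuation so ASR and reference text compare equal."""
    return re.sub(r"[^\w']+", "", word.lower())


def align_sentences(sentences, asr_words):
    """Assign (start, end) times to each reference sentence.

    sentences: list of ground-truth sentence strings (hypothetical input shape).
    asr_words: list of (word, start_sec, end_sec) tuples from any ASR system.
    Returns one (start, end) tuple per sentence, or None if nothing aligned.
    """
    # Flatten the reference text into (normalized_word, sentence_index) pairs.
    ref = [(normalize(w), idx) for idx, s in enumerate(sentences) for w in s.split()]
    ref = [r for r in ref if r[0]]
    hyp = [(normalize(w), start, end) for w, start, end in asr_words]
    hyp = [h for h in hyp if h[0]]
    n, m = len(ref), len(hyp)

    # cost[i][j] = word-level edit distance between ref[:i] and hyp[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = ref[i - 1][0] != hyp[j - 1][0]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # reference word missing from ASR
                cost[i][j - 1] + 1,             # extra ASR word
            )

    # Backtrace; every diagonal step pairs a reference word with an ASR word.
    # Substitutions are paired too, on the assumption that a misheard word
    # still occupies the time slot of the word that was actually spoken.
    spans = [None] * len(sentences)
    i, j = n, m
    while i > 0 and j > 0:
        mismatch = ref[i - 1][0] != hyp[j - 1][0]
        if cost[i][j] == cost[i - 1][j - 1] + mismatch:
            sent = ref[i - 1][1]
            _, start, end = hyp[j - 1]
            if spans[sent] is None:
                spans[sent] = (start, end)
            else:
                spans[sent] = (min(spans[sent][0], start), max(spans[sent][1], end))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return spans
```

Calling `align_sentences(["Hello world.", "How are you today?"], words)` with word timings from the ASR step returns one `(start, end)` pair per sentence. The table is quadratic in the number of words, so very long scripts may need to be aligned in chunks.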

YaoyaoChang avatar Sep 08 '25 01:09 YaoyaoChang

@YaoyaoChang Thanks for the clarification. May I ask which ASR model you used? Do you have a preference?

RockStone avatar Sep 12 '25 01:09 RockStone

Whisper
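The reply does not say which Whisper distribution, model size, or settings were used. As one possible way to get word-level timestamps, the sketch below assumes the open-source `openai-whisper` Python package and its `word_timestamps=True` option; the helper names and the `"base"` model size are illustrative choices, not details from this thread.

```python
def flatten_whisper_words(result):
    """Flatten a Whisper-style result dict into a list of (word, start, end)."""
    words = []
    for segment in result.get("segments", []):
        # The "words" key is only present when word timestamps were requested.
        for w in segment.get("words", []):
            words.append((w["word"].strip(), w["start"], w["end"]))
    return words


def transcribe_with_word_timestamps(audio_path, model_name="base"):
    """Run Whisper on an audio file (assumes `pip install openai-whisper`)."""
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, word_timestamps=True)
```

The flat word list from `flatten_whisper_words(transcribe_with_word_timestamps("output.wav"))` can then be aligned against the input script to get sentence-level times; `output.wav` is a placeholder file name.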

YaoyaoChang avatar Sep 12 '25 02:09 YaoyaoChang