Controlling the silence length in generated audio

Open a122760 opened this issue 1 year ago • 3 comments

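How can I control the length of silence in the generated audio? Some clips contain very long silences, in some cases as much as 50% of the audio. How can the silence length be controlled, especially at the beginning and end?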

a122760 avatar Jan 03 '25 03:01 a122760

Hey @a122760, I'm here to help you with any bugs, questions, or contributions. Let's tackle this issue together!

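In the OpenVoice codebase, the parameter that controls the silence length of generated audio can be found in the split_audio_vad function. Specifically, the min_silence_duration parameter is set to 1 second: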

segments = get_vad_segments(
    audio_vad,
    output_sample=True,
    min_speech_duration=0.1,
    min_silence_duration=1,
    method="silero",
)

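This parameter determines the minimum silence duration required for a segment to be treated as a separate audio segment. By adjusting the value of min_silence_duration, you can control the silence length in the generated audio, especially at the beginning and end [1].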
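To make the role of a minimum-silence threshold concrete, here is a minimal sketch of the general idea: speech spans separated by a gap shorter than the threshold are merged into one segment. The function name `merge_segments` and the sample timestamps are hypothetical, and this is not the actual logic inside get_vad_segments or Silero VAD, only an illustration of how such a threshold typically behaves.

```python
def merge_segments(segments, min_silence_duration):
    """Merge (start, end) speech spans, in seconds, whose separating gap
    is shorter than min_silence_duration. Hypothetical helper for
    illustration only; not part of OpenVoice."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < min_silence_duration:
            # Gap is too short to count as a boundary: extend the last span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up speech spans with gaps of roughly 0.3 s and 1.4 s.
speech = [(0.0, 1.2), (1.5, 2.0), (3.4, 4.0)]
print(merge_segments(speech, 1.0))  # the short gap is absorbed
print(merge_segments(speech, 0.2))  # both gaps count as boundaries
```

A larger threshold yields fewer, longer segments; it changes where the input is split, not how much silence a synthesizer later emits.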

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Jan 03 '25 03:01 dosubot[bot]

No, that parameter controls the silence duration used during VAD segmentation; it is not a parameter for the silence duration in TTS-synthesized audio. Is there a way to control the silence duration in the synthesized audio?

a122760 avatar Jan 03 '25 03:01 a122760

To control the silence duration in TTS synthesized audio in the OpenVoice codebase, you can modify the audio_numpy_concat method in the BaseSpeakerTTS class. The silence duration is controlled by the following line:

audio_segments += [0] * int((sr * 0.05)/speed)

This line adds 0.05 seconds (50 milliseconds) of silence between audio segments, with the sample count divided by speed. You can adjust the silence duration by changing the value 0.05 to your desired duration in seconds [1].
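As a rough sketch of how this could be parameterized, the snippet below exposes the gap as an argument and adds a simple amplitude-based trim for leading and trailing silence. The names `concat_with_silence`, `trim_silence`, `silence_sec`, and `threshold` are assumptions made for illustration and do not exist in OpenVoice; plain Python lists are used to keep the example dependency-free, and the threshold value would need tuning for real audio.

```python
def concat_with_silence(segments, sr, silence_sec=0.05, speed=1.0):
    """Join audio segments, appending a configurable gap of zeros after
    each one. Mirrors the quoted line, with 0.05 turned into a parameter.
    Hypothetical helper; not the OpenVoice implementation."""
    gap = [0.0] * int((sr * silence_sec) / speed)
    out = []
    for seg in segments:
        out += list(seg)
        out += gap  # note: a gap is also appended after the last segment
    return out


def trim_silence(audio, threshold=1e-3):
    """Drop leading and trailing samples whose absolute amplitude is at
    or below threshold. Silence inside the clip is left untouched."""
    voiced = [i for i, s in enumerate(audio) if abs(s) > threshold]
    if not voiced:
        return []
    return audio[voiced[0]: voiced[-1] + 1]


# Made-up samples: two tiny "segments" at a 16 kHz sample rate.
segments = [[0.0, 0.0, 0.5, 0.4], [0.3, 0.0]]
mixed = concat_with_silence(segments, sr=16000, silence_sec=0.5)
trimmed = trim_silence(mixed)
print(len(mixed), len(trimmed))
```

Trimming after synthesis addresses silence at the start and end of the clip, while `silence_sec` only affects the pauses inserted at segment boundaries.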

dosubot[bot] avatar Jan 03 '25 03:01 dosubot[bot]