[FEATURE REQUEST] Separate Bark setting to potentially improve long-form generation

Open yesbroc opened this issue 2 years ago • 7 comments

Basically, because quality degrades the longer the input is, continuing from outputs that are already getting worse severely hurts the overall result. I've found this to be true while using Strict long and short.

I believe stitching together multiple, independent outputs would keep the output quality consistent rather than continuing from degrading outputs.
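A rough sketch of the stitching idea, not how audio-webui does it: split the text into sentences, generate each one on its own, and join the results with a short silence. The `generate` callable is a hypothetical stand-in for whatever turns one sentence into audio samples, and the sentence splitter and gap length are assumptions picked for illustration.

```python
import re
from typing import Callable, List

SAMPLE_RATE = 24_000  # Bark outputs audio at 24 kHz


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stitch(
    text: str,
    generate: Callable[[str], List[float]],  # hypothetical: sentence -> samples
    gap_seconds: float = 0.25,
) -> List[float]:
    """Generate every sentence independently and join them with silence."""
    silence = [0.0] * int(gap_seconds * SAMPLE_RATE)
    audio: List[float] = []
    for i, sentence in enumerate(split_sentences(text)):
        if i > 0:
            audio.extend(silence)
        # No sentence depends on the previous output, so errors cannot pile up.
        audio.extend(generate(sentence))
    return audio
```

Since no chunk sees the previous one, quality should stay flat across the clip, though the voice would only stay consistent if every call gets the same speaker prompt.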

yesbroc avatar Oct 30 '23 13:10 yesbroc

Yeah, I was actually planning to add something for that: a setting that lets you choose between using the same history over and over, and looping the history back so each output becomes the history for the next one.

Using the same clip over and over causes inconsistencies in emotion and so on, while looping it back makes the output pick up noise: it sounds more realistic, but it slowly degrades.
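The two options could be sketched like this. It's only an illustration: `generate` is a hypothetical callable that takes a text chunk plus a history and returns the audio together with the history that chunk produced, and the mode names `"fixed"` and `"loopback"` are made up here.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical signature: (text, history) -> (samples, history produced by this chunk)
Generate = Callable[[str, Any], Tuple[List[float], Any]]


def generate_long(
    chunks: List[str],
    generate: Generate,
    voice: Any,
    mode: str = "fixed",
) -> List[float]:
    """Generate chunks in order with either a fixed or a looped-back history."""
    if mode not in ("fixed", "loopback"):
        raise ValueError(f"unknown mode: {mode!r}")
    history = voice
    audio: List[float] = []
    for chunk in chunks:
        piece, produced = generate(chunk, history)
        audio.extend(piece)
        if mode == "loopback":
            # The next chunk continues from this output, noise included.
            history = produced
        # In "fixed" mode the original voice is reused for every chunk.
    return audio
```

A denoising step, if one turned out to work, would slot in right where `history = produced` is assigned.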

I'm wondering if there's a way I could get the best of both worlds, like some denoiser to improve the audio quality throughout loopbacks, which would allow it to be consistent, but not lose the quality.

gitmylo avatar Oct 30 '23 15:10 gitmylo

what does 'history' mean in this context?

yesbroc avatar Oct 30 '23 16:10 yesbroc

Bark uses a system called "history prompts", which are basically context for the language model. They allow it to retain the same voice. These history prompts are stored in .npz files, each containing 3 .npy files: the coarse, fine and semantic prompts.
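If you want to look inside one: an .npz file is just a zip archive with one .npy member per array, so the standard library is enough to list what's in it. The helper names below are made up for this sketch, and the expected keys assume Bark's usual naming (`semantic_prompt`, `coarse_prompt`, `fine_prompt`).

```python
import zipfile
from typing import List

# Assumed array names for a Bark history prompt.
EXPECTED = {"semantic_prompt", "coarse_prompt", "fine_prompt"}


def list_npz_arrays(path) -> List[str]:
    """Return the array names stored in an .npz file (path or file object)."""
    with zipfile.ZipFile(path) as archive:
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )


def looks_like_history_prompt(path) -> bool:
    """True if the archive holds all three prompt arrays."""
    return EXPECTED.issubset(list_npz_arrays(path))
```

This only checks the names, not the shapes or dtypes of the arrays; loading the actual data would need `numpy.load`.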

gitmylo avatar Oct 30 '23 18:10 gitmylo

Ahh ok. What if, for the whole prompt, the first sentence is generated first and the rest of the paragraph then uses that first sentence's history?

Or the webui could detect whether a "[]" pops up and treat that as its own history, which it uses for that one sentence.
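The first idea could be sketched roughly like this, assuming a hypothetical `generate` callable that returns the audio plus the history produced by that sentence. Nothing here is an existing webui option.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical signature: (text, history) -> (samples, history produced by this sentence)
Generate = Callable[[str, Any], Tuple[List[float], Any]]


def generate_anchored(
    sentences: List[str],
    generate: Generate,
    voice: Any = None,
) -> List[float]:
    """Use the first sentence's output as the history for all later sentences."""
    audio: List[float] = []
    anchor = voice  # the first sentence starts from the given voice, if any
    for index, sentence in enumerate(sentences):
        piece, produced = generate(sentence, anchor)
        audio.extend(piece)
        if index == 0:
            # Freeze the history after the first sentence; it is never updated again.
            anchor = produced
    return audio
```

Every later sentence is then exactly one step away from the original voice, so noise can't build up over the paragraph.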

yesbroc avatar Oct 30 '23 23:10 yesbroc

Custom formatting for controlling the prompts could be useful. I'll think about it.

gitmylo avatar Nov 01 '23 13:11 gitmylo

is there also a way to use multiple voices in one prompt?

yesbroc avatar Nov 02 '23 16:11 yesbroc

Not currently, but custom formatting could make it possible.
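One way such formatting might look, purely as an assumption: a `[voice:NAME]` tag that switches the voice for the text after it. The tag syntax is invented for this sketch and is kept distinct from plain brackets, since Bark already treats cues like `[laughs]` as part of the text. Here a tag stays active until the next tag, which is a design choice rather than anything decided in this thread.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical tag syntax: [voice:NAME]
VOICE_TAG = re.compile(r"\[voice:([^\[\]]+)\]")


def parse_voices(
    text: str, default_voice: Optional[str] = None
) -> List[Tuple[Optional[str], str]]:
    """Split text into (voice, segment) pairs at each [voice:NAME] tag."""
    segments: List[Tuple[Optional[str], str]] = []
    voice = default_voice
    position = 0
    for match in VOICE_TAG.finditer(text):
        before = text[position:match.start()].strip()
        if before:
            segments.append((voice, before))
        voice = match.group(1).strip()  # applies until the next tag
        position = match.end()
    tail = text[position:].strip()
    if tail:
        segments.append((voice, tail))
    return segments
```

Each segment could then be generated with the history prompt belonging to its voice and the pieces joined in order.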

gitmylo avatar Nov 02 '23 20:11 gitmylo