Add text generation stream status to shared module, use for better TTS with auto-play
Hey oobabooga, thanks for this webui! I added a simple way for extensions to know when text generation is finished, so that I could auto-play the TTS audio. I also made some other quality-of-life changes to the TTS extension, see the commit description for the details.
Feel free to do whatever you like with this pull request.
You beat me to it :) Though your implementation has a lot more functionality. Seeing as this should be able to run as a server I don't think simpleaudio is the best solution as it would only produce the sounds on the host computer. May I suggest you or oobabooga have a peek at my PR. I used the native autoplay functionality of the audio block.
Perhaps a merge of our solutions would be best?
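The audio-block approach can be sketched roughly like this, assuming an `output_modifier` hook that receives the reply text (the path, the default filename, and the exact signature here are illustrative, not the extension's actual code):

```python
def output_modifier(string, audio_path='extensions/silero_tts/outputs/reply.wav'):
    # Prepend an <audio> element with the autoplay attribute so the browser
    # itself plays the clip, instead of simpleaudio playing it on the host
    # machine. The "file/" prefix is the route Gradio uses to serve files
    # from disk (assumption for this sketch).
    return f'<audio src="file/{audio_path}" controls autoplay></audio>\n\n{string}'
```

Because playback happens client-side, this works even when the webui runs on a remote server.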
Using the audio block autoplay is definitely the better method, thanks for pointing it out! And I might as well disable audio generation during the stream like you too. I'll test out adding some of your changes and report back.
Indeed, simpleaudio seems a bit troublesome. Trying to install it with pip on Linux caused:

```
c_src/simpleaudio_alsa.c:8:10: fatal error: alsa/asoundlib.h: No such file or directory
    8 | #include <alsa/asoundlib.h>
```
All right, things should be ready. I've removed simpleaudio and am using the HTML tags like Christoph. To handle old messages, I remove the autoplay tag from the previous message in `input_modifier` by accessing the internal and visible parts of `shared.history`. I also finished the pitch and speed controls and cleaned up the extension's settings.
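The autoplay-stripping step could look roughly like this, with a plain dict standing in for `shared.history`; the structure of `'internal'`/`'visible'` lists of `[user, bot]` pairs is an assumption for this sketch:

```python
def strip_previous_autoplay(history):
    # Remove the autoplay attribute from the most recent bot reply in both
    # views of the history, so only the newly generated message auto-plays
    # when the chat is re-rendered.
    for key in ('internal', 'visible'):
        if history[key]:
            history[key][-1][1] = history[key][-1][1].replace(' autoplay', '')
    return history
```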
I did a bunch of testing (on Windows) and found two minor bugs affecting the audio history when switching between characters. I've changed the extension to handle them, but the source of the bugs looks to be outside the extension; I'm pretty sure they are also present in the current version of silero_tts on the main branch.
Anyway, let me know if there are any issues on Linux.
Two questions:
- About the bug that you mentioned when switching characters, I could not understand/reproduce it. Can you explain in more detail?
- When I click on Regenerate, it seems like the audio is not updated, at least if I had already played part of the previous audio.
This is looking very good now!
- The bug had to do with using `message_id = len(shared.history['visible'])`, where the value of `message_id` was not updating if `shared.history` was changed by choosing another character or clearing the history.
- The other bug, where regenerated messages were using old audio, was a problem with the browser caching the old audio.

Both of those problems are now fixed by using timestamps instead of `message_id`.
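The timestamp fix can be sketched as follows (the directory and function name are illustrative):

```python
import time
from pathlib import Path

def unique_output_file(character, out_dir='extensions/silero_tts/outputs'):
    # Embedding a timestamp makes every generated filename unique, so the
    # browser can never serve a stale cached clip after a regeneration,
    # and the name stays valid when the character or history changes.
    return Path(f'{out_dir}/{character}_{int(time.time())}.wav')
```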
I also added an option to remove all the audio blocks from the chat history (the terminal errors from deleted audio files were annoying and I couldn't find a way to mute them; this fixes that, haha).
Finally, I used similar code so that toggling "show message text" also affects the chat history.
But it looks like the new streaming method broke the audio generation, so there is still a bit of work to do.
Everything seems to be working on my end now. If how I'm changing shared.history doesn't feel right, I'm happy to scrap those parts.
Looks good to me too. Thanks again for submitting this PR, this is a massive improvement to the silero extension and I really liked it.
When combined with the whisper extension, it should allow for a very immersive chat experience.
I will merge now. Credits have been added to https://github.com/oobabooga/text-generation-webui/wiki/Extensions.
I am attempting to port this feature to elevenlabs, but I have run into a similar issue to the one described above. When there are cached messages, the bot will not generate new audio and only plays old messages. If I delete the cache, the bot only generates audio sporadically. I'm not a dev, just a guy trying to get this to work, but maybe you could take a look at my code and recommend a fix?
Were you trying to port the stream status or the autoplay?
There were some changes to the silero_tts extension after this pull request that removed the "stream status" variable, since it wasn't using the `output_modifier` function as intended. Instead, it now temporarily sets `shared.args.no_stream = True` in the `input_modifier` function so it works when streaming is enabled.
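A rough sketch of that approach, with a `SimpleNamespace` standing in for the webui's real `shared` module:

```python
from types import SimpleNamespace

# Stand-in for the webui's modules.shared; args.no_stream mirrors the
# --no-stream command-line flag (an assumption for this self-contained sketch).
shared = SimpleNamespace(args=SimpleNamespace(no_stream=False))

def input_modifier(string):
    # Force non-streamed generation for this reply, so output_modifier is
    # called once with the complete text instead of on every partial token;
    # the real extension restores the original flag after generation.
    shared.args.no_stream = True
    return string
```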
If you're talking about the autoplay feature and how it sometimes plays old messages after regeneration, the fix was to make every audio file's name unique using a timestamp, so the UI is forced to skip the cache. To do this, we `import time` at the start and set the audio file name with a string like `output_file = Path(f'extensions/silero_tts/outputs/{shared.character}_{int(time.time())}.wav')`. In your file, it looks like you'd need to change the file path names on lines 123 and 126.
Great work with the extension, good luck with the fix!
I have made those fixes and gotten everything sorted to the best of my abilities, but now upon generation of text, it fails to generate audio and terminal shows the following error. I know this isn't your problem and I'm sorry if I'm bothering you, but I feel like I'm one step away from getting this to work.
```
Output generated in 5.74 seconds (1.92 tokens/s, 11 tokens, context 30, seed 1352445741)
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\responses.py", line 335, in __call__
    stat_result = await anyio.to_thread.run_sync(os.stat, self.path)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'D:\oobabooga_windows\text-generation-webui\extensions\elevenlabs_tts\outputs\None_1682279953.wav'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 429, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\fastapi\applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\errors.py", line 184, in __call__
    raise exc
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
    raise exc
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
    raise e
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\routing.py", line 69, in app
    await response(scope, receive, send)
  File "D:\oobabooga_windows\installer_files\env\lib\site-packages\starlette\responses.py", line 338, in __call__
    raise RuntimeError(f"File at path {self.path} does not exist.")
RuntimeError: File at path D:\oobabooga_windows\text-generation-webui\extensions\elevenlabs_tts\outputs\None_1682279953.wav does not exist.
```
No worries, the code is easy enough to read and it really does look like you're one step away, haha. I haven't done any testing myself, but it looks like you just need to change line 128 from `save_bytes_to_path(Path((f'extensions/elevenlabs_tts/{shared.character}_{int(time.time())}.wav')), audio_data)` to `save_bytes_to_path(output_file, audio_data)`, so that the save function actually uses the `output_file` variable that the string on line 131 is looking for.
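The overall shape of the fix, sketched with illustrative names (`generate_and_embed` is not the extension's actual function), is to build one `output_file` and use it for both the save and the HTML:

```python
import time
from pathlib import Path

def generate_and_embed(character, audio_data, out_dir='outputs'):
    # Build ONE output_file path and use it both to save the bytes and as
    # the <audio> src, so the file written to disk is exactly the file the
    # page later requests. Mismatched paths cause the FileNotFoundError
    # seen in the traceback above.
    output_file = Path(f'{out_dir}/{character}_{int(time.time())}.wav')
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_bytes(audio_data)
    return f'<audio src="file/{output_file.as_posix()}" controls autoplay></audio>'
```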
If that doesn't fix things then the issue might have to do with how long elevenlabs takes to generate the audio, which I would have no idea on how to handle.
THANK YOU SO MUCH! This works perfectly! All I need to do now is clone Scarlett Johansson's voice and I've got a full "Her" situation going.