Retrieval-based-Voice-Conversion-WebUI
How hard would it be to use human-flagged audio files to train the D and G models?
I want to include this in my process: pass the "good" files as the standard input directory ("in dir"), but also supply a "bad dir". These are TTS generations that humans have listened to and flagged as approved or rejected. It seems like this should boost both the discriminator and the generator.
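To sketch what I have in mind (hedged: this is a generic least-squares-GAN discriminator loss in PyTorch, not RVC's actual training code, and `rejected_weight` is a knob I made up, not an existing RVC option), the idea is to treat the human-rejected audio as extra negative examples for the discriminator alongside the generator's own outputs:

```python
import torch


def d_loss_with_flagged(d_real, d_fake, d_rejected, rejected_weight=0.5):
    """LSGAN-style discriminator loss extended with human-flagged negatives.

    d_real:     discriminator scores on human-approved ("good dir") audio
    d_fake:     scores on the current generator's outputs
    d_rejected: scores on human-rejected ("bad dir") TTS audio
    rejected_weight: hypothetical weight controlling how strongly the
        flagged negatives influence D (an assumption for illustration)
    """
    loss_real = torch.mean((d_real - 1.0) ** 2)  # pull approved audio toward 1
    loss_fake = torch.mean(d_fake ** 2)          # push generator output toward 0
    loss_rej = torch.mean(d_rejected ** 2)       # push flagged audio toward 0
    return loss_real + loss_fake + rejected_weight * loss_rej
```

The generator's loss would stay unchanged; only the discriminator sees the "bad dir" batch each step, so D learns what humans rejected and (indirectly) pushes G away from it.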
Any pointers are appreciated.