Question about rhythm transfer
I have the following use case: I would like to train urhythmic on a target speaker's voice and do any-to-one voice conversion that preserves the target timbre, but with the rhythm, pitch, speed, intonation, etc. varying from prediction to prediction based on the intonation of a separate source clip. Is this possible?
I have successfully trained the vocoder, which sounds really good, and I have also tried inference with varying rhythm-fine models, but I can't really hear them changing the output; the rhythm still mostly follows the source clip. In your sample, the outputs feel like they have a rhythm truer to the target voices (though it's unclear whether you somehow trained on just that sample or on the speaker in general).
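As a quick sanity check that the rhythm model is actually reshaping durations (and that I'm not just failing to hear it), I've been comparing the lengths of conversions produced with different rhythm-fine checkpoints on the same source clip. The file names below are just placeholders for my own outputs:

```python
import torchaudio

# If the rhythm model is really changing timing, conversions from different
# rhythm-fine checkpoints should differ in length, not just in timbre.
# File names are placeholders for my own source/converted clips.
for path in ["source.wav", "converted_rhythm_a.wav", "converted_rhythm_b.wav"]:
    wav, sr = torchaudio.load(path)
    print(f"{path}: {wav.shape[-1] / sr:.2f} s")
```

So far the lengths barely move, which is what makes me think the source rhythm is dominating.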
Any tips or insights would be appreciated :)
One approach I'm currently trying is to retrain the rhythm model at inference time for both the source and target speakers and let it overfit to that single sample.
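Conceptually, what I mean by overfitting to one sample is estimating per-sound-type duration statistics from just that utterance and then rescaling the source segments toward the target's statistics. Here's a toy sketch of the idea (not urhythmic's actual code; the `(sound_type, num_frames)` segmentation format is just a stand-in for whatever the segmentation step outputs):

```python
import numpy as np

def duration_stats(segments):
    """Mean duration (in frames) per sound type, estimated from one utterance.

    `segments` is a list of (sound_type, num_frames) pairs -- a stand-in for
    the output of the segmentation step, not urhythmic's real data format.
    """
    per_type = {}
    for sound_type, frames in segments:
        per_type.setdefault(sound_type, []).append(frames)
    return {k: float(np.mean(v)) for k, v in per_type.items()}

def transfer_rhythm(source_segments, source_stats, target_stats):
    """Rescale each source segment toward the target speaker's typical duration."""
    rescaled = []
    for sound_type, frames in source_segments:
        if sound_type in source_stats and sound_type in target_stats:
            ratio = target_stats[sound_type] / source_stats[sound_type]
        else:
            ratio = 1.0  # sound type unseen in one of the samples: leave it alone
        rescaled.append((sound_type, max(1, round(frames * ratio))))
    return rescaled

# Made-up example: durations in frames, sound types as coarse labels.
target_stats = duration_stats(
    [("silence", 20), ("sonorant", 9), ("obstruent", 4), ("sonorant", 8)]
)
source_segments = [("silence", 30), ("sonorant", 5), ("obstruent", 6)]
source_stats = duration_stats(source_segments)
print(transfer_rhythm(source_segments, source_stats, target_stats))
```

This is obviously crude, but it captures what I'm hoping the retrained rhythm-fine model would do for me.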
I also want to commend how clean this codebase is. This is how all OSS ML repos should be.