Add training compatibility for Musicgen-like models
This PR aims to add training compatibility for Musicgen and Musicgen Melody.
The main difference from classic cross-entropy is that there are num_codebooks labels to predict per timestep instead of a single token. This materializes in the loss, which is the mean of the per-codebook cross-entropies.
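To make the loss concrete, here is a minimal sketch of a per-codebook cross-entropy (not the PR's actual implementation; the function name and tensor shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def codebook_cross_entropy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean of the cross-entropy computed independently per codebook.

    Illustrative shapes (assumed, not the exact internal layout):
      logits: (batch, num_codebooks, seq_len, vocab_size)
      labels: (batch, num_codebooks, seq_len)
    """
    num_codebooks = logits.shape[1]
    loss = torch.zeros((), dtype=logits.dtype)
    for k in range(num_codebooks):
        # Flatten batch and time, compute a standard CE for codebook k
        loss = loss + F.cross_entropy(
            logits[:, k].reshape(-1, logits.shape[-1]),
            labels[:, k].reshape(-1),
        )
    return loss / num_codebooks
```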
A few additional insights:
- The models don't have an EOS token id, so they generate up to max_length.
- The model actually predicts codebooks in a delay pattern: the first codebook is predicted without delay, but each subsequent codebook is delayed by one more step (2nd codebook -> delayed by 1, 3rd codebook -> delayed by 2, etc.)
- Training scripts will be shared as well
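The delay pattern described above can be sketched as follows (a simplified illustration with an assumed `pad_token_id` filler, not the exact pattern-building code in the PR):

```python
import torch

def apply_delay_pattern(codes: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Shift codebook k right by k steps, padding the gaps.

    codes: (num_codebooks, seq_len) integer codes.
    Returns: (num_codebooks, seq_len + num_codebooks - 1) delayed codes.
    """
    num_codebooks, seq_len = codes.shape
    delayed = torch.full(
        (num_codebooks, seq_len + num_codebooks - 1),
        pad_token_id,
        dtype=codes.dtype,
    )
    for k in range(num_codebooks):
        # Codebook k starts k steps later than codebook 0
        delayed[k, k:k + seq_len] = codes[k]
    return delayed
```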
cc @sanchit-gandhi and @amyeroberts
Hi! Is it possible to finetune the musicgen model currently? If yes, is there something I should keep in mind? Would be really helpful if you could share your opinions. Thanks!
Wonderful work! I'm currently attempting to fine-tune the Musicgen model using these codes, but I haven't succeeded yet. Is the model ready for fine-tuning, and are there specific aspects I should be aware of? Any training tips or guidance you could provide would be greatly appreciated!
Thank you so much!
Hey @arjunsinghrathore and @LiuZH-19, I'll likely release some fine-tuning code next week or the week after! Out of curiosity, may I ask what type of data you have? Thanks!
Hey @amyeroberts, gentle ping to ask for a review! Many thanks for your help!
Many thanks for the review @amyeroberts, I've changed the code according to your comments! The only thing left to address is the loss computation being a breaking change; let me know what you think of this. Note that I don't believe many users actually used the loss computation as it was.