Add training compatibility for Musicgen-like models
This PR aims to add training compatibility for Musicgen and Musicgen Melody.
The main difference from classic cross-entropy is that there are num_codebooks labels to predict per timestep instead of a single token. This materializes in the loss, which is the mean of the per-codebook cross-entropies.
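To make the loss concrete, here is a minimal sketch of a per-codebook cross-entropy (not the PR's actual implementation; the function name and tensor shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def codebook_cross_entropy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Mean of the cross-entropy computed independently per codebook.

    Illustrative shapes (assumed, not the exact internal layout):
      logits: (batch, num_codebooks, seq_len, vocab_size)
      labels: (batch, num_codebooks, seq_len)
    """
    num_codebooks = logits.shape[1]
    loss = torch.zeros((), dtype=logits.dtype)
    for k in range(num_codebooks):
        # Flatten batch and time, compute a standard CE for codebook k
        loss = loss + F.cross_entropy(
            logits[:, k].reshape(-1, logits.shape[-1]),
            labels[:, k].reshape(-1),
        )
    return loss / num_codebooks
```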
A few additional insights:
- The models don't have an EOS token id, so they generate up to max_length.
- The model actually predicts codebooks in a delay pattern: the first codebook is predicted without delay, but each subsequent codebook is delayed by one more step (2nd codebook -> delayed by 1, 3rd codebook -> delayed by 2, etc.)
- Training scripts will be shared as well
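The delay pattern described above can be sketched as follows (a simplified illustration with an assumed `pad_token_id` filler, not the exact pattern-building code in the PR):

```python
import torch

def apply_delay_pattern(codes: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Shift codebook k right by k steps, padding the gaps.

    codes: (num_codebooks, seq_len) integer codes.
    Returns: (num_codebooks, seq_len + num_codebooks - 1) delayed codes.
    """
    num_codebooks, seq_len = codes.shape
    delayed = torch.full(
        (num_codebooks, seq_len + num_codebooks - 1),
        pad_token_id,
        dtype=codes.dtype,
    )
    for k in range(num_codebooks):
        # Codebook k starts k steps later than codebook 0
        delayed[k, k:k + seq_len] = codes[k]
    return delayed
```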
cc @sanchit-gandhi and @amyeroberts
Hi! Is it possible to finetune the musicgen model currently? If yes, is there something I should keep in mind? Would be really helpful if you could share your opinions. Thanks!
Wonderful work! I'm currently attempting to fine-tune the Musicgen model using these codes, but I haven't succeeded yet. Is the model ready for fine-tuning, and are there specific aspects I should be aware of? Any training tips or guidance you could provide would be greatly appreciated!
Thank you so much!
Hey @arjunsinghrathore and @LiuZH-19, I'll likely release some fine-tuning code next week or the week after! Out of curiosity, may I ask what type of data you have? Thanks!
Hey @amyeroberts, gentle ping to ask for a review! Many thanks for your help!
Many thanks for the review @amyeroberts, I've changed the code according to your comments! The only thing left to address is the loss computation being a breaking change; let me know what you think of this. Note that I don't believe many users actually used the loss computation as it was.