
Add fine-tuning scripts

Open ain-soph opened this issue 1 year ago • 19 comments

Add fine-tuning scripts. The commands are provided at the top of each file.

There are a few items to note:

  1. I'd like to ask the maintainers for suggestions on my current file structure (e.g., moving the utils directory, or putting the scripts into the examples folder).
  2. The current fine-tuning scripts do not yet achieve very good performance. We need to test different hyperparameters (learning rate, etc.) and provide benchmark results.
  3. For the dataset from Xz乔希 that is used, I'm wondering if we should put it in another repo https://github.com/2noise/ChatTTS/blob/0bef943d192cd1dd4067f83e16a93f19889b9a87/ChatTTS/utils/finetune/dataset.py

cc @fumiama

ain-soph avatar Aug 11 '24 07:08 ain-soph

https://github.com/ain-soph/ChatTTS/blob/3c1c75d2994bac54ea78cecd1b046d51a0f575b0/ChatTTS/utils/finetune/train.py#L334-L336 The current formatting is quite bad and difficult to read. Would we consider extending max-line-length or moving to autopep8?

ain-soph avatar Aug 11 '24 08:08 ain-soph

The current formatting is quite bad and difficult to read.

The formatter we use, black, is just a program, so it has its problems. But in many cases it works and tidies the code, which is why we chose it.

You can change the format yourself by breaking those dotted chains across multiple lines to avoid the max-line rule.
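One common way to do this (a generic illustration, not code from the PR) is to wrap the chain in parentheses and break it at each dot, so every line stays well under black's default 88-character limit:

```python
# One long dotted chain on a single line can exceed the line-length
# limit and be hard to read:
text = "  Hello, World  ".strip().lower().replace(",", "").split()

# Wrapping the expression in parentheses allows breaking at each dot:
text = (
    "  Hello, World  "
    .strip()
    .lower()
    .replace(",", "")
    .split()
)

print(text)  # ['hello', 'world']
```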

fumiama avatar Aug 11 '24 10:08 fumiama

As for autopep8, you can run it locally and see the effect. If the code passes autopep8, it may also pass black.

fumiama avatar Aug 11 '24 10:08 fumiama

I'd like to ask the maintainers for suggestions on my current file structure (e.g., moving the utils directory, or putting the scripts into the examples folder).

The code in utils should be placed separately in the ChatTTS folder.

For the dataset from Xz乔希 that is used, I'm wondering if we should put it in another repo

Sure. The main repo should not contain any data, including the dummy_data.

When you finish, let me know and I will review your code in detail. If you still have any questions, feel free to ask.

fumiama avatar Aug 11 '24 11:08 fumiama

I will merge the other PR first. After merging, you can update this branch.

fumiama avatar Aug 11 '24 11:08 fumiama

The code in utils should be placed separately in the ChatTTS folder.

I put them under ChatTTS.utils.finetune now.

And I removed the dummy data. You may also want to review the Xz dataset code. I have a Google Drive link in it, and I'm not sure whether I should put it there. https://github.com/2noise/ChatTTS/blob/0bef943d192cd1dd4067f83e16a93f19889b9a87/ChatTTS/utils/finetune/dataset.py

ain-soph avatar Aug 11 '24 15:08 ain-soph

A question here: if we want to fine-tune the model for a new voice timbre, is it enough to train only the spk_emb matrix, or do we need to train both spk_emb and the GPT-related modules?

gafield-liu avatar Aug 26 '24 03:08 gafield-liu

A question here: if we want to fine-tune the model for a new voice timbre, is it enough to train only the spk_emb matrix, or do we need to train both spk_emb and the GPT-related modules?

For a new timbre, I have tried freezing or training spk_emb, freezing or training the gpt.gpt module, and freezing or training the decoder module, with the loss being MSE on the mel spectrogram plus cross-entropy on the speech logits, but I still cannot get a stable model (similar or consistent timbre).

Could you give me some guidance? @fumiama @ain-soph
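The freeze/train combinations described above can be sketched in PyTorch by toggling requires_grad per module (the module shapes here are small stand-ins, not the actual ChatTTS spk_emb/gpt/decoder components):

```python
import torch
import torch.nn as nn


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze every parameter of a module."""
    for p in module.parameters():
        p.requires_grad_(trainable)


# Stand-ins for spk_emb / gpt / decoder:
spk_emb = nn.Embedding(1, 768)
gpt = nn.Linear(768, 768)
decoder = nn.Linear(768, 100)

set_trainable(spk_emb, True)   # train the speaker embedding
set_trainable(gpt, False)      # freeze the GPT module
set_trainable(decoder, False)  # freeze the decoder

# Only pass the trainable parameters to the optimizer:
trainable = [p for m in (spk_emb, gpt, decoder)
             for p in m.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Which combination works best is exactly the open question in this thread; the sketch only shows the mechanics of switching between them.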

gafield-liu avatar Aug 28 '24 06:08 gafield-liu

@gafield-liu The training results are indeed not great; the training parameters probably need tuning. What I have now was just written casually.

ain-soph avatar Aug 31 '24 19:08 ain-soph

@gafield-liu The training results are indeed not great; the training parameters probably need tuning. What I have now was just written casually.

I think a speech embedding extraction module is missing here; with random initialization, the fine-tuned timbre doesn't turn out well.

gafield-liu avatar Sep 04 '24 03:09 gafield-liu

Hi @ain-soph, and @fumiama

Thank you so much for your hard work on the fine-tuning. I found this project just a day ago, and I'm happy to say I was able to fine-tune DVAE and the GPT speaker without any errors.

I just tried the new update (Merge branch '2noise') today. Fine-tuning DVAE worked fine, but I got an error when trying to fine-tune GPT. Here's the error message I get:

  File "ChatTTS\utils\finetune\model.py", line 204, in get_hidden_states_and_labels
    inputs_embeds = chat.gpt.forward(input_ids=input_ids, text_mask=text_mask)
TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids'

I really appreciate all your work and would be grateful for any help with this error.

Thanks again for your time!

lpscr avatar Oct 06 '24 15:10 lpscr

@fumiama Hi, just a status update: I've now got plenty of free time to work on this PR and will have updates in the coming days.
It would be nice if you could do a full code review.

I'll continue working on improving the training performance.

ain-soph avatar Oct 06 '24 22:10 ain-soph

@fumiama Hi, just a status update: I've now got plenty of free time to work on this PR and will have updates in the coming days. It would be nice if you could do a full code review.

I'll continue working on improving the training performance.

Appreciated. I will do it on your next push, once you fix the test.

fumiama avatar Oct 09 '24 12:10 fumiama

@fumiama The reason for the failure is that the test file imports Logger from https://github.com/ain-soph/ChatTTS/blob/bd76af734f16b2085c276fc201e47b90095658f2/ChatTTS/utils/log.py#L11 , while my logger class SmoothedValue in the same file uses typing.Self, which is not available on older Python versions.

What is your suggestion regarding compatibility? Shall we keep supporting python<3.12 and use -> "SmoothedValue" instead of -> typing.Self? Another alternative is to put my logger classes in other files, so that the test won't import them.

Overall, my code requires python>=3.12, while the existing test file requires support for python<3.12.
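For reference, typing.Self was added in Python 3.11 (PEP 673); the long-standing backward-compatible spelling is a quoted forward reference to the class's own name. A minimal sketch (this SmoothedValue is a stand-in, not the PR's actual class):

```python
from typing import List


class SmoothedValue:
    """Stand-in class illustrating the return-annotation options."""

    def __init__(self) -> None:
        self.values: List[float] = []

    # On Python 3.11+ this could be annotated `-> typing.Self`.
    # The string forward reference below means the same thing and
    # works on every Python version that supports annotations.
    def update(self, value: float) -> "SmoothedValue":
        self.values.append(value)
        return self


sv = SmoothedValue().update(1.0).update(2.0)
print(sv.values)  # [1.0, 2.0]
```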

ain-soph avatar Oct 10 '24 21:10 ain-soph

The reason for the failure is that the test file imports Logger from https://github.com/ain-soph/ChatTTS/blob/bd76af734f16b2085c276fc201e47b90095658f2/ChatTTS/utils/log.py#L11 , while my logger class SmoothedValue in the same file uses typing.Self, which is not available on older Python versions.

What is your suggestion regarding compatibility? Shall we keep supporting python<3.12 and use -> "SmoothedValue" instead of -> typing.Self? Another alternative is to put my logger classes in other files, so that the test won't import them.

Overall, my code requires python>=3.12, while the existing test file requires support for python<3.12.

Well, if nothing strictly MUST require python>=3.12, compatibility should be kept the same as in the former versions.

fumiama avatar Oct 11 '24 07:10 fumiama

@fumiama I suggest dropping support for Python 3.8, which doesn't support native generic annotations like list[int].

As a reference, pytorch has required python>=3.9 since 2.5:

  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/ChatTTS/utils/log.py", line 81, in SmoothedValue
    def update_list(self, value_list: list[float]) -> 'SmoothedValue':
TypeError: 'type' object is not subscriptable
Error: tests/#655.py exited with a non-zero status.
Test tests/#655.py success
Error: Process completed with exit code 1.

ain-soph avatar Nov 05 '24 03:11 ain-soph

@fumiama I suggest dropping support for Python 3.8, which doesn't support native generic annotations like list[int].

As a reference, pytorch has required python>=3.9 since 2.5:

  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/ChatTTS/utils/log.py", line 81, in SmoothedValue
    def update_list(self, value_list: list[float]) -> 'SmoothedValue':
TypeError: 'type' object is not subscriptable
Error: tests/#655.py exited with a non-zero status.
Test tests/#655.py success
Error: Process completed with exit code 1.

Maybe you should use List[int] to avoid this problem, because it's a compatibility issue that can be solved simply by importing List instead of using list. Also, there are many devices stuck on old versions of python/pytorch for various reasons, and we should not drop support for a version unless there's a significant reason that forces us to.
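A minimal sketch of the suggested fix (the class body is a stand-in, not the actual log.py code): on Python 3.8, a built-in generic like list[float] in a class-body annotation raises `TypeError: 'type' object is not subscriptable` at definition time, while typing.List[float] works on all supported versions:

```python
from typing import List


class SmoothedValue:  # stand-in for the class in ChatTTS/utils/log.py
    def __init__(self) -> None:
        self.window: List[float] = []

    # typing.List instead of the built-in list keeps Python 3.8 happy;
    # the quoted return type avoids referencing the class before it exists.
    def update_list(self, value_list: List[float]) -> "SmoothedValue":
        self.window.extend(value_list)
        return self


sv = SmoothedValue().update_list([1.0, 2.0, 3.0])
print(sv.window)  # [1.0, 2.0, 3.0]
```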

fumiama avatar Nov 05 '24 05:11 fumiama

I will revert to Python 3.8 style later. My current code relies heavily on match, the | operator, native typing, and TypedDict Unpack kwargs. It might take quite some time to make the modifications.
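The kind of back-porting described here looks roughly like the following before/after sketch (hypothetical function, not code from the PR): match becomes if/elif, `str | None` becomes Optional[str], and `list[int]` becomes List[int]:

```python
from typing import List, Optional

# Python 3.10+ style (shown as comments, since it won't parse on 3.8):
#   def load(path: str | None, ids: list[int], mode: str) -> None:
#       match mode:
#           case "train": ...
#           case "eval": ...

# Python 3.8-compatible equivalent:
def load(path: Optional[str], ids: List[int], mode: str) -> None:
    # `match` statements become plain if/elif chains.
    if mode == "train":
        print("training on", len(ids), "samples from", path)
    elif mode == "eval":
        print("evaluating")
    else:
        raise ValueError(mode)


load("data.txt", [1, 2, 3], "train")
```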

ain-soph avatar Nov 08 '24 04:11 ain-soph

I will revert to Python 3.8 style later. My current code relies heavily on match, the | operator, native typing, and TypedDict Unpack kwargs. It might take quite some time to make the modifications.

Thanks for your understanding. Maybe you can split this PR into independent parts and open a few PRs as each part completes, to avoid the upstream-sync work caused by a long modification period.

fumiama avatar Nov 09 '24 05:11 fumiama