Add fine-tuning scripts
Add fine-tuning scripts. The commands are provided at the top of each file.
There are a few items to note:
- I'd like to ask maintainers to provide suggestions on my current file structure (e.g., moving the `utils` directory or putting the scripts into the `examples` folder).
- The current fine-tuning scripts do not perform very well yet. We need to test different hyper-parameters (lr, etc.) and provide benchmark results.
- For the dataset from Xz乔希 that I used, I'm wondering if we should put it in another repo: https://github.com/2noise/ChatTTS/blob/0bef943d192cd1dd4067f83e16a93f19889b9a87/ChatTTS/utils/finetune/dataset.py
cc @fumiama
https://github.com/ain-soph/ChatTTS/blob/3c1c75d2994bac54ea78cecd1b046d51a0f575b0/ChatTTS/utils/finetune/train.py#L334-L336
The current formatting is so bad and difficult to read. Would we consider extending max-line-length or moving to autopep8?
> The current formatting is so bad and difficult to read.
The formatter we use, black, is just a program, so it has its problems. But in most cases it works and tidies the code, which is why we chose it.
You can change the formatting yourself by breaking those chained dots across multiple lines to satisfy the max-line-length rule.
As for autopep8, you can run it locally and see the effect. If the code passes autopep8, it will probably pass black too.
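As a concrete illustration of breaking a chained-dots expression across lines (the class and method names below are invented for this example, not taken from the repo):

```python
# Hypothetical fluent class, invented purely for illustration.
class Pipeline:
    def __init__(self):
        self.steps = []

    def normalize(self):
        self.steps.append("normalize")
        return self

    def tokenize(self):
        self.steps.append("tokenize")
        return self

# Instead of one long line such as `p = Pipeline().normalize().tokenize()`,
# wrap the chain in parentheses and put each call on its own line;
# black preserves this layout, so the max-line-length rule is satisfied.
p = (
    Pipeline()
    .normalize()
    .tokenize()
)
print(p.steps)  # ['normalize', 'tokenize']
```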
> I'd like to ask maintainers to provide suggestions on my current file structure (e.g., moving the `utils` directory or putting the scripts into the `examples` folder).
The code in `utils` should be placed separately under the `ChatTTS` folder.
> For the dataset from Xz乔希 that I used, I'm wondering if we should put it in another repo
Sure. The main repo should not contain any data, including the dummy_data.
When you finish, let me know and I will check your code in detail. If you still have any questions, feel free to ask.
I will merge the other PR first. After merging, you can update this branch.
> The code in `utils` should be placed separately under the `ChatTTS` folder.
I put them under `ChatTTS.utils.finetune` now.
And I removed the dummy data. You may also want to review the Xz dataset code. I have the Google Drive link in it and I don't know if I should put it there. https://github.com/2noise/ChatTTS/blob/0bef943d192cd1dd4067f83e16a93f19889b9a87/ChatTTS/utils/finetune/dataset.py
A question here: when fine-tuning the model for a new voice, do we only train the spk_emb matrix, or do we need to train spk_emb together with the GPT-related modules?
> A question here: when fine-tuning the model for a new voice, do we only train the spk_emb matrix, or do we need to train spk_emb together with the GPT-related modules?

For a new voice, I tried freezing or training spk_emb, freezing or training the gpt.gpt module, and freezing or training the decoder module. The loss is the MSE loss on the mel spectrogram plus cross-entropy on the speech logits, but I never obtained a stable model (in terms of voice similarity or stability).
Could you offer some guidance? @fumiama @ain-soph
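For concreteness, the freeze-or-train combinations described above can be sketched in PyTorch as toggling `requires_grad` per module (the module shapes below are invented stand-ins, not the actual ChatTTS modules):

```python
# Hedged sketch: selectively freezing modules for voice fine-tuning.
# The shapes are tiny stand-ins, NOT the real ChatTTS architecture.
import torch.nn as nn

model = nn.ModuleDict({
    "spk_emb": nn.Embedding(1, 8),   # stand-in for the speaker embedding
    "gpt": nn.Linear(8, 8),          # stand-in for the GPT block
    "decoder": nn.Linear(8, 4),      # stand-in for the decoder
})

# Freeze everything first, then unfreeze only the speaker embedding.
for p in model.parameters():
    p.requires_grad_(False)
for p in model["spk_emb"].parameters():
    p.requires_grad_(True)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['spk_emb.weight']
```

Only the trainable parameters would then be handed to the optimizer, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`.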
@gafield-liu The training results are indeed not great; the training hyper-parameters probably need tuning. My current scripts were just written casually.
> @gafield-liu The training results are indeed not great; the training hyper-parameters probably need tuning. My current scripts were just written casually.

A speech-embedding extraction module seems to be missing here; with random initialization, the fine-tuned voice quality is poor.
Hi @ain-soph, and @fumiama
Thank you so much for your hard work on the fine-tuning. I found this project just a day ago, and I'm happy to say I was able to fine-tune without any errors using DVAE and GPT speakers.
I just tried today's update (the "Merge branch '2noise'" commit). Fine-tuning DVAE worked fine, but I got an error when trying to fine-tune GPT. Here's the error message I get:

ChatTTS\utils\finetune\model.py", line 204, in get_hidden_states_and_labels
    inputs_embeds = chat.gpt.forward(input_ids=input_ids, text_mask=text_mask)
TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids'
I really appreciate all your work and would be grateful for any help with this error.
Thanks again for your time!
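For reference, this kind of TypeError usually means the call resolved to `torch.nn.Module`'s default `_forward_unimplemented(self, *input)`, which accepts only positional arguments, i.e. the object being called never got the subclass `forward` that defines `input_ids`. A minimal stdlib-only reproduction of the mechanism (no torch required; this is a diagnostic sketch, not a claim about where the bug is in the PR):

```python
# Mimics torch.nn.Module's default forward, `_forward_unimplemented(self, *input)`,
# which takes only positional arguments.
def _forward_unimplemented(self, *input):
    raise NotImplementedError

class Module:
    # `forward` was never overridden, so it is the unimplemented default.
    forward = _forward_unimplemented

msg = ""
try:
    Module().forward(input_ids=None)  # keyword argument, like the failing call
except TypeError as e:
    msg = str(e)
print(msg)  # _forward_unimplemented() got an unexpected keyword argument 'input_ids'
```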
@fumiama Hi, just a status update: I've now got plenty of free time to work on this PR and will have updates in the coming days.
It would be nice if you could do a full code review.
I'll continue working on improving the training performance.
> @fumiama Hi, just a status update: I've now got plenty of free time to work on this PR and will have updates in the coming days. It would be nice if you could do a full code review.
> I'll continue working on improving the training performance.
Appreciated. I will do it on your next push, once you fix the test.
@fumiama The reason for the failure is that the test file imports `Logger` from https://github.com/ain-soph/ChatTTS/blob/bd76af734f16b2085c276fc201e47b90095658f2/ChatTTS/utils/log.py#L11,
while my logger class `SmoothedValue` in the same file uses `typing.Self`, which is only supported on recent Python versions.
What's your suggestion regarding compatibility? Shall we still support python<3.12 and use `-> "SmoothedValue"` instead of `-> typing.Self`? Another alternative is to put my logger classes in other files, so that the test won't import them.
Overall, my code requires python>=3.12, while the existing test file requires support for python<3.12.
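A minimal sketch of the `-> "SmoothedValue"` alternative (a simplified stand-in, not the actual `log.py` class): the quoted return type is a forward reference that is never evaluated at runtime, so it works on Pythons that lack `typing.Self`.

```python
from typing import List

class SmoothedValue:
    """Simplified stand-in for the PR's logger class (invented body)."""

    def __init__(self) -> None:
        self.values: List[float] = []

    def update_list(self, value_list: List[float]) -> "SmoothedValue":
        # The string annotation is a forward reference: compatible with
        # older Python versions, unlike `-> typing.Self`.
        self.values.extend(value_list)
        return self

# The fluent/chained usage still type-checks and runs as expected.
sv = SmoothedValue().update_list([1.0, 2.0]).update_list([3.0])
print(sv.values)  # [1.0, 2.0, 3.0]
```

One caveat of the string-annotation approach: in a subclass, `-> "SmoothedValue"` still names the base class, whereas `typing.Self` would name the subclass; for a logger utility this is usually acceptable.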
> The reason for the failure is that the test file imports `Logger` from https://github.com/ain-soph/ChatTTS/blob/bd76af734f16b2085c276fc201e47b90095658f2/ChatTTS/utils/log.py#L11, while my logger class `SmoothedValue` in the same file uses `typing.Self`, which is only supported on recent Python versions. What's your suggestion regarding compatibility? Shall we still support python<3.12 and use `-> "SmoothedValue"` instead of `-> typing.Self`? Another alternative is to put my logger classes in other files, so that the test won't import them. Overall, my code requires python>=3.12, while the existing test file requires support for python<3.12.
Well, if nothing absolutely requires python>=3.12, compatibility should be kept the same as in the former version.
@fumiama I suggest deprecating support for Python 3.8, which doesn't support the native generic typing `list[int]`.
As a reference, pytorch has required python>=3.9 since version 2.5.
File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/ChatTTS/utils/log.py", line 81, in SmoothedValue
def update_list(self, value_list: list[float]) -> 'SmoothedValue':
TypeError: 'type' object is not subscriptable
Error: tests/#655.py exited with a non-zero status.
Test tests/#655.py success
Error: Process completed with exit code 1.
> @fumiama I suggest deprecating support for Python 3.8, which doesn't support the native generic typing `list[int]`. As a reference, pytorch has required python>=3.9 since version 2.5.
>
> File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/ChatTTS/utils/log.py", line 81, in SmoothedValue
>     def update_list(self, value_list: list[float]) -> 'SmoothedValue':
> TypeError: 'type' object is not subscriptable
> Error: tests/#655.py exited with a non-zero status.
> Test tests/#655.py success
> Error: Process completed with exit code 1.
Maybe you should use `List[int]` to avoid this problem, because this compatibility issue can be solved simply by importing `List` instead of using the builtin `list`. Also, there are many devices stuck on old versions of python/pytorch for various reasons, and we should not drop support for a version unless there is a significant point that forces us to.
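To make the difference concrete: on Python 3.8 the builtin `list` is not subscriptable in annotations (builtin generics arrived with PEP 585 in 3.9), while `typing.List` works on 3.8 and later. A minimal sketch (function name invented for illustration):

```python
from typing import List

# On Python 3.8, `def mean(value_list: list[float]) -> float: ...` raises
#     TypeError: 'type' object is not subscriptable
# at definition time, because annotations are evaluated when the `def`
# executes and builtin `list` only became subscriptable in 3.9.
# `typing.List` is the 3.8-compatible spelling:
def mean(value_list: List[float]) -> float:
    return sum(value_list) / len(value_list)

print(mean([1.0, 2.0, 3.0]))  # 2.0
```

Alternatively, `from __future__ import annotations` defers annotation evaluation and also avoids the error on 3.8, but `typing.List` is the more explicit fix.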
Will revert to Python 3.8 style later. My current code relies heavily on `match`, the `|` operator, native typing, and `TypedDict` `Unpack` kwargs. It might take quite some time to make the modification.
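For reference, the Python 3.8-compatible spellings of two of those features look roughly like this (function names invented for illustration):

```python
from typing import List, Optional

# Python >= 3.10 style (NOT 3.8-compatible):
#     def load(path: str | None) -> list[float]: ...
#     match kind:
#         case "dvae": ...
#         case "gpt": ...

# Python 3.8-compatible equivalents:
def load(path: Optional[str]) -> List[float]:
    # Optional[str] replaces `str | None`; List[float] replaces `list[float]`
    return [] if path is None else [1.0]

def dispatch(kind: str) -> str:
    # an if/elif chain replaces the `match` statement
    if kind == "dvae":
        return "train_dvae"
    elif kind == "gpt":
        return "train_gpt"
    return "unknown"

print(dispatch("gpt"))  # train_gpt
```

`TypedDict` with `Unpack` kwargs has no direct pre-3.12 builtin equivalent; `typing_extensions` backports both, if adding that dependency is acceptable.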
> Will revert to Python 3.8 style later. My current code relies heavily on `match`, the `|` operator, native typing, and `TypedDict` `Unpack` kwargs. It might take quite some time to make the modification.
Thanks for your understanding. Maybe you can split this PR into independent parts and open a few PRs as those parts are completed, in order to avoid the sync-with-upstream work caused by a long modification period.