When I used Lora to train Wan2.1-1.3b-T2V, the result was snowflakes
I tried different hyperparameter settings, including the recommended ones, and the generated videos were all snowflake-like noise.
I used 20 samples for training.
I tried various values for learning_rate, rank, and alpha, but the results were all very poor.
I am training on an RTX 3090.
If possible, could you provide the dataset corresponding to the training results in the example? That way I can investigate whether the problem is in my dataset setup or a code-level issue.
@njzxj Hi! We recommend trying our hosted training product: https://www.modelscope.cn/aigc/modelTraining
@njzxj It is free.
Hi @njzxj , have you fixed this problem? I also encountered it when finetuning Wan2.1-T2V-1.3B on my own dataset. When I set lora_alpha to 1, the results are not bad, but not good enough. When I use lora_alpha=64, the snowflake artifacts appear.
I finally confirmed that the problem was in how I created my dataset. As for the alpha setting, my experience is to set rank equal to alpha. You may also need to make sure the prompts in your training dataset are reasonable.
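One reason rank == alpha is a safe default: in standard LoRA the weight update is scaled by alpha / rank, so setting them equal pins the effective scale at 1.0, while a large alpha over a small rank inflates every update (which can push activations into the noisy regime described above). A minimal sketch of the arithmetic, not tied to any particular training script:

```python
# Standard LoRA applies delta_W = (alpha / rank) * B @ A,
# so alpha / rank is the effective multiplier on the learned update.
def lora_scale(rank: int, alpha: float) -> float:
    return alpha / rank

print(lora_scale(32, 32))  # 1.0  -- rank == alpha, neutral scale
print(lora_scale(16, 64))  # 4.0  -- updates amplified 4x
print(lora_scale(32, 1))   # 0.03125 -- updates nearly vanish
```

This is why alpha=1 gave weak-but-stable results (the update is heavily damped) while alpha=64 over a smaller rank can destabilize generation.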
Hi @njzxj , thanks for your prompt reply. By the way, does "reasonableness of the prompt words" mean that the prompt should be aligned with the video? Could you please share what problem in the creation of your dataset led to the snowflakes? Thanks a lot.
My dataset issue isn’t really instructive. I’m working on an inpainting task, and after binarizing the mask I accidentally set the values to [0, 0.00001]. The prompt problem is one I first saw when training a character LoRA. When I use complex prompts the results are poor, but with a simple prompt like “someone doing something” the quality is much better.
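To illustrate the mask bug described above (names and thresholds here are hypothetical, just a sketch of the failure mode): binarizing to [0, 1e-5] instead of [0, 1] makes the "on" region numerically almost identical to the "off" region, so the mask carries essentially no signal.

```python
def binarize(values, threshold=0.5):
    # Correct: map each value to exactly 0.0 or 1.0
    return [1.0 if v > threshold else 0.0 for v in values]

def binarize_buggy(values, threshold=0.5):
    # Buggy: "on" pixels end up at 1e-5, nearly indistinguishable from 0
    return [1e-5 if v > threshold else 0.0 for v in values]

pixels = [0.1, 0.7, 0.9, 0.3]
print(binarize(pixels))        # [0.0, 1.0, 1.0, 0.0]
print(binarize_buggy(pixels))  # [0.0, 1e-05, 1e-05, 0.0]
```

A quick sanity check like `assert set(mask_values) <= {0.0, 1.0}` after preprocessing would have caught this before training.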