
Does anyone train a working Wan t2v lora from the t2v script of this repo?

Open qiwang1996 opened this issue 11 months ago • 9 comments

I used 10 picture–text pairs, and all pictures have the same resolution, [1350, 1080] (h, w). These are my settings:

# latent cache
CUDA_VISIBLE_DEVICES="0" python train_wan_t2v.py \
    --task data_process \
    --dataset_path data/halle \
    --output_path ./output \
    --text_encoder_path "./models/Wan-AI/Wan2.1-T2V-14B/models_t5_umt5-xxl-enc-bf16.pth" \
    --vae_path "./models/Wan-AI/Wan2.1-T2V-14B/Wan2.1_VAE.pth" \
    --tiled \
    --num_frames 1 \
    --height 576 \
    --width 448
# run training
CUDA_VISIBLE_DEVICES="0" python train_wan_t2v.py \
    --task train \
    --train_architecture lora \
    --dataset_path data/halle  \
    --output_path ./output \
    --dit_path "[
            \"./models/Wan-AI/Wan2.1-T2V-14B/diffusion_pytorch_model-00001-of-00006.safetensors\",
            \"./models/Wan-AI/Wan2.1-T2V-14B/diffusion_pytorch_model-00002-of-00006.safetensors\",
            \"./models/Wan-AI/Wan2.1-T2V-14B/diffusion_pytorch_model-00003-of-00006.safetensors\",
            \"./models/Wan-AI/Wan2.1-T2V-14B/diffusion_pytorch_model-00004-of-00006.safetensors\",
            \"./models/Wan-AI/Wan2.1-T2V-14B/diffusion_pytorch_model-00005-of-00006.safetensors\",
            \"./models/Wan-AI/Wan2.1-T2V-14B/diffusion_pytorch_model-00006-of-00006.safetensors\"
        ]"  \
    --steps_per_epoch 500 \
    --max_epochs 10 \
    --learning_rate 5e-6 \
    --lora_rank 16 \
    --lora_alpha 8 \
    --lora_target_modules "q,k,v,o,ffn.0,ffn.2" \
    --accumulate_grad_batches 1 \
    --use_gradient_checkpointing

metadata.csv:

```
file_name,text
media_00001.jpg,"TOK has a curly, high ponytail hairstyle, wearing a white crop top and light blue high-waisted pants. She has a silver choker with a pendant and large silver earrings. She stands indoors, near a wooden door and a grey magnetic board with a butterfly magnet. The setting appears to be a modern, possibly home office or studio."
media_00002.jpg,"TOK has medium brown skin and shoulder-length, dark, curly dreadlocks. She wears a black t-shirt, a silver choker with a heart pendant, and silver hoop earrings. Her makeup includes winged eyeliner and nude lipstick. The background shows a modern, well-lit room with a glass shelf holding various figurines and collectibles, including a small figurine of a woman with red hair. The room has a cozy, eclectic decor."
media_00003.jpg,"TOK has a medium skin tone, wearing a black, halter-neck, floor-length gown with a high slit, standing confidently. She has curly, shoulder-length black hair and is wearing black, strappy high-heeled sandals. The background is a gradient of gray, and she poses with one hand on her hip, exuding elegance and poise."
...
```
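For reference, a metadata.csv in this layout (one image file name plus one caption per row) can be generated with Python's csv module, which quotes comma-containing captions automatically. The file names and captions below are placeholders:

```python
import csv

# Placeholder rows: replace with your own image names and captions.
rows = [
    ("media_00001.jpg", "TOK has a curly, high ponytail hairstyle, ..."),
    ("media_00002.jpg", "TOK has medium brown skin and ..."),
]

# Write the file next to the images, e.g. inside data/halle/.
with open("metadata.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "text"])
    writer.writerows(rows)  # captions containing commas are quoted
```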

I still get bad results while tuning the learning rate from 1e-4 down to 5e-6; the outputs are just a bunch of snowflake-like noise. Where did I go wrong?

qiwang1996 avatar Feb 27 '25 12:02 qiwang1996

I only revised the code to allow loading dit_path in JSON format; no other code was changed.

qiwang1996 avatar Feb 27 '25 12:02 qiwang1996

https://github.com/user-attachments/assets/9bd8e30b-97e8-44f9-bb6f-da004ba376a9

This is a video generated by our LoRA. The effect is not stable yet. We are testing parameters, including lora_rank, learning_rate, and others.

We will provide a set of recommended default parameters based on the experimental results.

Artiprocher avatar Feb 28 '25 02:02 Artiprocher

Thanks for your testing and reply. I am now using https://github.com/tdrussell/diffusion-pipe to train a LoRA for your Wan model, and it works properly.

qiwang1996 avatar Feb 28 '25 03:02 qiwang1996

There is another problem: the parameter names in your trained LoRA do not match ComfyUI's native loader, so the LoRA cannot be used in ComfyUI.

qiwang1996 avatar Feb 28 '25 03:02 qiwang1996

@qiwang1996 Hi, I want to know how to do that (use diffusion-pipe to train a LoRA for the DiffSynth-Studio Wan models). Is there any tutorial?

ZhouQianang avatar Apr 15 '25 09:04 ZhouQianang

I also use ComfyUI for inference, and LoRAs trained by diffusion-pipe load there seamlessly. I think the problem you mentioned was caused by incorrect LoRA training.

In reply to wine-plum's comment (modelscope/DiffSynth-Studio#367):

> @qiwang1996 Hi, I am also using https://github.com/tdrussell/diffusion-pipe to train the LoRA module for the Wan model. However, it doesn't seem to provide inference code. Some people use ComfyUI for inference, so I tried to use this repo to complete the inference process. I checked this repo's code for loading the LoRA module, and I think it should work even with a LoRA checkpoint trained with diffusion-pipe. Unfortunately, the quality of the video I created is very poor, as shown in the picture. Do you have any other inference code available?
>
> [image]

qiwang1996 avatar Apr 29 '25 03:04 qiwang1996

> I also use ComfyUI for inference, and LoRAs trained by diffusion-pipe load there seamlessly. I think the problem you mentioned was caused by incorrect LoRA training.

Thank you very much for your reply. I have already found my mistake. In the inference code, the function that loads the LoRA has a parameter called `lora_alpha`, with a default value of 1 (line 373 in https://github.com/modelscope/DiffSynth-Studio/blob/main/diffsynth/models/model_manager.py). I originally thought this `lora_alpha` was the same as the `lora_alpha` in the PEFT training parameters, so I set it to 32 (the value used when training the LoRA with PEFT in diffusion-pipe), and at that point I got bad results.

However, these are not the same parameter. The `lora_alpha` here is the final scaling factor applied when merging the LoRA; if you trained the LoRA with PEFT, it effectively corresponds to (lora_alpha / lora_rank). In summary, the code in this repository is fine. In my case, I simply needed to set `lora_alpha` to 1, since diffusion-pipe uses the same value for both lora_alpha and lora_rank during LoRA training, which makes that ratio 1.
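The distinction can be sketched in a few lines: PEFT applies an effective scale of lora_alpha / lora_rank at training time, while a loader that takes `lora_alpha` as the final scale applies that number as-is. A minimal illustration (the function names here are mine, not from either repo):

```python
import numpy as np

def peft_scale(lora_alpha: float, lora_rank: int) -> float:
    # PEFT applies delta_W = (lora_alpha / lora_rank) * (B @ A)
    return lora_alpha / lora_rank

def apply_lora(weight, lora_A, lora_B, lora_alpha: float):
    # Loader-style application: lora_alpha is used as the final scale,
    # so for a PEFT-trained checkpoint pass peft_scale(...) here.
    return weight + lora_alpha * (lora_B @ lora_A)

# diffusion-pipe trains with lora_alpha == lora_rank, so the correct
# scale at load time is 1.0 -- not the raw training alpha of 32.
print(peft_scale(32, 32))  # -> 1.0
```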

wine-plum avatar Apr 29 '25 03:04 wine-plum

> @qiwang1996 Hi, I want to know how to do that (use diffusion-pipe to train a LoRA for the DiffSynth-Studio Wan models). Is there any tutorial?

Hi, I'm not entirely sure I fully understand your question, but I can share my approach here. I train a LoRA for the Wan model with diffusion-pipe, then run inference with DiffSynth-Studio. During inference, simply insert one line to load the LoRA weights after initializing the base Wan model: `model_manager.load_lora("/path/adapter_model.safetensors", lora_alpha=1)`. Here's how to implement it (as shown in the example image below):

[image]

wine-plum avatar Apr 29 '25 04:04 wine-plum

A LoRA trained by diffusion-pipe is not compatible with DiffSynth-Studio's inference code out of the box; you need to remap the weight keys yourself to make it work.
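As a sketch of what such a remapping can look like, the function below renames keys in a LoRA state dict. The specific prefix and the lora_A/lora_B to lora_down/lora_up renames are illustrative placeholders, not the verified mapping between diffusion-pipe and DiffSynth-Studio; inspect the key names in both checkpoint formats to fill in the real pairs:

```python
def remap_lora_keys(state: dict) -> dict:
    """Rename LoRA parameter keys from one checkpoint convention to
    another. The renames below are hypothetical examples."""
    remapped = {}
    for key, tensor in state.items():
        new_key = key
        # Example: drop a wrapper prefix added by the training framework.
        if new_key.startswith("diffusion_model."):
            new_key = new_key[len("diffusion_model."):]
        # Example: translate PEFT-style A/B names to down/up names.
        new_key = new_key.replace(".lora_A.weight", ".lora_down.weight")
        new_key = new_key.replace(".lora_B.weight", ".lora_up.weight")
        remapped[new_key] = tensor
    return remapped

# In practice you would load/save the tensors with safetensors, e.g.:
#   from safetensors.torch import load_file, save_file
#   save_file(remap_lora_keys(load_file("in.safetensors")), "out.safetensors")
```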


qiwang1996 avatar Apr 29 '25 04:04 qiwang1996