[Bug] Fine-tuning Wan2.1 produces all-noise validation videos that remain unchanged regardless of the number of fine-tuning steps
Describe the bug
I'm fine-tuning the Wan2.1 model on our data with FastVideo/scripts/finetune/finetune_v1_VSA.sh and FastVideo/scripts/finetune/finetune_v1.sh, but the videos generated during the validation phase remain unchanged regardless of the number of fine-tuning steps. For example:
https://github.com/user-attachments/assets/b006a20b-095b-433d-b885-c802f1df591e
https://github.com/user-attachments/assets/66f4c664-b9fb-43de-bcaf-02813d896e0c
However, when I use the transformer from the saved checkpoint for inference, the results are normal. Could there be an issue with the validation sampling, or somewhere else? Do you have any suggestions for modifications?
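For reference, this is roughly how I run inference from the saved checkpoint (a minimal sketch assuming the checkpoint contains a diffusers-format transformer subfolder; the path and prompt are placeholders):

```python
# Minimal sketch: inference from the fine-tuned transformer checkpoint.
# Assumes a diffusers-format transformer in the checkpoint directory;
# the checkpoint path and prompt below are placeholders.
import torch
from diffusers import WanPipeline, WanTransformer3DModel
from diffusers.utils import export_to_video

transformer = WanTransformer3DModel.from_pretrained(
    "outputs/wan_finetune_vsa/checkpoint-1200/transformer",  # placeholder path
    torch_dtype=torch.bfloat16,
)
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

video = pipe(
    prompt="a placeholder prompt",
    height=448,
    width=832,
    num_frames=61,
    num_inference_steps=50,
    guidance_scale=5.0,
).frames[0]
export_to_video(video, "checkpoint_sample.mp4", fps=16)
```

Sampling like this from checkpoint-1200 gives normal-looking videos, which is why I suspect the validation path rather than training itself.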
Reproduction
FastVideo/scripts/finetune/finetune_v1_VSA.sh and FastVideo/scripts/finetune/finetune_v1.sh:
export WANDB_PROJECT=fastvideo-vsa
export TRITON_CACHE_DIR=/tmp/triton_cache
export FASTVIDEO_ATTENTION_BACKEND=VIDEO_SPARSE_ATTN
DATA_DIR=FastVideo/FVchitect/Vchitect_T2V_DataVerse/split_new/all
VALIDATION_DIR=//FastVideo/FVchitect/Vchitect_T2V_DataVerse/split_new/latents2_resume2_61/validation_parquet_dataset/worker_0
NUM_GPUS=4
CHECKPOINT_PATH="$DATA_DIR/outputs/wan_finetune_vsa/checkpoint-1200"
# If you do not have 32 GPUs, to fit in memory you can: 1) increase sp_size, 2) reduce num_latent_t
torchrun --nnodes 1 --nproc_per_node $NUM_GPUS \
    --master_port=25368 \
    FastVideo/fastvideo/v1/training/wan_training_pipeline.py \
    --model_path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
    --inference_mode False \
    --pretrained_model_name_or_path Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
    --cache_dir "/home/ray/.cache" \
    --data_path "$DATA_DIR" \
    --validation_preprocessed_path "$VALIDATION_DIR" \
    --train_batch_size 1 \
    --num_latent_t 8 \
    --sp_size 1 \
    --tp_size 1 \
    --num_gpus $NUM_GPUS \
    --hsdp_replicate_dim $NUM_GPUS \
    --hsdp_shard_dim 1 \
    --train_sp_batch_size 1 \
    --dataloader_num_workers 4 \
    --gradient_accumulation_steps 1 \
    --max_train_steps 50000 \
    --learning_rate 1e-5 \
    --mixed_precision "bf16" \
    --checkpointing_steps 100 \
    --validation_steps 500 \
    --validation_sampling_steps "2,4,8,50" \
    --log_validation \
    --checkpoints_total_limit 3 \
    --allow_tf32 \
    --ema_start_step 0 \
    --cfg 0.0 \
    --output_dir "$DATA_DIR/outputs/wan_finetune_vsa" \
    --tracker_project_name VSA_finetune \
    --num_height 448 \
    --num_width 832 \
    --num_frames 61 \
    --flow_shift 3 \
    --validation_guidance_scale "5.0" \
    --num_euler_timesteps 50 \
    --master_weight_type "fp32" \
    --dit_precision "fp32" \
    --weight_decay 0.01 \
    --max_grad_norm 1.0 \
    --VSA_sparsity 0.9 \
    --VSA_decay_rate 0.03 \
    --VSA_decay_interval_steps 30
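For context on the VSA flags above, my reading (an assumption based on the flag names, not confirmed against the FastVideo source) is that the attention sparsity ramps up by VSA_decay_rate every VSA_decay_interval_steps until it reaches VSA_sparsity:

```python
# Hypothetical reading of the VSA schedule flags above; the actual
# FastVideo implementation may differ.
def vsa_sparsity_at(step: int,
                    target: float = 0.9,    # --VSA_sparsity
                    rate: float = 0.03,     # --VSA_decay_rate
                    interval: int = 30      # --VSA_decay_interval_steps
                    ) -> float:
    """Sparsity increases by `rate` every `interval` steps, capped at `target`."""
    return min(target, (step // interval) * rate)

# Under this reading, full 0.9 sparsity is reached at step 900,
# so the first validation at step 500 would run at ~0.48 sparsity.
print(vsa_sparsity_at(500))  # 0.48
```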
Environment
Same as the standard FastVideo environment.
Could you set --validation_sampling_steps to 50? This flag specifies the number of inference steps used for validation.
I have already tried --validation_sampling_steps=50, but it doesn't work for VSA fine-tuning. Do you have any suggestions for modifications?
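In case it helps narrow this down, here is a hypothetical diagnostic (again assuming a diffusers-format transformer subfolder; the checkpoint path is a placeholder) to confirm that the saved checkpoint actually diverges from the base weights. If the weights differ but the validation videos never change, the validation pipeline is likely not using the updated weights:

```python
# Hypothetical diagnostic: compare fine-tuned vs. base transformer weights.
# A max difference near zero would mean no update reached the checkpoint;
# a clearly nonzero difference points the problem at the validation path.
import torch
from diffusers import WanTransformer3DModel

base = WanTransformer3DModel.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", subfolder="transformer",
    torch_dtype=torch.float32,
)
tuned = WanTransformer3DModel.from_pretrained(
    "outputs/wan_finetune_vsa/checkpoint-1200/transformer",  # placeholder path
    torch_dtype=torch.float32,
)

max_diff = max(
    (p - q).abs().max().item()
    for p, q in zip(base.parameters(), tuned.parameters())
)
print(f"max abs weight difference: {max_diff:.6e}")
```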
How did you preprocess the dataset? Did you use the same model?