Zhifeng
I have the same problem. How did you solve it in the end?
Same here. Has this been solved?
I also encountered a similar error. My solution was to switch to `deepspeed_stage_2`. I hope this helps.
> > I also encountered a similar error. My solution was to switch to `deepspeed_stage_2`. I hope this helps.
>
> [@zfw-cv](https://github.com/zfw-cv) Thank you, but using `deepspeed_stage_2`...
> `model_manager.load_models(["models/lightning_logs/version_2/checkpoints/epoch=9-step=5000.ckpt", "models/Wan-AI/Wan2.1-T2V-14B/models_t5_umt5-xxl-enc-bf16.pth", "models/Wan-AI/Wan2.1-T2V-14B/Wan2.1_VAE.pth"])`
>
> After training my own checkpoint and following the tutorial to run `test.py`, I get: `Loading models from: models/lightning_logs/version_2/checkpoints/epoch=9-step=5000.ckpt` We cannot detect the model type. No models...
> I use DeepSpeed to train the i2v-14b model, but only the optimizer is saved; I cannot find any model file.
>
> Hello, I also encountered a similar problem. I trained...
> After full training, I used `zero_to_fp32.py` to convert the `*.pt` files to `*.safetensors`. At inference, when I load the model, I also get the log: No wan_video_dit models available. We cannot detect...
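One possible cause of the "cannot detect the model type" message (an assumption, not confirmed anywhere in this thread) is that the converted file still carries wrapper keys or a parameter-name prefix from the training wrapper, so the weight names no longer match what the loader expects. The sketch below illustrates the idea on plain dictionaries. The wrapper keys `state_dict` (Lightning) and `module` (DeepSpeed model-state files) should be verified against your own file, and the prefix `pipe.dit.` is purely hypothetical.

```python
from typing import Any, Dict, Mapping

# Assumed wrapper keys: Lightning checkpoints keep weights under "state_dict",
# DeepSpeed model-state files under "module". Verify against your own file.
WRAPPER_KEYS = ("state_dict", "module")


def unwrap_state_dict(checkpoint: Mapping[str, Any]) -> Mapping[str, Any]:
    """Descend through known wrapper keys until the flat weight mapping is reached."""
    current = checkpoint
    while True:
        for key in WRAPPER_KEYS:
            if key in current and isinstance(current[key], Mapping):
                current = current[key]
                break
        else:
            # No wrapper key found at this level: treat it as the weight mapping.
            return current


def strip_prefix(state_dict: Mapping[str, Any], prefix: str) -> Dict[str, Any]:
    """Keep only entries whose name starts with `prefix`, and drop that prefix."""
    return {
        name[len(prefix):]: value
        for name, value in state_dict.items()
        if name.startswith(prefix)
    }


# Toy checkpoint with a hypothetical prefix "pipe.dit." (not taken from this thread).
toy_checkpoint = {
    "epoch": 9,
    "state_dict": {
        "pipe.dit.blocks.0.weight": 1.0,
        "pipe.vae.conv.weight": 2.0,
    },
}
weights = strip_prefix(unwrap_state_dict(toy_checkpoint), "pipe.dit.")
print(sorted(weights))
```

With a real checkpoint, it is worth printing the first few key names before choosing a prefix, since the actual naming depends on how the training module wraps the model.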
> I would like to follow up with another question: does data_processing also only allow batch_size=1? Currently, data processing does not seem to support multi-GPU, so the...
> I use `--use_gradient_checkpointing --use_gradient_checkpointing_offload --training_strategy "deepspeed_stage_2"` and can fully fine-tune. Hello, I also implemented LoRA and full training based on deepspeed_stage_2. But I found that...