How to fully fine-tune CogVideoX1.5-5B-I2V?
Can I adapt examples/cogvideo/train_cogvideox_image_to_video_lora.py in diffusers into a full fine-tuning script? The training script sat/finetune_multi_gpus.sh has some bugs.
We are doing the fine-tuning work in cogvideox-factory; please use the diffusers version for fine-tuning, as the SAT version will run out of memory (OOM) even if it launches successfully.
Does the diffusers version refer to train_cogvideox_image_to_video_lora.py? I ran it according to the instructions here, but encountered the following error:
[rank7]: Traceback (most recent call last):
[rank7]: File "/suqinzs/jwargrave/CogVideo-305/sat/train_cogvideox_image_to_video_lora.py", line 1620, in <module>
[rank7]: main(args)
[rank7]: File "/suqinzs/jwargrave/CogVideo-305/sat/train_cogvideox_image_to_video_lora.py", line 1428, in main
[rank7]: model_output = transformer(
[rank7]: ^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/parallel/distributed.py", line 1643, in forward
[rank7]: else self._run_ddp_forward(*inputs, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/parallel/distributed.py", line 1459, in _run_ddp_forward
[rank7]: return self.module(*inputs, **kwargs) # type: ignore[index]
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/accelerate/utils/operations.py", line 823, in forward
[rank7]: return model_forward(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/accelerate/utils/operations.py", line 811, in __call__
[rank7]: return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[rank7]: return func(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/jwargrave/diffusers-4729/src/diffusers/models/transformers/cogvideox_transformer_3d.py", line 470, in forward
[rank7]: ofs_emb = self.ofs_proj(ofs)
[rank7]: ^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cog/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/jwargrave/diffusers-4729/src/diffusers/models/embeddings.py", line 928, in forward
[rank7]: t_emb = get_timestep_embedding(
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/jwargrave/diffusers-4729/src/diffusers/models/embeddings.py", line 54, in get_timestep_embedding
[rank7]: assert len(timesteps.shape) == 1, "Timesteps should be a 1d-array"
[rank7]: ^^^^^^^^^^^^^^^
[rank7]: AttributeError: 'NoneType' object has no attribute 'shape'
The training script is as follows (Note that the pretrained_model_name_or_path is CogVideoX1.5-5B-I2V):
#!/bin/bash
clear
GPU_IDS="0,1,2,3,4,5,6,7"
accelerate launch --gpu_ids $GPU_IDS train_cogvideox_image_to_video_lora.py \
--pretrained_model_name_or_path ./pretrained_weights/CogVideoX1.5-5B-I2V \
--cache_dir ./cache \
--instance_data_root ./storyboard_data_for_cog \
--caption_column prompts_cogvlm2_debug.txt \
--video_column videos_debug.txt \
--validation_prompt "A woman is sitting in a basket under a tree. She is holding a pink flower and looking at it. The basket is made of straw and the woman is wearing a white dress. There are green leaves on the ground around her." \
--validation_images "a.jpg" \
--num_validation_videos 1 \
--validation_epochs 10 \
--seed 42 \
--rank 64 \
--lora_alpha 64 \
--mixed_precision fp16 \
--output_dir ./output-cogvideox-lora \
--height 480 --width 720 --fps 8 --max_num_frames 49 --skip_frames_start 0 --skip_frames_end 0 \
--train_batch_size 1 \
--num_train_epochs 30 \
--checkpointing_steps 1000 \
--gradient_accumulation_steps 1 \
--learning_rate 1e-3 \
--lr_scheduler cosine_with_restarts \
--lr_warmup_steps 200 \
--lr_num_cycles 1 \
--enable_slicing \
--enable_tiling \
--optimizer Adam \
--adam_beta1 0.9 \
--adam_beta2 0.95 \
--max_grad_norm 1.0
Need to check it out @zhipuch
Hi, so the current SAT version uses 76 GB of VRAM because it does full training, while the diffusers version is LoRA fine-tuning?
The diffusers version provides both LoRA and SFT; the cogvideox-factory repo offers both approaches.
What is ofs for? The OP's error is probably because cogvideox-factory currently doesn't pass an ofs input.
Thanks for your answer! From what I've seen that is for t2v; for i2v is there currently only LoRA?
Hi, have you solved this problem yet?
No
I looked into it: the code seems to be missing a value, similar to the motion bucket id in SVD. My guess is that it may encode different optical-flow magnitudes of the video. Checking earlier code, the non-1.5 diffusers models don't have this ofs value either. Hoping the maintainers can explain the cause of this bug.
Looking at the pipeline inference code, ofs_emb = None if self.transformer.config.ofs_embed_dim is None else latents.new_full((1,), fill_value=2.0), it seems ofs is a fixed value at inference time? Training runs with this setting, but I'm not sure whether different values should be passed during actual training depending on the video's motion magnitude.
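A minimal sketch of that workaround, mirroring the pipeline snippet quoted above. `make_ofs` is a hypothetical helper (not part of diffusers); `transformer` and `latents` stand in for the objects already present in the training script:

```python
# Hedged sketch: replicate the pipeline's inference-time behaviour in the
# training loop and pass a fixed ofs of 2.0 whenever the transformer was
# built with an ofs embedding (CogVideoX1.5 checkpoints). In real use
# `latents` is a torch.Tensor, so `new_full` returns a 1-element tensor on
# the right device and dtype.
def make_ofs(transformer, latents):
    # Non-1.5 CogVideoX checkpoints have ofs_embed_dim=None: keep ofs as None.
    if transformer.config.ofs_embed_dim is None:
        return None
    # Same constant the CogVideoX1.5 I2V pipeline passes at inference time.
    return latents.new_full((1,), fill_value=2.0)

# In the training step, the transformer call would then look something like:
# model_output = transformer(..., ofs=make_ofs(transformer, model_input), ...)
```

Whether training should instead vary ofs per sample (e.g. by motion magnitude) is exactly the open question here; the constant 2.0 only matches what inference does.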
Nice, I made that change too; no errors so far.
However, in train_image_to_video_sft.sh, changing uncompiled_1.yaml to deepspeed.yaml keeps throwing errors. Have you tried that?
After changing ofs the way you did, that error went away, but I hit the error below; after modifying the deepspeed config it no longer errored:
[rank7]: Traceback (most recent call last):
[rank7]: File "/suqinzs/jwargrave/cogvideox-factory-41/training/cogvideox_image_to_video_full.py", line 1033, in <module>
[rank7]: main(args)
[rank7]: File "/suqinzs/jwargrave/cogvideox-factory-41/training/cogvideox_image_to_video_full.py", line 826, in main
[rank7]: model_output = transformer(
[rank7]: ^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cogf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cogf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cogf/lib/python3.12/site-packages/deepspeed/utils/nvtx.py", line 18, in wrapped_fn
[rank7]: ret_val = func(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cogf/lib/python3.12/site-packages/deepspeed/runtime/engine.py", line 1899, in forward
[rank7]: loss = self.module(*inputs, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cogf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cogf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/jwargrave/diffusers-4739/src/diffusers/models/transformers/cogvideox_transformer_3d.py", line 476, in forward
[rank7]: hidden_states = self.patch_embed(encoder_hidden_states, hidden_states)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cogf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank7]: return self._call_impl(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/anaconda3/envs/zym-cogf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank7]: return forward_call(*args, **kwargs)
[rank7]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]: File "/suqinzs/jwargrave/diffusers-4739/src/diffusers/models/embeddings.py", line 431, in forward
[rank7]: image_embeds = image_embeds.reshape(
[rank7]: ^^^^^^^^^^^^^^^^^^^^^
[rank7]: RuntimeError: shape '[1, 6, 2, 30, 2, 45, 2, 32]' is invalid for input of size 2246400
Set that max frame count to 53, because this part needs to be divisible by 2. Can I ask how many seconds one of your training steps takes? Mine is 8-14 s, which feels a bit slow.
@zRzRzRzRzRzRzR How should the value of ofs_emb be set during training? No relevant description was found in the paper.
Setting mixed_precision=fp16 causes the loss to become NaN. Do you also run into this problem?
lijain commented Nov 27, 2024: Do you mean changing max_num_frames to 53?
Is there a solution to this issue to successfully fine-tune CogVideoX1.5-5B-I2V?
https://github.com/a-r-r-o-w/finetrainers
Yes, just make (x-1)/4+1 an even number.
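That rule can be sketched as a small check. My reading of the reshape error is that the factor of 4 is the VAE's temporal compression and the parity requirement comes from a temporal patch size of 2, so treat those constants as assumptions:

```python
def is_valid_max_num_frames(f: int) -> bool:
    # (f - 1) // 4 + 1 latent frames after 4x temporal compression;
    # the latent frame count must then be even to patchify frames in pairs.
    latent_frames = (f - 1) // 4 + 1
    return (f - 1) % 4 == 0 and latent_frames % 2 == 0

# 49 -> 13 latent frames (odd): fails, matching the reshape error above.
# 53 -> 14 latent frames (even): works.
```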
do we have CogVideoX-5B-I2V finetune code?(Lora)
https://github.com/a-r-r-o-w/finetrainers
But isn't this for text-to-video?
Check this dir https://github.com/a-r-r-o-w/finetrainers/tree/main/training or search I2V in the repo
oh i see, thanks a lot