JWargrave

Results 9 comments of JWargrave

> May I know your training configs? Slightly enlarging the batch size or lowering the probability of putting samples into the `1080p` bucket should help. But this is...

> ```
> The two numbers defined in the bucket config are (keep_prob, batch_size). Since the memory consumption and speed of samples from different buckets may be different, we use batch_size...
> ```
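The `(keep_prob, batch_size)` scheme described above could be sketched roughly as follows. This is a minimal illustration under assumptions: the bucket names, probabilities, and batch sizes are made-up placeholders, not the project's real config.

```python
import random

# Hypothetical bucket config: resolution -> (keep_prob, batch_size).
# High-resolution samples are kept less often and batched smaller,
# since they cost more memory per sample.
BUCKETS = {
    "480p":  (0.8, 8),
    "1080p": (0.2, 2),
}

def assign_bucket(resolution, rng=random.random):
    """Keep a sample in its bucket with probability keep_prob.

    Returns (resolution, batch_size) if kept, else None.
    """
    keep_prob, batch_size = BUCKETS[resolution]
    if rng() < keep_prob:
        return resolution, batch_size
    return None
```

Lowering the `1080p` keep probability, as suggested above, simply makes the expensive bucket fire less often.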

> We are doing fine-tuning work on cogvideox-factory; please use the diffusers version for fine-tuning, as the sat version will run out of memory (OOM) even if it succeeds....

> > > We are doing fine-tuning work on the cogvideox-factory, please use the diffusers version for fine-tuning, as the sat version will run out of memory (OOM) even if...

> > Looking at the pipeline's inference code, `ofs_emb = None if self.transformer.config.ofs_embed_dim is None else latents.new_full((1,), fill_value=2.0)`, it seems `ofs` is a fixed value at inference time? Training runs under this setting, but I'm not sure whether different values are passed during actual training depending on the magnitude of the video's motion.
> >
> > Nice, I changed that part and haven't hit an error so far (screenshot omitted). However, in `train_image_to_video_sft.sh`, changing `uncompiled_1.yaml` to `deepspeed.yaml` keeps failing. Have you tried that?
>
> After changing `ofs` the way you did, that error went away, but I then hit the error below; it stopped after I changed the deepspeed config:
>
> ```
> [rank7]: Traceback (most recent call last): [rank7]:...
> ```
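The quoted `ofs_emb` conditional can be illustrated in isolation. This is a minimal sketch under assumptions: numpy stands in for torch, and `TransformerConfig` is a stub, not the actual pipeline code.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass
class TransformerConfig:
    # None means the model defines no ofs embedding at all.
    ofs_embed_dim: Optional[int]

def make_ofs_emb(config: TransformerConfig, dtype=np.float32):
    """Mirror the quoted pipeline logic: at inference, ofs is either
    absent (None) or a constant length-1 tensor filled with 2.0."""
    if config.ofs_embed_dim is None:
        return None
    # Analogous to latents.new_full((1,), fill_value=2.0) in torch.
    return np.full((1,), 2.0, dtype=dtype)

print(make_ofs_emb(TransformerConfig(ofs_embed_dim=None)))  # None
print(make_ofs_emb(TransformerConfig(ofs_embed_dim=512)))   # length-1 array of 2.0
```

This makes the observation in the comment concrete: whenever the model has an ofs embedding, inference always feeds the same constant 2.0 rather than a motion-dependent value.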

Could it be because my videos are rather short? My training script is below, but some of my training videos have fewer than 49 frames:

```shell
clear
current_time=$(date "+%Y-%m-%d_%H:%M:%S")

export CUDA_HOME=/usr/local/cuda-12.1
# export CUDA_VISIBLE_DEVICES=0
export WANDB_MODE="offline"
# export WANDB_MODE="online"

# For training from scratch
# export MODEL_PATH="THUDM/CogVideoX-5b-I2V"
# export CONFIG_PATH="THUDM/CogVideoX-5b-I2V"

# For finetune ConsisID...
```
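As a quick sanity check on the short-clip concern above, a small helper can flag clips with fewer frames than the sampled window. This is a hypothetical utility, not part of ConsisID; the 49-frame threshold comes from the comment itself.

```python
def too_short_clips(frame_counts, min_frames=49):
    """Return the names of clips shorter than the training window.

    frame_counts: dict mapping clip name -> number of frames.
    """
    return sorted(name for name, n in frame_counts.items() if n < min_frames)

# Example: only b.mp4 falls under the 49-frame window.
clips = {"a.mp4": 49, "b.mp4": 32, "c.mp4": 120}
print(too_short_clips(clips))  # ['b.mp4']
```

Running such a check before training makes it easy to filter out (or pad) clips that cannot fill the frame window the dataloader expects.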

> Thanks for your interest. It may be related to the data: your training data distribution may differ too much from the data ConsisID was trained on. A few possible remedies (#17): 1. build a higher-quality dataset; 2. increase the batch_size, lower the learning_rate, and train a LoRA instead of doing full-parameter fine-tuning; 3. load the "THUDM/CogVideoX-5b-I2V" weights instead of "ConsisID-preview" and retrain an IPT2V model completely from scratch.

Thanks! I'll give these a try!

> Thanks for your interest. It may be related to the data: your training data distribution may differ too much from the data ConsisID was trained on. A few possible remedies (#17): 1. build a higher-quality, larger dataset (not just the videos; the prompts also need to be high quality); 2. increase the batch_size, lower the learning_rate, and train a LoRA instead of doing full-parameter fine-tuning; 3. load the "THUDM/CogVideoX-5b-I2V" weights instead of "ConsisID-preview" and retrain an IPT2V model completely from scratch.

When training the LoRA, [this line](https://github.com/PKU-YuanGroup/ConsisID/blob/153ae1e0be5791b6a171fba0b410b722f7e4fa6d/train.py#L635) raises `TypeError: LoraConfig.__init__() got an unexpected keyword argument 'exclude_modules'`. Which version of peft should I install? I installed 0.12.0 following [requirements.txt](https://github.com/PKU-YuanGroup/ConsisID/blob/153ae1e0be5791b6a171fba0b410b722f7e4fa6d/requirements.txt#L12).
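A `TypeError` like the one above typically means the installed peft predates the `exclude_modules` argument. One generic, hedged workaround is to filter kwargs against what the installed `LoraConfig.__init__` actually accepts before constructing it. The sketch below uses `inspect.signature` with `OldLoraConfig` as a stand-in stub for an older peft class; it is an assumption-laden illustration, not real peft code.

```python
import inspect

def filter_supported_kwargs(cls, kwargs):
    """Drop kwargs that cls.__init__ does not accept, so newer config
    options degrade gracefully on older library versions."""
    params = inspect.signature(cls.__init__).parameters
    # If __init__ takes **kwargs, everything is accepted as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}

class OldLoraConfig:  # stand-in for an older peft LoraConfig
    def __init__(self, r=8, lora_alpha=16, target_modules=None):
        self.r = r
        self.lora_alpha = lora_alpha
        self.target_modules = target_modules

cfg = OldLoraConfig(**filter_supported_kwargs(
    OldLoraConfig,
    {"r": 16, "lora_alpha": 32, "target_modules": ["to_q"],
     "exclude_modules": ["proj"]},  # silently dropped on the old class
))
print(cfg.r)  # 16
```

The cleaner fix, as the thread concludes, is simply to install a peft version new enough to know the argument.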

> > Thanks for your interest. It may be related to the data: your training data distribution may differ too much from the data ConsisID was trained on. A few possible remedies (#17): 1. build a higher-quality, larger dataset (not just the videos; the prompts also need to be high quality); 2. increase the batch_size, lower the learning_rate, and train a LoRA instead of doing full-parameter fine-tuning; 3. load the "THUDM/CogVideoX-5b-I2V" weights instead of "ConsisID-preview" and retrain an IPT2V model completely from scratch.
>
> When training the LoRA, [this line](https://github.com/PKU-YuanGroup/ConsisID/blob/153ae1e0be5791b6a171fba0b410b722f7e4fa6d/train.py#L635) raises `TypeError: LoraConfig.__init__() got an unexpected keyword argument 'exclude_modules'`. Which version of peft should I install? I installed 0.12.0 following [requirements.txt](https://github.com/PKU-YuanGroup/ConsisID/blob/153ae1e0be5791b6a171fba0b410b722f7e4fa6d/requirements.txt#L12).

Got it, I probably need to install from source.