yangzy_thu
@ShaoNeilz have you solved it? I have the same trouble.
@xiaocaijizzz More detailed documentation is here: https://sleepychord.github.io/cogdata/build/html/index.html. For example, you can use `--data_format TarDataset --data_files path_to_your_tar` or `--data_format ZipDataset --data_files path_to_your_zip` when creating a dataset. Images in the zip are like...
We follow MAGVIT-v2 (https://arxiv.org/html/2310.05737v2). The 4x+1 frame structure enables joint training with images and videos.
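As a quick illustration (my own arithmetic, not from the original comment): with a causal VAE that compresses time by 4x, a clip of 4k+1 frames maps to k+1 latent frames, and k=0 reduces to a single image, which is what lets images and videos share one model.

```python
# Hypothetical sketch of the 4x+1 rule with a causal video VAE: the first
# frame is encoded on its own, and every following group of 4 frames shares
# one latent frame, so the frame count must be 4*k + 1.

def latent_frames(num_frames: int) -> int:
    """Latent temporal length for a clip of 4*k + 1 frames."""
    assert (num_frames - 1) % 4 == 0, "frame count must be 4*k + 1"
    return (num_frames - 1) // 4 + 1

print(latent_frames(1))   # 1  -> a single image, so image data trains jointly
print(latent_frames(49))  # 13 -> a short video clip
```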
This means the model parallelism in the transformer and the context parallelism in the VAE use the same communication group. The transformer part of the current open-source code does not support context parallelism.
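A minimal sketch of what "sharing a communication group" could look like with `torch.distributed` (the group size, ranks, and wiring are my assumptions, not the actual CogVideoX code):

```python
import torch.distributed as dist

# Launch under torchrun; NCCL backend assumed for GPU collectives.
dist.init_process_group(backend="nccl")

# One process group reused by both components: the transformer's model
# parallelism and the VAE's context parallelism would issue their
# collectives over this same group. All ranks must call new_group;
# the rank list here is purely illustrative.
shared_group = dist.new_group(ranks=[0, 1, 2, 3])

# Both sides would then pass the same group to their collectives, e.g.:
# dist.all_reduce(tensor, group=shared_group)
```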
The max training length.
CogVideoX uses the same noise level as Stable Video Diffusion, so dynamics won't be a problem.
```python
from openai import OpenAI

prefix = '''**Objective**: **Give a highly descriptive video caption based on input image and user input.**. As an expert, delve deep into the image...'''
```
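The prompt above is truncated; a hedged sketch of how such a `prefix` might be sent together with an input image through the OpenAI chat API could look like this (the model name, image path, and message layout are my assumptions, not the original pipeline):

```python
import base64

# Assumes `prefix` from the snippet above and a local frame to caption.
client = OpenAI()

with open("frame.jpg", "rb") as f:  # hypothetical image path
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model, not confirmed by the thread
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prefix + "user input here"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```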
We used approximately 1/10 of the GPU hours for i2v fine-tuning, but similar performance can be achieved with even fewer GPU hours.
Temporal compression by 8x can result in significant ghosting artifacts, which are not reflected in the evaluation metrics.
First, could you confirm that data loading isn't stalling the training?
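One hypothetical way to check this (the dummy dataset and sizes below are placeholders, not from the original thread): time how long each step waits on the DataLoader relative to the training step itself.

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Diagnostic sketch: measure how long each training step waits on the
# DataLoader. Replace the dummy dataset with the real one; if the wait time
# dominates the step time, data loading (not the model) is the bottleneck.
dataset = TensorDataset(torch.randn(256, 3, 64, 64))  # stand-in for real data
loader = DataLoader(dataset, batch_size=8, num_workers=4)

t0 = time.perf_counter()
for step, (batch,) in enumerate(loader):
    wait = time.perf_counter() - t0
    print(f"step {step}: waited {wait:.3f}s for data")
    # ... the actual training step would run here ...
    t0 = time.perf_counter()
```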