yangzy_thu
@ShaoNeilz have you solved it? I have the same trouble.
@xiaocaijizzz More detailed documentation is here: https://sleepychord.github.io/cogdata/build/html/index.html. For example, you can use `--data_format TarDataset --data_files path_to_your_tar` or `--data_format ZipDataset --data_files path_to_your_zip` when creating a dataset. Images in the zip are like...
We follow MAGVIT-v2 (https://arxiv.org/html/2310.05737v2). The 4x+1 frame structure enables joint training with images and videos.
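As a quick illustration (my own arithmetic, not from the original comment): with a causal VAE that compresses time by 4x, a clip of 4k+1 frames maps to k+1 latent frames, and k=0 reduces to a single image, which is what lets images and videos share one model.

```python
# Hypothetical sketch of the 4x+1 rule with a causal video VAE: the first
# frame is encoded on its own, and every following group of 4 frames shares
# one latent frame, so the frame count must be 4*k + 1.

def latent_frames(num_frames: int) -> int:
    """Latent temporal length for a clip of 4*k + 1 frames."""
    assert (num_frames - 1) % 4 == 0, "frame count must be 4*k + 1"
    return (num_frames - 1) // 4 + 1

print(latent_frames(1))   # 1  -> a single image, so image data trains jointly
print(latent_frames(49))  # 13 -> a short video clip
```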
This means the model parallelism in the transformer and the context parallelism in the VAE use the same communication group. The transformer part of the current open-source code does not support context parallelism.
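A minimal sketch of what "sharing a communication group" could look like with `torch.distributed` (the group size, ranks, and wiring are my assumptions, not the actual CogVideoX code):

```python
import torch.distributed as dist

# Launch under torchrun; NCCL backend assumed for GPU collectives.
dist.init_process_group(backend="nccl")

# One process group reused by both components: the transformer's model
# parallelism and the VAE's context parallelism would issue their
# collectives over this same group. All ranks must call new_group;
# the rank list here is purely illustrative.
shared_group = dist.new_group(ranks=[0, 1, 2, 3])

# Both sides would then pass the same group to their collectives, e.g.:
# dist.all_reduce(tensor, group=shared_group)
```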
The max training length.
CogVideoX uses the same noise level as Stable Video Diffusion, so dynamics won't be a problem.
```python
from openai import OpenAI

prefix = '''**Objective**: **Give a highly descriptive video caption based on input image and user input.**. As an expert, delve deep into the image...'''
```
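The prompt above is truncated; a hedged sketch of how such a `prefix` might be sent together with an input image through the OpenAI chat API could look like this (the model name, image path, and message layout are my assumptions, not the original pipeline):

```python
import base64

# Assumes `prefix` from the snippet above and a local frame to caption.
client = OpenAI()

with open("frame.jpg", "rb") as f:  # hypothetical image path
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model, not confirmed by the thread
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prefix + "user input here"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```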
We used approximately 1/10 of the GPU hours for i2v fine-tuning, but similar performance can be achieved with even fewer GPU hours.
Temporal compression by 8x can result in significant ghosting artifacts, which are not reflected in the evaluation metrics.
First, could you confirm that data loading isn't stalling the training?
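One hypothetical way to check this (the dummy dataset and sizes below are placeholders, not from the original thread): time how long each step waits on the DataLoader relative to the training step itself.

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Diagnostic sketch: measure how long each training step waits on the
# DataLoader. Replace the dummy dataset with the real one; if the wait time
# dominates the step time, data loading (not the model) is the bottleneck.
dataset = TensorDataset(torch.randn(256, 3, 64, 64))  # stand-in for real data
loader = DataLoader(dataset, batch_size=8, num_workers=4)

t0 = time.perf_counter()
for step, (batch,) in enumerate(loader):
    wait = time.perf_counter() - t0
    print(f"step {step}: waited {wait:.3f}s for data")
    # ... the actual training step would run here ...
    t0 = time.perf_counter()
```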