CogVideo VAE training code

Do you plan to release the training code for the 3D VAE? Or could we take your current diffusers-version training code, freeze the transformer, and train the VAE directly instead?

Jan 08 '25 08:01 cdfan0627

+1, English please! I would be interested in the training code for the 3D Causal VAE :)

Jan 08 '25 10:01 samuruph

We do not provide separate code for training the VAE. Thank you for your understanding. If your goal is to improve the model's generation quality, note that the VAE is only responsible for reconstruction, so it has less impact on the results than fine-tuning the transformer.

Jan 09 '25 03:01 zRzRzRzRzRzRzR

One question: from the VAE code, it looks like both 8 * N frames and 8 * N + 1 frames can be encoded and decoded. Is that right?
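For reference, here is a minimal sketch of the frame-count arithmetic behind this question. It assumes a temporal compression factor of 4 and a spatial compression factor of 8, and it sizes the latent the way the diffusers CogVideoX pipeline appears to, with `(num_frames - 1) // 4 + 1` latent frames. The function name `latent_shape` and its defaults are hypothetical, and the sketch does not call the VAE itself, so it only illustrates the shapes, not how the encoder handles each case internally.

```python
def latent_shape(num_frames, height, width, temporal_factor=4, spatial_factor=8):
    """Estimate the (frames, height, width) of the latent for a video.

    Assumption: the first frame is encoded on its own and the remaining
    frames are compressed in groups of `temporal_factor`.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    latent_frames = (num_frames - 1) // temporal_factor + 1
    return latent_frames, height // spatial_factor, width // spatial_factor


# Compare an 8 * N input with an 8 * N + 1 input (N = 6).
for frames in (48, 49):
    print(frames, latent_shape(frames, 480, 720))
```

Under these assumptions, 48 frames and 49 frames map to 12 and 13 latent frames respectively, so both counts produce a valid latent shape.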

Jan 09 '25 03:01 cdfan0627

Yes, it can.

Jan 09 '25 04:01 zRzRzRzRzRzRzR

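One more question: if I want to use videos such as pose, depth, or optical flow as conditions, would you recommend training an additional VAE, or is it enough to use the pretrained VAE and fine-tune the transformer?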

Jan 09 '25 06:01 cdfan0627

Could you at least disclose how you trained the VAE, for example how you implemented context-parallel training on 161 frames? Is the 3D VAE able to encode longer sequences as well?

Jan 16 '25 10:01 samuruph

@zRzRzRzRzRzRzR +1, Thanks!

Jan 17 '25 00:01 jjihwan

Thank you all for the amazing work on CogVideo and for making it public! +1. I ran many tests with the VAE, and its reconstruction quality is not good enough for my work. Based on my experiments, the 8x spatial compression is the main reason, so I would like to train it with a different compression factor.

Jan 17 '25 23:01 Gabriellgpc