VAE training code
Do you plan to release the training code for the 3D VAE? Or could we take your current diffusers-version training code, freeze the transformer, and train the VAE directly instead?
+1, English please! I would be interested in the training code for the 3D Causal VAE :)
We do not provide standalone code for training the VAE. Thank you for your understanding. If your goal is to improve the model's generation quality, note that the VAE is responsible for the reconstruction part, which has less impact on the model's output than fine-tuning the transformer.
From the VAE code, it looks like both 8 * N frames and 8 * N + 1 frames can be encoded and decoded. Is that right?
Yes, they can.
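For anyone checking clip lengths against the two forms mentioned above, here is a minimal sketch in plain Python. It only does the arithmetic on the frame count and does not call the VAE; the helper name `frame_count_form` is hypothetical and not part of the CogVideo code.

```python
def frame_count_form(num_frames: int) -> str:
    """Classify a frame count as 8 * N, 8 * N + 1, or neither.

    Hypothetical helper for illustration only; it does not touch the VAE.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be a positive integer")
    remainder = num_frames % 8
    if remainder == 0:
        return "8N"
    if remainder == 1:
        return "8N+1"
    return "other"


# 161 frames (the length asked about in this thread) is 8 * 20 + 1.
print(frame_count_form(161))
print(frame_count_form(48))
print(frame_count_form(50))
```

Whether a count in the "other" group would still work is not covered by the answer above, so treat that case as untested.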
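One more question: if I want to use videos such as pose, depth, or optical flow as a condition, would you recommend training an additional VAE, or is it enough to use the pretrained VAE and fine-tune the transformer?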
Can you just disclose how you train the VAE? Like how you implemented context parallel training on 161 frames. Is the 3D VAE able to encode longer sequences as well?
@zRzRzRzRzRzRzR +1, Thanks!
Thank you all for the amazing work on CogVideo and for making it public! +1. I ran many tests with the VAE, and its reconstruction quality is not good enough for my work. Based on my experiments, the 8x spatial compression is the main reason, so I would like to train it with a different compression factor.
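To make the compression trade-off concrete, here is a rough sketch of how the per-frame latent grid changes with the spatial factor. The 480x720 input size and the alternative 4x factor are assumptions chosen for illustration, and `latent_grid` is a hypothetical helper, not a function from the CogVideo code.

```python
def latent_grid(height: int, width: int, spatial_factor: int = 8) -> tuple[int, int]:
    """Return the (height, width) of the latent grid for one frame.

    Hypothetical helper: assumes the VAE downsamples each spatial
    dimension by exactly `spatial_factor`.
    """
    if spatial_factor <= 0:
        raise ValueError("spatial_factor must be positive")
    if height % spatial_factor or width % spatial_factor:
        raise ValueError("height and width must be divisible by spatial_factor")
    return height // spatial_factor, width // spatial_factor


# Assumed example resolution: 480x720 (not taken from this thread).
h8, w8 = latent_grid(480, 720, spatial_factor=8)
h4, w4 = latent_grid(480, 720, spatial_factor=4)

print((h8, w8), h8 * w8)  # grid and latent positions per frame at 8x
print((h4, w4), h4 * w4)  # grid and latent positions per frame at 4x
```

Halving the spatial factor quadruples the number of latent positions per frame, so each position has to summarize fewer pixels, but the transformer then sees four times as many positions.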