Can VAR generate images at arbitrary resolutions?

Is it true that a VAE model can only generate images with the same dimensions as the training data? For example, if the model was trained on 256x256 images, is there any way to use a checkpoint from that model to generate images with arbitrary resolutions, such as 352x275?

Sep 01 '24 12:09 Leiii-Cao

@Leiii-Cao No, that is not the case. We will soon release a VAR-based text-to-image (T2I) work that supports arbitrary-resolution generation.

Also, the VAE is a CNN, so it can reconstruct images at any resolution.
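
To make the resolution point concrete, here is a minimal sketch of the size arithmetic for a convolutional autoencoder. It assumes a total downsampling factor of 16 (so a 256x256 input maps to a 16x16 latent grid); that factor is an assumption for illustration and is not stated in this thread. `padded_size` and `latent_grid` are hypothetical helper names, not functions from the VAR codebase.

```python
def padded_size(height, width, factor=16):
    """Round each side up to the next multiple of `factor` (assumed stride)."""
    # -(-x // f) is integer ceiling division, so no floats are involved.
    pad_h = -(-height // factor) * factor
    pad_w = -(-width // factor) * factor
    return pad_h, pad_w


def latent_grid(height, width, factor=16):
    """Latent grid size after padding the input and downsampling by `factor`."""
    pad_h, pad_w = padded_size(height, width, factor)
    return pad_h // factor, pad_w // factor


print(padded_size(352, 275))  # (352, 288)
print(latent_grid(352, 275))  # (22, 18)
print(latent_grid(256, 256))  # (16, 16)
```

Under these assumptions, the 352x275 image from the question would be padded to 352x288 before encoding, and the reconstruction could be cropped back to 352x275 afterwards. This only covers VAE reconstruction, not generation.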

Nov 29 '24 08:11 enjoyyi00

@Leiii-Cao Because it is built on a CNN, the VAE can encode and decode images of arbitrary resolution. However, VAR only generates square images. Our recent work Infinity (a text-to-image model based on VAR) can generate images with various aspect ratios. Please check https://github.com/FoundationVision/Infinity

Dec 13 '24 09:12 JeyesHan