generate images with arbitrary resolutions
Is it true that a VAE model can only generate images with the same dimensions as the training data? For example, if the model was trained on 256x256 images, is there any way to use a checkpoint from that model to generate images with arbitrary resolutions, such as 352x275?
@Leiii-Cao No, that is not the case. We will soon release a text-to-image (T2I) model based on VAR that supports arbitrary-resolution generation.
Also, the VAE is CNN-based, so it can reconstruct images at any resolution.
@Leiii-Cao Because it is built on a CNN, the VAE can encode and decode images of arbitrary resolution. However, VAR itself only generates square images. Our recent work, Infinity (a text-to-image model based on VAR), can generate images with various aspect ratios. Please check https://github.com/FoundationVision/Infinity
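To make the "CNN, so any resolution" point concrete, here is a rough sketch of the shape arithmetic for the 352x275 example from the question. It assumes a hypothetical encoder made of four stride-2 3x3 convolutions with padding 1 (16x downsampling overall) and a decoder that doubles the size four times. Those layer settings and the function names are illustrative assumptions, not the confirmed VAR VAE configuration.

```python
# Shape arithmetic only: no model weights or deep learning library involved.
# ASSUMPTION: encoder = 4 stride-2 3x3 convs (padding 1), decoder = 4 x2 upsamples.

def conv_out(n, kernel=3, stride=2, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1

def latent_size(h, w, num_down=4):
    """Latent height/width after num_down stride-2 convolutions."""
    for _ in range(num_down):
        h, w = conv_out(h), conv_out(w)
    return h, w

def decoded_size(h, w, num_up=4):
    """Image height/width after num_up x2 upsampling stages."""
    return h * 2 ** num_up, w * 2 ** num_up

def padded_size(h, w, factor=16):
    """Smallest size >= (h, w) whose sides are multiples of factor."""
    def round_up(n):
        return -(-n // factor) * factor
    return round_up(h), round_up(w)

train = latent_size(256, 256)        # (16, 16)
odd = latent_size(352, 275)          # (22, 18)
recon = decoded_size(*odd)           # (352, 288)
target = padded_size(352, 275)       # (352, 288)
```

Under these assumptions the convolutions accept 352x275 without complaint, but the reconstruction comes back as 352x288 because 275 is not a multiple of 16. A common workaround is to pad the input to the next multiple of the downsampling factor and crop the output back to the original size.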