generate images with arbitrary resolutions

Open Leiii-Cao opened this issue 1 year ago • 2 comments

Is it true that a VAE model can only generate images with the same dimensions as the training data? For example, if the model was trained on 256x256 images, is there any way to use a checkpoint from that model to generate images with arbitrary resolutions, such as 352x275?

Leiii-Cao avatar Sep 01 '24 12:09 Leiii-Cao

@Leiii-Cao No, that is not the case. We will soon release our text-to-image (T2I) work based on VAR, which supports arbitrary-resolution generation.

Also, the VAE is a convolutional (CNN) architecture, so it can reconstruct images at any resolution.
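To illustrate why a convolutional model is not tied to one input size, here is a minimal pure-Python sketch. It is not the VAR VAE: the 2x2 averaging kernel, the stride of 2, and the helper names `conv2d_stride` and `output_shape` are assumptions made only for illustration. The point is that the same weights slide over any input, and only the output size changes.

```python
def conv2d_stride(image, kernel, stride):
    """Valid 2D cross-correlation of a list-of-lists image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for top in range(0, h - kh + 1, stride):
        row = []
        for left in range(0, w - kw + 1, stride):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[top + i][left + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical "learned" weights: a 2x2 average. They do not depend on image size.
kernel = [[0.25, 0.25], [0.25, 0.25]]


def output_shape(height, width):
    """Run the same kernel on a constant image and report the output (rows, cols)."""
    image = [[1.0] * width for _ in range(height)]
    out = conv2d_stride(image, kernel, stride=2)
    return len(out), len(out[0])


print(output_shape(256, 256))  # training-size square input
print(output_shape(276, 352))  # non-square input, same weights
```

Layers whose parameters do depend on a fixed grid (for example, learned position embeddings over a fixed number of tokens) do not share this property, which is presumably why the generator side is more restricted than the VAE.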

enjoyyi00 avatar Nov 29 '24 08:11 enjoyyi00

@Leiii-Cao Because it is built on a CNN, the VAE can encode and decode images at arbitrary resolutions. However, VAR only generates square images. Our recent work Infinity (a text-to-image model built on VAR) can generate images with various aspect ratios. Please check https://github.com/FoundationVision/Infinity
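For the 352x275 example in the question, one practical detail when encoding with a strided CNN is that the image sides usually need to be multiples of the total downsampling factor. The sketch below computes the padded size and the resulting latent grid. The factor of 16 and the function name `padded_and_latent_size` are assumptions for illustration; check the actual factor of the checkpoint you use, and note that padding is just one common way to handle this, not something this thread prescribes.

```python
def padded_and_latent_size(width, height, factor=16):
    """Return ((padded_w, padded_h), (latent_w, latent_h)).

    `factor` is the assumed total spatial downsampling of the encoder.
    Each side is rounded up to the next multiple of `factor`.
    """
    latent_w = -(-width // factor)   # ceiling division
    latent_h = -(-height // factor)
    return (latent_w * factor, latent_h * factor), (latent_w, latent_h)


padded, latent = padded_and_latent_size(352, 275)
print(padded, latent)
```

After decoding, the reconstruction can be cropped back to the original 352x275 region.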

JeyesHan avatar Dec 13 '24 09:12 JeyesHan