JianHan comments

Results: 4 comments by JianHan

The results are too low!!!

My results on 10-shot: AP for bird = 0.525, AP for bus = 0.158, AP for cow = 0.539, AP for motorbike = 0.454, AP for sofa = 0.029, Mean...

Image reconstruction via Transformer.

@minimini-1 @maggiesong7 Indeed, the way you are trying to reconstruct an image using VAR is incorrect. VAR formulates a next-scale prediction task where **current scale prediction is conditioned on previous...
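
To make the dependence between scales concrete, here is a minimal toy sketch of a coarse-to-fine residual decomposition on a 1D signal. It is an illustration only, not VAR's actual code: it assumes average pooling for downsampling, nearest-neighbour repetition for upsampling, and no codebook quantization, and the helper names (`downsample`, `upsample`, `encode_multiscale`, `decode_multiscale`) are hypothetical. The point it shows is that each scale's map is computed from whatever the earlier scales left unexplained, so a later scale is only meaningful together with the scales before it.

```python
def downsample(x, size):
    """Average-pool a 1D list down to `size` entries (len(x) must be divisible by size)."""
    block = len(x) // size
    return [sum(x[i * block:(i + 1) * block]) / block for i in range(size)]


def upsample(x, size):
    """Nearest-neighbour upsample a 1D list to `size` entries (size must be divisible by len(x))."""
    rep = size // len(x)
    return [v for v in x for _ in range(rep)]


def encode_multiscale(signal, scales):
    """Return one map per scale; each map encodes the residual left by the coarser scales."""
    residual = list(signal)
    maps = []
    for s in scales:
        r = downsample(residual, s)           # what this scale can explain
        maps.append(r)
        up = upsample(r, len(signal))
        residual = [a - b for a, b in zip(residual, up)]  # hand the rest to finer scales
    return maps


def decode_multiscale(maps, size):
    """Sum the upsampled maps of all scales, coarse to fine."""
    recon = [0.0] * size
    for r in maps:
        up = upsample(r, size)
        recon = [a + b for a, b in zip(recon, up)]
    return recon


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
scales = [1, 2, 4, 8]  # toy schedule; the last scale matches the signal length
maps = encode_multiscale(signal, scales)
recon = decode_multiscale(maps, len(signal))

# Replacing a coarse map without recomputing the finer ones breaks the reconstruction,
# because the finer maps were computed relative to the original coarse map.
tampered = [[0.0]] + maps[1:]
broken = decode_multiscale(tampered, len(signal))
```

In this toy setting the final scale has the same length as the signal and nothing is quantized, so summing all scales recovers the input; with a real quantizer the reconstruction would only be approximate.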

generate images with arbitrary resolutions

@Leiii-Cao Powered by a CNN structure, the VAE can encode and decode images of arbitrary resolution. However, VAR only generates square images. Our recent work [Infinity](https://github.com/FoundationVision/Infinity) (text-to-image model for VAR)...

How can I change the scale for training?

It's OK to slightly change the scale schedule for the VQVAE, since it adopts a CNN architecture. It can still encode and decode images normally, but with a slight performance drop....