About decode_first_stage in sampling

Open LoveU3tHousand2 opened this issue 3 years ago • 2 comments

I've noticed that the 'decode_to_img' function in taming-transformers and vq-vae uses decode_code or get_codebook_entry, but in ldm, decode_first_stage does quantize -> decode unless predict_cids=True is set. Why is this? What is the difference between quantize -> decode and get_codebook_entry -> decode?

LoveU3tHousand2 · Jan 13 '23 02:01

In the LDM paper, the authors mention that:

This model can be interpreted as a VQGAN [23] but with the quantization layer absorbed by the decoder.

I'm not really sure about this, but I guess the quantization step is applied a little differently than in VQGAN.

Yoonho-Na · Jan 30 '23 00:01

I think it's because there is a VQModelInterface wrapper, and its decode function performs the codebook lookup (quantization) before the final decoding.

ryx19th · Jul 11 '24 20:07
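
To make the two paths in the question concrete, here is a minimal pure-Python sketch. It is a toy stand-in, not the actual ldm or taming-transformers code: CODEBOOK, nearest_index, toy_decoder and this decode wrapper are all hypothetical, and the force_not_quantize flag only imitates the behaviour described in the comments above. Under those assumptions, quantize -> decode starts from continuous latents and snaps them to the nearest codebook vectors, while get_codebook_entry -> decode starts from discrete indices and looks the vectors up directly. The decoder receives the same codebook vectors either way.

```python
# Toy stand-ins for illustration only; names and shapes are assumptions.
CODEBOOK = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def get_codebook_entry(indices):
    """Discrete path: look up codebook vectors directly from indices."""
    return [list(CODEBOOK[i]) for i in indices]


def nearest_index(vec):
    """Index of the codebook vector closest to vec (squared L2 distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, entry)) for entry in CODEBOOK]
    return dists.index(min(dists))


def quantize(latents):
    """Continuous path: snap each latent to its nearest codebook vector."""
    indices = [nearest_index(v) for v in latents]
    return get_codebook_entry(indices), indices


def toy_decoder(quantized):
    """Placeholder decoder: sums the components of each vector."""
    return [sum(v) for v in quantized]


def decode(latents, force_not_quantize=False):
    """Hypothetical wrapper: quantize inside decode unless told to skip it."""
    quant = latents if force_not_quantize else quantize(latents)[0]
    return toy_decoder(quant)


# Continuous latents, such as a diffusion sampler would produce.
z = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]
quant, idx = quantize(z)

# Path 1: continuous latents, quantization happens inside decode.
out_from_latents = decode(z)

# Path 2: discrete indices, looked up first, so decode skips quantization.
out_from_indices = decode(get_codebook_entry(idx), force_not_quantize=True)
```

In this sketch both outputs are identical, which suggests the difference is mainly about what the sampler hands over: a diffusion model in latent space produces continuous latents, so the lookup has to happen at decode time, whereas a transformer that predicts code indices can skip it.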