About decode_first_stage in sampling
I've noticed that the 'decode_to_img' function in taming-transformers and vq-vae uses decode_code or get_codebook_entry, but in ldm, decode_first_stage does quantize -> decode unless predict_cid = True is set. Why is this? What is the difference between quantize -> decode and get_codebook_entry -> decode?
In the LDM paper, the authors mention that:
This model can be interpreted as a VQGAN [23] but with the quantization layer absorbed by the decoder.
I'm not really sure about this, but my guess is that the quantization step works a little differently from VQGAN.
I think it's because ldm wraps the first stage in a VQModelInterface, and the decode function there performs the quantization (the codebook lookup) itself before the final decoding.
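To make the two paths concrete, here is a toy NumPy sketch. It is not the actual ldm or taming-transformers code: the codebook values, the `quantize` and `decode` helpers, and the shapes are all made up for illustration, and only the name `get_codebook_entry` mirrors the real method. It shows that quantize -> decode starts from continuous latents and has to find the nearest codebook entries first, while get_codebook_entry -> decode starts from indices that are already discrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy codebook: 8 entries of dimension 4.
codebook = rng.normal(size=(8, 4))


def quantize(z):
    """Map continuous latents (N, D) to their nearest codebook entries.

    Returns the quantized vectors and the chosen indices.
    """
    # Squared Euclidean distance from every latent to every codebook entry.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return codebook[indices], indices


def get_codebook_entry(indices):
    """Plain lookup: the indices are already known, so no search is needed."""
    return codebook[indices]


def decode(z_q):
    """Stand-in for the real decoder network (assumption: any fixed function)."""
    return 2.0 * z_q


# Path A (quantize -> decode): the input is a continuous latent,
# for example what a model working in continuous latent space produces.
z = rng.normal(size=(5, 4))
z_q, idx = quantize(z)
img_a = decode(z_q)

# Path B (get_codebook_entry -> decode): the input is a set of discrete
# indices, for example what a model predicting code indices produces.
img_b = decode(get_codebook_entry(idx))

# For matching indices, both paths feed the decoder the same vectors.
assert np.allclose(img_a, img_b)
```

So once the indices agree, the decoder sees the same input either way. The difference is only in what you start from: if the sampler gives you continuous latents, you need the nearest-neighbour step before decoding; if it gives you code indices, a direct lookup is enough.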