Does it make sense to try an image loss in stage 2?
I tried using gumbel-softmax to convert the latent predictions into images so I could apply several image-space losses (perceptual, L1, adversarial) in stage 2 (the transformer training period), aiming at tasks that cross-entropy loss might not suit well. None of them seemed to work. I wonder if my thinking was wrong. Thanks for your excellent work!!
I also find that using gumbel-softmax(hard=False) sometimes gives a better training result, but is hard=False a bad setting to use? If the codebook is limited, will mixed tokens give better performance? I couldn't find any useful research on this. Thanks for your excellent work again!!
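For reference, here is a minimal sketch of the hard=True vs. hard=False distinction with PyTorch's `F.gumbel_softmax` (the shapes and codebook here are toy stand-ins, not the real model's):

```python
import torch
import torch.nn.functional as F

# Toy shapes: (batch, tokens, codebook size) -- assumptions for illustration.
torch.manual_seed(0)
logits = torch.randn(2, 16, 512)

# hard=False: a soft sample, i.e. a convex mixture over all codebook entries.
soft = F.gumbel_softmax(logits, tau=1.0, hard=False, dim=-1)
# hard=True: one-hot in the forward pass, soft gradient via straight-through.
hard = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=-1)

# Either sample can index a codebook by matrix product:
codebook = torch.randn(512, 32)      # (num codes, embed dim)
mixed_embed = soft @ codebook        # weighted mixture of code vectors
discrete_embed = hard @ codebook     # (approximately) one code vector per token
```

So hard=False feeds the decoder mixed code vectors it never saw during stage-1 training, which is one plausible reason it can behave differently at train vs. use time.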
@AlexzQQQ Have you ever tried mixing only the top-2 predicted probabilities within one token for each latent embedding? I mean, mask out the non-top-2 probabilities, apply gumbel-softmax to the predicted top-2 probabilities to produce the latent embedding, and decode it with the VQ-VAE. This setup simplifies your first question: whether the performance drop comes from mixing a massive number of codes. As for your second question, what is your evaluation result when using gumbel-softmax(hard=False)? And what is your temperature setting? Using gumbel-softmax(hard=False) incorporates multiple token probabilities when producing a latent embedding, and it's hard to tell or analyze whether it's a good move without any evaluation result.
Although mixing token probabilities is advantageous in theory, given that each code represents a distinct feature, whether the pretrained VQ-VAE can fully utilize the rich latent representation produced by the mixed token probabilities, or whether it will simply collapse eventually in the 2nd stage of training, is worth exploring.
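The top-2 experiment above could be sketched like this (the masking trick and all shapes are my assumptions; `top2_gumbel_embed` is a hypothetical helper, and the real decode would go through the pretrained VQ-VAE decoder instead of stopping at the embedding):

```python
import torch
import torch.nn.functional as F

def top2_gumbel_embed(logits, codebook, tau=1.0):
    """Keep only each token's top-2 logits, then gumbel-softmax over them
    so each latent embedding mixes at most two code vectors."""
    top2 = logits.topk(2, dim=-1).indices                 # (B, N, 2)
    mask = torch.full_like(logits, float('-inf'))
    mask.scatter_(-1, top2, 0.0)                          # -inf outside the top-2
    probs = F.gumbel_softmax(logits + mask, tau=tau, hard=False, dim=-1)
    return probs @ codebook                               # (B, N, embed dim)

# Toy shapes for illustration only.
logits = torch.randn(2, 256, 4096)
codebook = torch.randn(4096, 32)
emb = top2_gumbel_embed(logits, codebook)                 # feed this to the decoder
```

Masked positions get probability exactly zero after the softmax, so the mixture really is restricted to two codes per token.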
@jack111331 Thanks for your reply. I will try what you asked in a few weeks, due to busy work. In my opinion, the mixed token will be useful if the loss is not only a cross-entropy loss.
@AlexzQQQ @jack111331 I have some practical questions. How do you add an image loss in stage two? Should the image loss be applied to the output image after running `idx = gumbel_softmax(logits)` and `image = vqvae.idxBl_to_img(idx)`? I displayed the output image and found that its quality is very poor. Did I do something wrong? Thanks very much for the reply!
@Longhzzz I use gumbel-softmax on the tokens of each scale and mix them into the final 16*16 token map, which then goes through the frozen decoder to produce an image; that image is what the image loss is applied to. I think it's useless to try in this part :) Hope my reply helps you.
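A minimal sketch of that pipeline, with toy stand-ins for the frozen VQ-VAE pieces (the codebook size, the 1x1-conv "decoder", and all shapes are assumptions for illustration, not VAR's actual API):

```python
import torch
import torch.nn.functional as F

codebook = torch.randn(4096, 32)                  # (num codes, embed dim), toy
frozen_decoder = torch.nn.Sequential(             # stand-in for the real decoder
    torch.nn.Conv2d(32, 3, 1))
for p in frozen_decoder.parameters():             # decoder stays frozen
    p.requires_grad_(False)

def logits_to_img(logits_16x16, tau=1.0):
    # logits_16x16: (B, 256, 4096) transformer logits at the final 16x16 scale
    probs = F.gumbel_softmax(logits_16x16, tau=tau, hard=True, dim=-1)
    emb = probs @ codebook                        # (B, 256, 32), differentiable
    fmap = emb.transpose(1, 2).reshape(-1, 32, 16, 16)
    return frozen_decoder(fmap)                   # toy "image"

logits = torch.randn(2, 256, 4096, requires_grad=True)
img = logits_to_img(logits)
loss = F.l1_loss(img, torch.zeros_like(img))      # stand-in for L1/perceptual/adv
loss.backward()                                   # gradients flow back to logits
```

The point of the straight-through gumbel-softmax here is only that `loss.backward()` reaches the transformer logits; whether such an image loss actually helps in stage 2 is exactly what this thread found doubtful.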