Wang Duomin
As the title asks: in the paper, the author said "gradients can be propagated backward from the output features to the input features, but not to the input proposal...
So in the whole architecture, no explicit codebook vectors are used, right? Only categorical logits are used as the input to the decoder when you train your dVAE?
Hi, thanks for your awesome work! I am confused about how uv_weight_mask is generated. Did you generate it using uv_kpt_ind.txt? And is uv_kpt_ind.txt generated from Model_keypoints.mat? How can I...
As the picture shows, some end_sec values are negative. Is something wrong? I also found that some annotations have the same start_sec and end_sec, which may also be a mistake.
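A quick check along these lines would flag both cases. This is only a sketch: it assumes each annotation is a dict with `start_sec` and `end_sec` keys, and the helper name `find_bad_segments` and the sample data are made up for illustration, not taken from the dataset's tooling.

```python
def find_bad_segments(annotations):
    """Return (index, reason) pairs for segments with suspicious timestamps.

    Assumes each annotation is a dict with numeric "start_sec" and "end_sec"
    keys (a hypothetical layout; adapt to the real annotation format).
    """
    bad = []
    for i, ann in enumerate(annotations):
        start, end = ann["start_sec"], ann["end_sec"]
        if start < 0 or end < 0:
            bad.append((i, "negative timestamp"))
        elif end == start:
            bad.append((i, "zero-length segment"))
        elif end < start:
            bad.append((i, "end before start"))
    return bad


# Made-up sample data, only to exercise the check.
sample = [
    {"start_sec": 1.0, "end_sec": 4.5},
    {"start_sec": 2.0, "end_sec": -1.0},
    {"start_sec": 3.0, "end_sec": 3.0},
]
print(find_bad_segments(sample))
```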
Is beta weighting the wrong loss term? It is applied to the embedding loss, but in the vanilla VQ-VAE it should weight the commitment loss.
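For reference, this is the objective as I understand it from the original VQ-VAE paper, where sg[·] is the stop-gradient operator, z_e(x) is the encoder output, and e is the selected codebook vector. The second term is the embedding (codebook) loss and the third is the commitment loss, which is the only one scaled by beta:

```latex
\mathcal{L} = \log p\big(x \mid z_q(x)\big)
  + \big\lVert \mathrm{sg}[z_e(x)] - e \big\rVert_2^2
  + \beta \, \big\lVert z_e(x) - \mathrm{sg}[e] \big\rVert_2^2
```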
Hi, this is good work! Can you explain the rasterization process for T_uv in more detail?
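As shown in the figure, directly reconstructing the Sora examples with the 3D VAE produces results made up of 64*64 patches: a reconstructed 512*512 video has 8*8 patches, and a 1024*1024 video has 16*16 patches. I searched the entire code and could not find where these patches are constructed. A 64*64 patch should correspond to processing the latents in groups of 8*8, but the code contains no such operation.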
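The arithmetic behind those numbers can be written out as a small sketch. The 8x spatial downsampling factor is an assumption inferred from the 64-pixel patch mapping to 8 latents, not something read from the model config, and the helper names are made up for illustration.

```python
PATCH_PX = 64           # observed patch size in the reconstructions
SPATIAL_DOWNSAMPLE = 8  # assumption: the VAE downsamples each spatial axis by 8


def patch_grid(resolution_px, patch_px=PATCH_PX):
    """Number of patches along one side of a square frame."""
    return resolution_px // patch_px


def latents_per_patch(patch_px=PATCH_PX, downsample=SPATIAL_DOWNSAMPLE):
    """Number of latent positions along one side of a single patch."""
    return patch_px // downsample


for res in (512, 1024):
    n = patch_grid(res)
    print(f"{res}x{res} video: {n}x{n} patches")

k = latents_per_patch()
print(f"each patch covers {k}x{k} latents")
```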