zzwei1
Hi, I want to use TSM on my own dataset, which has video-like input (each gesture has 32 frames, so my input shape is (N, C, T, H, W)). But when I use a 2D conv...
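For reference, here is a minimal sketch of how I currently fold the temporal dimension into the batch so a 2D conv accepts my 5-D tensor. The shapes and the plain `nn.Conv2d` are from my own setup, not the official TSM pipeline:

```python
import torch
import torch.nn as nn

# Hypothetical shapes from my gesture dataset: N clips, C channels, T=32 frames.
N, C, T, H, W = 4, 3, 32, 112, 112
x = torch.randn(N, C, T, H, W)

# 2D convs expect (batch, channel, height, width), so fold T into the batch dim.
x = x.permute(0, 2, 1, 3, 4).contiguous()   # (N, T, C, H, W)
x = x.view(N * T, C, H, W)                  # (N*T, C, H, W)

conv2d = nn.Conv2d(C, 64, kernel_size=3, padding=1)
y = conv2d(x)                               # (N*T, 64, H, W)

# Restore the temporal dimension if the next block needs it.
y = y.view(N, T, 64, H, W).permute(0, 2, 1, 3, 4)  # (N, 64, T, H, W)
```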
Nice work! I'd like to know whether it's possible to extend this deformable kernel to a 3D version and apply it to action recognition? And what should I do? Thanks...
Hi, nice work! I'm interested in your other work, "Fourier Space Losses for Efficient Perceptual Image Super-Resolution". I want to study your code, but I cannot reach a working URL...
Nice work! I want to use context-gated-convolution in action recognition. I noticed that you use torch.nn.Unfold in layer.py, and it needs a 4-D input (batch x channel x height x width). But...
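To make the question concrete, here is a rough sketch of the workaround I am considering: merging the temporal dimension into the batch so torch.nn.Unfold sees a 4-D tensor. The shapes are from my own setup and only illustrate the idea, not how layer.py actually calls Unfold:

```python
import torch

# Hypothetical clip: N clips, C channels, T frames of size H x W.
N, C, T, H, W = 2, 64, 8, 56, 56
x = torch.randn(N, C, T, H, W)

# torch.nn.Unfold only accepts (batch, channel, height, width),
# so treat every frame as an independent sample.
x2d = x.permute(0, 2, 1, 3, 4).reshape(N * T, C, H, W)

unfold = torch.nn.Unfold(kernel_size=3, padding=1)
patches = unfold(x2d)                      # (N*T, C*3*3, H*W)

# Reshape back so the temporal dimension is available downstream again.
patches = patches.view(N, T, C * 9, H * W)
```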
Thanks for the repo. I don't understand the difference between DeformConv_d and DeformConvPack_d. The main difference I found in the source code (deform_conv.py) is that, in DeformConv_d, offset = temp.clone().resize_(b,...
Hi, nice work! I'm confused about how to visualize the learned offsets as shown in Fig. 6. Could you give some reference code?
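In case it helps to clarify what I'm after, here is a rough sketch of how I imagine the offsets could be plotted: capture the offset map with a forward hook and draw it as arrows. The toy `offset_conv` layer and the channel layout (dy before dx, one pair per sampling point) are assumptions on my side, not taken from this repo:

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Toy stand-in for the layer that predicts 2*K*K offset channels
# (a dy/dx pair per kernel sampling point).
offset_conv = nn.Conv2d(3, 2 * 3 * 3, kernel_size=3, padding=1)

offsets = {}
def save_offset(module, inputs, output):
    # Cache the predicted offset map, assumed shape (N, 2*K*K, H, W).
    offsets["value"] = output.detach().cpu()

hook = offset_conv.register_forward_hook(save_offset)
with torch.no_grad():
    offset_conv(torch.randn(1, 3, 32, 32))
hook.remove()

off = offsets["value"][0]                # (2*K*K, H, W) for one sample
dy, dx = off[0].numpy(), off[1].numpy()  # first sampling point (layout assumed)

ys, xs = torch.meshgrid(torch.arange(off.shape[1]),
                        torch.arange(off.shape[2]), indexing="ij")
plt.quiver(xs.numpy(), ys.numpy(), dx, dy)
plt.gca().invert_yaxis()                 # image coordinates: y grows downward
plt.title("Learned offsets (first sampling location)")
plt.show()
```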
Nice work! I notice that you pretrain a VQ-VAE to compress the image sequence into a discrete latent space, and explore an auto-regressive decoder named Earthformer-AR. I'm interested in...
Nice work! However, I'm a bit confused about the "many to one" training style. Does it mean that, in such a video reconstruction network, you have N reconstructions that...