XPretrain

About LF-VILA code in PatchEmbed3D of video encoder

Open musicman217 opened this issue 1 year ago • 0 comments

The padding seems wrong, or maybe I made a mistake.

# padding
        _, _, D, H, W = x.size() 
        if H % self.patch_size[0] != 0: 
            x = F.pad(x, (0, 0, 0, self.patch_size[1] - H % self.patch_size[1]))
        if W % self.patch_size[1] != 0:
            x = F.pad(x, (0, 0, 0, 0, 0, self.patch_size[0] - D % self.patch_size[0]))

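Since patch_size=[1, 8, 8], where 8x8 is HxW in the implementation, shouldn't the padding be applied to the H and W dimensions? The conditions H % self.patch_size[0] != 0 and W % self.patch_size[1] != 0 confuse me: patch_size[0] is the D entry, yet it is tested against H, and the second branch tests W but pads along D. Thanks a lot!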
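
For comparison, here is a minimal sketch of what I would expect the padding amounts to be, assuming patch_size is ordered (D, H, W) and the input is (B, C, D, H, W). F.pad reads its pad tuple starting from the last dimension, as (left, right) pairs, so W comes first, then H, then D. The helper name pad_amounts is hypothetical and not part of the LF-VILA code; it only computes the tuple, which could then be passed to F.pad.

```python
def pad_amounts(size, patch_size):
    """Return an F.pad-style tuple for a (..., D, H, W) tensor.

    size:       (D, H, W) of the input (assumed layout)
    patch_size: (pd, ph, pw), assumed to be in the same D, H, W order
    """
    D, H, W = size
    pd, ph, pw = patch_size
    # (-n) % p is the smallest non-negative amount that makes n divisible by p,
    # so it is 0 when n is already a multiple of p.
    pad_d = (-D) % pd
    pad_h = (-H) % ph
    pad_w = (-W) % pw
    # F.pad order: last dimension first, each as (left, right).
    return (0, pad_w, 0, pad_h, 0, pad_d)


# Example: H=30 and W=60 are not multiples of 8, D=4 is a multiple of 1.
print(pad_amounts((4, 30, 60), (1, 8, 8)))  # (0, 4, 0, 2, 0, 0)
```

With this ordering, each dimension is compared against its own patch_size entry, and the padding lands on the same dimension that was tested.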

musicman217, Mar 23 '24 02:03