UnTrack
Something seems confusing in the code file vit_ce_prompt.py
In the method forward_features of class VisionTransformerCE, the following code confuses me:
x_rgb = x[:, :3, :, :]
z_rgb = z[:, :3, :, :]
x_dte = x[:, 3:6, ...]
z_dte = z[:, 3:6, ...]
x, z = x_rgb, z_rgb
z = self.patch_embed(z) # 32, 64, 768
x = self.patch_embed(x) # 32, 256, 768
z_rgb_4edge, max_rgb_z_edge = gradient(z_dte)
x_rgb_4edge, max_rgb_x_edge = gradient(x_dte)
z_4edge, max_z_edge = gradient(z_dte)
x_4edge, max_x_edge = gradient(x_dte)
I don't understand why, when enhancing the RGB image's edges by computing the gradient, the code calls gradient(z_dte) and gradient(x_dte) rather than gradient(z_rgb) and gradient(x_rgb). As written, and assuming gradient is deterministic, z_rgb_4edge and x_rgb_4edge are just duplicates of z_4edge and x_4edge.
Is this an oversight, or am I misunderstanding something?
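
To make the concern concrete, here is a minimal sketch of what I mean. It is not the repo's code: the gradient function below is a hypothetical stand-in (I am only assuming the real one returns a set of four directional edge maps plus their maximum, as the variable names suggest), NumPy stands in for PyTorch since the slicing behaves the same way, and the tensor shape is made up.

```python
import numpy as np

def gradient(img):
    """Hypothetical stand-in for the repo's gradient(); NOT the actual implementation.

    Returns the four directional absolute differences and their element-wise max.
    """
    edges = np.stack([
        np.abs(img - np.roll(img, 1, axis=-1)),   # difference to left neighbour (wraps at border)
        np.abs(img - np.roll(img, -1, axis=-1)),  # difference to right neighbour
        np.abs(img - np.roll(img, 1, axis=-2)),   # difference to upper neighbour
        np.abs(img - np.roll(img, -1, axis=-2)),  # difference to lower neighbour
    ], axis=1)
    return edges, edges.max(axis=1)

rng = np.random.default_rng(0)
z = rng.random((2, 6, 8, 8))  # made-up batch of 6-channel inputs

z_rgb = z[:, :3, :, :]   # first three channels
z_dte = z[:, 3:6, ...]   # channels 3..5, named dte in the repo

# As written in forward_features: both calls receive z_dte.
z_rgb_4edge, max_rgb_z_edge = gradient(z_dte)
z_4edge, max_z_edge = gradient(z_dte)

# The "rgb" edge map is an exact copy of the dte edge map ...
same_as_dte = np.array_equal(max_rgb_z_edge, max_z_edge)

# ... and differs from what gradient(z_rgb) would give.
_, max_rgb_expected = gradient(z_rgb)
matches_rgb = np.array_equal(max_rgb_z_edge, max_rgb_expected)

print(same_as_dte, matches_rgb)
```

With this stand-in, same_as_dte is true and matches_rgb is false, so the variables named *_rgb_* carry no RGB edge information at all. If that is intentional, the second pair of gradient calls looks redundant; if not, the first pair presumably should take z_rgb and x_rgb.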