Something seems confusing in the code file vit_ce_prompt.py

Open RiverUp opened this issue 11 months ago • 0 comments

In the method forward_features of class VisionTransformerCE, the following code is confusing to me:

        x_rgb = x[:, :3, :, :]
        z_rgb = z[:, :3, :, :]

        x_dte = x[:, 3:6, ...]
        z_dte = z[:, 3:6, ...]

        x, z = x_rgb, z_rgb
        z = self.patch_embed(z)  # 32, 64, 768
        x = self.patch_embed(x)  # 32, 256, 768

        z_rgb_4edge, max_rgb_z_edge = gradient(z_dte)
        x_rgb_4edge, max_rgb_x_edge = gradient(x_dte)

        z_4edge, max_z_edge = gradient(z_dte)
        x_4edge, max_x_edge = gradient(x_dte)

I don't understand why, when enhancing the RGB image's edges by computing the gradient, the code calls gradient(z_dte) and gradient(x_dte) rather than gradient(z_rgb) and gradient(x_rgb). As written, z_rgb_4edge and z_4edge (and likewise x_rgb_4edge and x_4edge) are computed from the same input. Is this a careless mistake, or am I misunderstanding something?
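To make the concern concrete, here is a minimal sketch using NumPy instead of PyTorch. The `gradient` function below is a hypothetical stand-in (absolute differences toward the four neighbours plus their maximum), not the repository's implementation, and the input shape is made up for illustration. It only shows that calling the function twice on `x_dte` yields identical "rgb" and "dte" edge maps, whereas passing `x_rgb` would not.

```python
import numpy as np

def gradient(img):
    """Hypothetical stand-in for the repo's gradient(): absolute finite
    differences toward the four neighbours, plus their element-wise maximum."""
    # Replicate border pixels so every position has four neighbours.
    padded = np.pad(img, ((0, 0), (0, 0), (1, 1), (1, 1)), mode="edge")
    center = padded[:, :, 1:-1, 1:-1]
    up = np.abs(center - padded[:, :, :-2, 1:-1])
    down = np.abs(center - padded[:, :, 2:, 1:-1])
    left = np.abs(center - padded[:, :, 1:-1, :-2])
    right = np.abs(center - padded[:, :, 1:-1, 2:])
    edges = np.stack([up, down, left, right], axis=1)  # (B, 4, C, H, W)
    return edges, edges.max(axis=1)

rng = np.random.default_rng(0)
# Assumed input: a batch of 6-channel images (RGB + a second 3-channel modality).
x = rng.random((2, 6, 8, 8))

x_rgb = x[:, :3, :, :]
x_dte = x[:, 3:6, ...]

# As quoted in the issue: both calls receive x_dte.
x_rgb_4edge, max_rgb_x_edge = gradient(x_dte)
x_4edge, max_x_edge = gradient(x_dte)
same_as_quoted = np.array_equal(x_rgb_4edge, x_4edge)

# What the variable names suggest: RGB edges computed from x_rgb.
x_rgb_4edge_alt, _ = gradient(x_rgb)
same_with_rgb = np.array_equal(x_rgb_4edge_alt, x_4edge)

print(same_as_quoted, same_with_rgb)
```

If the duplication is intentional, the first pair of variables is redundant; if not, the first two calls would presumably take z_rgb and x_rgb.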

Mar 15 '25 13:03 RiverUp