Feature maps from intermediate layers
Hello,
Are there any best practices or guidelines for selecting feature maps from the intermediate layers (out of the 12 available)? Specifically, what do these feature maps represent, and in what scenarios should they be used?
For example, in downstream training with NAIP (RGBN) data, feature maps from layers 3, 5, 7, and 11 were utilized. For RGB data, would layers 3, 5, and 7 be appropriate? Why not layers 2, 6, or 8, for instance?
Cheers.
@patriksabol There isn't a specific meaning assigned to each layer, but typically, earlier layers tend to capture simpler features like lines, edges, and basic shapes, while later layers identify more complex structures like field boundaries and road networks. When working on segmentation-style modeling, it's beneficial to select a balanced mix of both simple and complex features. You might consider using layers like 1, 3, 7, and 9 or 2, 4, 6, and 8. For a deeper understanding, I recommend checking out this paper: https://arxiv.org/pdf/2212.06727.
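To make the "mix of early and late layers" idea concrete, here is a minimal, hypothetical sketch of collecting outputs from a chosen subset of encoder blocks with forward hooks. The 12-block stack below is a toy stand-in for the actual encoder, and the layer indices are purely illustrative:

```python
# Hypothetical sketch: grab intermediate outputs from selected blocks
# using forward hooks. A stack of Linear layers stands in for the
# 12 encoder blocks; the chosen indices (3, 5, 7, 11) are examples only.
import torch
import torch.nn as nn

encoder = nn.Sequential(*[nn.Linear(16, 16) for _ in range(12)])

feature_maps = {}

def save_output(idx):
    def hook(module, inputs, output):
        feature_maps[idx] = output.detach()
    return hook

# Mix early (simple features) and late (complex features) blocks.
for idx in (3, 5, 7, 11):
    encoder[idx].register_forward_hook(save_output(idx))

x = torch.randn(2, 16)
encoder(x)
print(sorted(feature_maps))  # -> [3, 5, 7, 11]
```

The same hook pattern works on a real ViT encoder; only the module indexing changes.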
Since this ticket was opened, @srmsoumya reports that the decoder has been simplified: we take the last layer's output at a scale of 32 x 32 and upscale it with convolution blocks in the decoder. See https://github.com/Clay-foundation/model/blob/main/finetune/segment/factory.py and the documentation for v1.5.
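As a rough illustration of that "last layer only" design (not the actual factory.py code; the channel counts and number of blocks here are made up), a decoder of this shape takes a single 32 x 32 feature map and upscales it with transposed-convolution blocks:

```python
# Hypothetical sketch of a last-layer-only segmentation decoder:
# one 32x32 feature map in, upscaled logits out. All channel sizes
# and block counts are illustrative, not the real v1.5 values.
import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2),  # 32 -> 64
    nn.ReLU(),
    nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2),  # 64 -> 128
    nn.ReLU(),
    nn.Conv2d(16, 2, kernel_size=1),  # per-pixel class logits
)

feat = torch.randn(1, 64, 32, 32)  # stand-in for the last encoder layer
logits = decoder(feat)
print(tuple(logits.shape))  # (1, 2, 128, 128)
```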
Do we have to rewrite the `SegmentEncoder` class in factory.py for `feature_maps` to work correctly when training the model? It seemed like this was a hidden portion of the code, but maybe it has changed since. To be specific, I believe this was the model used, with

```
(8): ConvTranspose2d(16, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
```

to upscale.
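One thing worth checking about that quoted layer: with `kernel_size=3`, `stride=1`, and `padding=1`, a `ConvTranspose2d` preserves spatial size (output size is `(H - 1) * stride - 2 * padding + kernel_size = H`), so this particular block refines channels rather than upscaling; any spatial upscaling would have to come from strided blocks elsewhere in the decoder. A quick shape check:

```python
# Shape check on the quoted block: kernel 3, stride 1, padding 1
# leaves the spatial dimensions unchanged (only channels change 16 -> 8).
import torch
import torch.nn as nn

layer = nn.ConvTranspose2d(16, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
x = torch.randn(1, 16, 32, 32)
y = layer(x)
print(tuple(x.shape), "->", tuple(y.shape))  # (1, 16, 32, 32) -> (1, 8, 32, 32)
```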