
Conceptual Questions about BinsFormer

Open onoyuki081 opened this issue 3 years ago • 1 comments

Thanks for a great repo! I have a question about the BinsFormer architecture, specifically the 'auxiliary classification.' In your code I found variables called class_num and n_bins. If my understanding of AdaBins is correct, AdaBins classifies the network output into n_bins bins, but BinsFormer's classification seems to be done over class_num, where a class represents a specific type of indoor scene (like bedroom or bathroom) in NYU. If so, why does BinsFormer need these classes, and how do they contribute to the 'auxiliary classification'? If you need more clarification, please let me know.

onoyuki081, Aug 29 '22 06:08

I'm now revising this paper. Thanks for your questions. I may improve my description of the 'auxiliary classification' in this version.

Let me try to answer more directly. Consider the bin queries b = b1, b2, ..., bn and a classification query c. We have n_bins bin queries (as in AdaBins) and only one classification query c. Each bin query is projected to a bin length and associated with its rank in b. The classification query c is projected to classification logits via an MLP, and these logits are used to classify the environment (bedroom, bathroom, and so on).

The classification query takes part in the self-attention among the queries, so the environment information can be passed to the bin queries. Hence, we claim it further improves model performance, which our ablation studies support. This supervision is also more effective than the time-consuming chamfer loss used in AdaBins.
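
To make the roles of n_bins and class_num concrete, here is a minimal pure-Python sketch of the query layout described above. It is not the repository's implementation. The function names, the random weights, the single-head attention with identity projections, the single linear layer standing in for each MLP, the softmax normalization of bin lengths, and the AdaBins-style conversion from lengths to bin centers are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weight, x):
    """Multiply a (rows x dim) weight matrix by a dim-vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def self_attention(queries):
    """Single-head dot-product self-attention with identity projections (assumed)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in queries]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, queries)) for j in range(dim)])
    return out


def random_matrix(rng, rows, cols):
    """Stand-in for learned parameters: small Gaussian values."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def query_heads(n_bins, class_num, dim, min_depth, max_depth, seed=0):
    """Hypothetical helper: n_bins bin queries plus ONE classification query."""
    rng = random.Random(seed)
    bin_queries = random_matrix(rng, n_bins, dim)
    class_query = random_matrix(rng, 1, dim)

    # All n_bins + 1 queries attend to each other, so scene-level
    # information in the classification query can reach the bin queries.
    queries = self_attention(bin_queries + class_query)

    w_bin = random_matrix(rng, 1, dim)          # linear layer standing in for the bin MLP
    w_cls = random_matrix(rng, class_num, dim)  # linear layer standing in for the class MLP

    # Bin query i -> one scalar; softmax over the bins gives normalized
    # lengths, and the index i is the bin's rank (assumed normalization).
    lengths = softmax([matvec(w_bin, q)[0] for q in queries[:n_bins]])

    # AdaBins-style conversion (assumed): cumulative lengths -> edges -> centers.
    edges = [min_depth]
    for length in lengths:
        edges.append(edges[-1] + length * (max_depth - min_depth))
    centers = [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]

    # The single classification query -> class_num scene logits.
    class_logits = matvec(w_cls, queries[n_bins])
    return lengths, centers, class_logits


def cross_entropy(logits, target):
    """Auxiliary scene-classification loss for one sample."""
    return -math.log(softmax(logits)[target])


# Illustrative sizes only; they are not taken from the repository configs.
lengths, centers, class_logits = query_heads(
    n_bins=8, class_num=5, dim=16, min_depth=0.001, max_depth=10.0
)
aux_loss = cross_entropy(class_logits, target=2)
```

In this sketch n_bins controls how many bin lengths come out, while class_num only sets the size of the scene-logit vector produced from the one classification query. The cross-entropy on those logits plays the role of the auxiliary supervision, and it affects the bins only through the shared self-attention.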

zhyever, Sep 08 '22 12:09

Closing this for now.

zhyever, Oct 01 '22 07:10