VAN-Classification
Attention vs Add in LKA
In Table 3, changing attention (mul) to add reduces VAN's performance from 75.4 to 74.6, which I think is quite significant. However, in the ablation study you state that "Besides, replacing attention with adding operation is also not achieving a lower accuracy". Is it okay to phrase it like that, given that the performance drop is 0.8?
Can't we treat add as a type of attention function? In Attention Mechanisms in Computer Vision: A Survey, we have the formula:

$$\text{Attention} = f(g(x), x)$$

Can't I treat the function $f$ here as an addition operation?
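To make the comparison concrete, here is a minimal numpy sketch (the `gate` helper and the toy arrays are hypothetical, not from the VAN code): in both variants the conv branch plays the role of $g(x)$, and $f$ is either elementwise multiplication (the LKA default) or elementwise addition (the ablation variant).

```python
import numpy as np

def gate(x, attn, mode="mul"):
    """Combine generated weights attn with features x.

    mode="mul": attention-style gating as in LKA (attn * x).
    mode="add": the additive variant from the ablation (attn + x).
    """
    if mode == "mul":
        return attn * x
    return attn + x

# Toy feature map and "attention" map; in VAN, attn would come from
# the depthwise / dilated-depthwise / 1x1 conv stack, i.e. g(x).
x = np.array([[1.0, 2.0], [3.0, 4.0]])
attn = np.array([[0.5, 2.0], [1.0, 0.0]])

mul_out = gate(x, attn, "mul")  # f = elementwise product
add_out = gate(x, attn, "add")  # f = elementwise sum
```

Under the survey's template, both are instances of $f(g(x), x)$; the question is whether only the multiplicative $f$ deserves the name "attention".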
@MenghaoGuo Hello, can you explain this?