
Question about why the Add & Norm structure of the transformer network differs from the typical transformer one

Open Liaoqing-up opened this issue 2 years ago • 3 comments

https://github.com/TuSimple/centerformer/blob/96aa37503dc900d1aebeb7c1086c33bbd0c01d26/det3d/models/utils/transformer.py#L267-L279 In the code, the residual connection only adds the original input, and the sum does not pass through a norm layer. Add and Norm are not applied together as a single unit, which differs from the typical transformer structure (where the result of Add followed by Norm becomes the input to the next sub-layer). Is there any special consideration behind this design?
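
For clarity, here is a minimal PyTorch sketch of the "typical" post-norm (Add & Norm) structure I mean; the names are illustrative and are not taken from the CenterFormer code:

```python
import torch.nn as nn

class PostNormBlock(nn.Module):
    """Typical transformer sub-layer: residual add, then LayerNorm (post-norm)."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        # Add & Norm as one unit: the normalized sum feeds the next sub-layer.
        return self.norm(x + out)
```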

Liaoqing-up avatar Feb 19 '23 14:02 Liaoqing-up

I used prenorm inside each layer. https://github.com/TuSimple/centerformer/blob/96aa37503dc900d1aebeb7c1086c33bbd0c01d26/det3d/models/utils/transformer.py#L218-L238
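
For reference, the pre-norm pattern looks roughly like this (an illustrative sketch, not the exact code at the link above):

```python
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Pre-norm sub-layer: LayerNorm is applied before attention, and the raw
    (un-normalized) input is used for the residual add."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h)
        # The residual adds the original x, which never passed through this norm.
        return x + out
```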

edwardzhou130 avatar Feb 19 '23 23:02 edwardzhou130

> I used prenorm inside each layer.
>
> https://github.com/TuSimple/centerformer/blob/96aa37503dc900d1aebeb7c1086c33bbd0c01d26/det3d/models/utils/transformer.py#L218-L238

I see, but I wonder if you have tried Add & Norm after each layer, meaning the input to the residual skip connection is the feature that has already passed through the Norm. Is it possible that the results of these two structures do not differ much?

Liaoqing-up avatar Feb 20 '23 01:02 Liaoqing-up

Sorry, I haven't tried Add & Norm after each layer. Have you tried this before, and were the results better with that implementation?

edwardzhou130 avatar Feb 22 '23 17:02 edwardzhou130