centerformer
Some details to discuss
Thank you for open-sourcing your work. I was wondering why you use x_up (the current frame's BEV feature) rather than x_up_fuse (the sequential frames' features after spatial-aware fusion) as the center query embedding. Apologies if I missed it in the paper.
Sorry for the late reply. There is no empirical reason for choosing x_up rather than x_up_fuse. Center classification and box regression need two different types of information from previous frames, so I wanted to avoid mixing them.
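For anyone reading later, here is a rough, framework-free sketch of the distinction being discussed. It is not the repository's code: the helper names `top_k_centers` and `gather_features`, the nested-list feature layout, and the idea of passing the heatmap in as a ready-made input are all assumptions made for illustration. It only shows that the query embeddings can be gathered from x_up at the selected center locations while x_up_fuse is kept as a separate tensor.

```python
# Illustrative sketch only; names and data layout are hypothetical,
# not taken from the CenterFormer code base.

def top_k_centers(heatmap, k):
    """Return (row, col) of the k highest-scoring cells of a 2D heatmap."""
    cells = [
        (score, r, c)
        for r, row in enumerate(heatmap)
        for c, score in enumerate(row)
    ]
    # Sort by score, highest first (stable for ties).
    cells.sort(key=lambda cell: -cell[0])
    return [(r, c) for _, r, c in cells[:k]]


def gather_features(feature_map, centers):
    """Pick the channel vector feature_map[r][c] at each center location."""
    return [feature_map[r][c] for r, c in centers]


# Toy 2x2 BEV grid with 2 channels per cell.
heatmap = [[0.1, 0.9],
           [0.3, 0.7]]                      # assumed to be given
x_up = [[[1.0, 0.0], [2.0, 0.0]],
        [[3.0, 0.0], [4.0, 0.0]]]           # current-frame BEV feature
x_up_fuse = [[[10.0, 0.0], [20.0, 0.0]],
             [[30.0, 0.0], [40.0, 0.0]]]    # multi-frame fused feature

centers = top_k_centers(heatmap, k=2)

# Query embeddings come from the current frame only, so temporal
# information is not mixed into them at this stage.
center_queries = gather_features(x_up, centers)

# The alternative asked about in the question would be:
fused_queries = gather_features(x_up_fuse, centers)
```

Swapping `x_up` for `x_up_fuse` in the gather call is the only change the question is about; as the reply says, the choice was not made on empirical grounds.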