About “3D dense captioning with ground truth bounding boxes”

Open Samchengjiaming opened this issue 3 years ago • 0 comments

Hi, I have a question about using maskvotenet to get the visual features of GT bboxes. In your code, you only extract the feature of one target object. Do you know how to get the features of all GT bboxes? For the Scan2Cap task, of course, only the one target object's feature is needed, but for the visual grounding task we need the features of all GT bboxes. Thank you!

Nov 15 '22 07:11 Samchengjiaming