Input of Linguistic Branch

Open JJ-res101 opened this issue 4 years ago • 3 comments

Thank you for your excellent work! How does the model get the box of a certain phrase in a sentence? Right now it seems to me that the model can't do that. Is that right?

JJ-res101 avatar Nov 19 '21 07:11 JJ-res101

The box is annotated for the whole sentence, not for a particular phrase within it.

djiajunustc avatar Nov 20 '21 16:11 djiajunustc

I think boxes are annotated per phrase in the Flickr30K Entities data. As said in your paper, "Flickr30K Entities [38] augments the original Flickr30K [58] with short region phrase correspondence annotations." Maybe the 'Flickr' dataset you use has one box annotation per sentence. Is that right? :)

JJ-res101 avatar Nov 21 '21 01:11 JJ-res101

Just as you cited, "Flickr30K Entities [38] augments the original Flickr30K [58] with short region phrase correspondence annotations." This means the original sentences of Flickr30K are split into short phrases, and each phrase is annotated with a bbox. When training on Flickr30K Entities, each sample consists of a phrase and a bbox.
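
To make the "one phrase, one bbox" sample structure concrete, here is a rough sketch rather than the actual TransVG preprocessing. It assumes the sentence annotations use inline markup of the form `[/EN#<entity_id>/<type> phrase words]` and that boxes are already available in a dict keyed by entity id. The function name `build_phrase_samples`, the example sentence, and the box coordinates are all made up for illustration.

```python
import re
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)

# Assumed inline markup: [/EN#<entity_id>/<type> phrase words]
PHRASE_PATTERN = re.compile(r"\[/EN#(\d+)/\S+ ([^\]]+)\]")


def build_phrase_samples(
    annotated_sentence: str,
    boxes_by_entity: Dict[str, List[Box]],
) -> List[Tuple[str, Box]]:
    """Split one annotated sentence into (phrase, bbox) training samples."""
    samples = []
    for entity_id, phrase in PHRASE_PATTERN.findall(annotated_sentence):
        # Phrases without any box are skipped. Emitting one sample per box
        # is an assumption of this sketch, not a documented convention.
        for box in boxes_by_entity.get(entity_id, []):
            samples.append((phrase, box))
    return samples


# Hypothetical sentence and box coordinates, for illustration only.
sentence = "[/EN#1/people A young boy] is holding [/EN#2/other a red kite] ."
boxes = {"1": [(12, 30, 140, 220)], "2": [(150, 10, 260, 90)]}

samples = build_phrase_samples(sentence, boxes)
for phrase, box in samples:
    print(phrase, box)
```

With samples built this way, the linguistic branch would receive only the short phrase (for example "a red kite") as input, not the full sentence.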

jianghaojun avatar Mar 25 '22 02:03 jianghaojun