LogicCheckGPT

About baseline

Open haohaodw opened this issue 1 year ago • 2 comments

Nice work! I would like to ask a question about the LURE baseline. LURE needs to mask the objects in a response during inference and then correct them. However, POPE and MME are discriminative tasks in which the model answers questions with yes or no. How did you test the performance of LURE on these two benchmarks?

haohaodw avatar Aug 20 '24 09:08 haohaodw

Thanks for your interest! In our experiments, we observed that the responses of the four LVLMs to POPE questions follow the format "Yes/No, there is/isn't {object} ...". This format allows LURE to mask the object. For instance, the responses of mPLUG-Owl to some POPE questions are listed below:

issue_1

The responses of LLaVA-1.5 to some POPE questions are listed below:

issue_2

Hyperwjf avatar Aug 22 '24 08:08 Hyperwjf
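
To illustrate the response format described in the reply above, here is a minimal sketch that splits such a response into its yes/no part and the object mention that a masking step would target. The regular expression and the function name `split_pope_response` are assumptions made for illustration. They are not taken from LURE or from this repository, and real model responses may not all match this pattern.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for responses shaped like
# "Yes/No, there is/isn't {object} ...". Longer negations are listed
# before the bare "is" so that "is no person" is not read as the object
# "no person".
_PATTERN = re.compile(
    r"^\s*(?P<answer>yes|no)\s*,\s*there\s+"
    r"(?:isn['’]t|is\s+not|is\s+no|is)\s+"
    r"(?:an?\s+|the\s+)?"
    r"(?P<object>[a-z][a-z ]*?)"
    r"(?=\s+(?:in|on|at)\b|[.,]|$)",
    re.IGNORECASE,
)


def split_pope_response(response: str) -> Optional[Tuple[str, str]]:
    """Return (answer, object) if the response matches the assumed format."""
    match = _PATTERN.match(response)
    if match is None:
        return None  # the response does not follow the assumed format
    return match.group("answer").lower(), match.group("object").strip()


if __name__ == "__main__":
    print(split_pope_response("Yes, there is a dog in the image."))
    print(split_pope_response("No, there isn't a dining table in the image."))
```

Responses that do not match return `None`, so a caller can count how many responses actually follow the format before relying on it.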

However, POPE accuracy is computed from the yes/no answer. So how do you judge whether the response modified by LURE is correct?

haohaodw avatar Aug 22 '24 08:08 haohaodw
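
One plausible way to score such free-form responses, offered here as an assumption and not as the authors' confirmed procedure, is to map each (possibly corrected) response back to a yes/no label with a keyword rule and compare it with the ground truth. The rule below (a "no", "not", or "...n't" in the first sentence counts as "no") and the helper names `to_yes_no` and `accuracy` are hypothetical.

```python
import re
from typing import Sequence


def to_yes_no(response: str) -> str:
    """Map a free-form response to 'yes' or 'no' using a keyword rule."""
    # Normalize curly apostrophes and look at the first sentence only.
    text = response.replace("’", "'").lower()
    first_sentence = text.split(".")[0]
    words = re.findall(r"[a-z']+", first_sentence)
    # Assumed rule: any negation word in the first sentence means "no".
    negative = any(w in ("no", "not") or w.endswith("n't") for w in words)
    return "no" if negative else "yes"


def accuracy(responses: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of responses whose parsed label equals the ground truth."""
    if len(responses) != len(labels):
        raise ValueError("responses and labels must have the same length")
    if not responses:
        raise ValueError("at least one response is required")
    correct = sum(to_yes_no(r) == l for r, l in zip(responses, labels))
    return correct / len(responses)


if __name__ == "__main__":
    demo = [
        "Yes, there is a dog in the image.",
        "No, there isn't a dining table in the image.",
    ]
    print(accuracy(demo, ["yes", "yes"]))
```

Under this rule, a correction that changes "Yes, there is a dog" into "No, there isn't a dog" flips the parsed label, so the usual yes/no accuracy can still be computed on the modified responses.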