
Logic behind score() in bart_score.py

Open zwxu064 opened this issue 1 year ago • 2 comments

Hi, I am wondering how well this model handles logical reasoning. Suppose my source and targets are source: ["I have one sister and one brother"] and target: [["I have no siblings", "I have 2 siblings", "I have three siblings", "my parents have 3 kids"]].

The resulting BARTScores are: [[-3.561114549636841], [-3.4354922771453857], [-3.130859136581421], [-4.85980749130249]]

This corresponds to the quality ranking "I have three siblings" > "I have 2 siblings" > "I have no siblings" > "my parents have 3 kids". This is clearly wrong, since the source implies exactly two siblings.
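
For reference, the ordering can be checked by sorting the scores reported above. This is only a sketch over the numbers already quoted in this post; it does not call BARTScore itself, and the variable names are just for illustration.

```python
# Candidate targets and the BARTScores reported above, in the same order.
candidates = [
    "I have no siblings",
    "I have 2 siblings",
    "I have three siblings",
    "my parents have 3 kids",
]
scores = [
    -3.561114549636841,
    -3.4354922771453857,
    -3.130859136581421,
    -4.85980749130249,
]

# A higher (less negative) score means the model considers the target more likely.
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)

for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

The two targets that are actually consistent with the source ("I have 2 siblings" and "my parents have 3 kids") end up second and last.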

Thanks.

zwxu064 • Dec 09 '24 03:12

I did reproduce this with the model trained on parabank2, although the numbers are slightly different from yours. When I change "I have 2 siblings" to "I have two siblings", the model scores it much higher than the other targets. This suggests that the model did not see many numerals during training and is therefore unfamiliar with them.

Regarding the example "my parents have 3 kids", this requires a higher level of reasoning and cannot be considered a direct paraphrase of "I have one sister and one brother", even though both refer to the same fact. The model may therefore struggle with it.

yyy-Apple • Dec 13 '24 02:12

I agree. That means BARTScore is not intrinsically good at reasoning: even when the source context is given, the score for a target sentence sometimes diverges from its actual meaning. For example, "2" and "two" mean the same thing, but only "two" is recognised, because the source context contains no Arabic numerals.
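
One possible workaround for the "2" vs. "two" gap would be to spell out small integers before scoring. The sketch below is an assumption on my side, not something BARTScore provides: `spell_out_small_numbers` is a hypothetical helper, it only covers 0 to 12, and I have not checked how much it changes the scores.

```python
import re

# Hypothetical helper, not part of BARTScore: words for the integers 0..12.
_NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six",
    "seven", "eight", "nine", "ten", "eleven", "twelve",
]

# Match a standalone run of digits: not glued to letters or other digits,
# and not part of a decimal such as "3.5".
_STANDALONE_INT = re.compile(r"(?<![\w.])\d+(?!\.\d|\w)")


def spell_out_small_numbers(text: str) -> str:
    """Replace standalone integers 0..12 with English words; leave the rest as-is."""
    def replace(match: re.Match) -> str:
        value = int(match.group())
        if value < len(_NUMBER_WORDS):
            return _NUMBER_WORDS[value]
        return match.group()

    return _STANDALONE_INT.sub(replace, text)


targets = ["I have 2 siblings", "my parents have 3 kids"]
normalized = [spell_out_small_numbers(t) for t in targets]
print(normalized)
```

The normalized strings could then be passed to score() in place of the originals. This only addresses the surface form of the numbers, not the reasoning problem with "my parents have 3 kids".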

zwxu064 • Dec 14 '24 06:12