[Bug]: argument order changes the similarity score, WHY?
Current Behavior
Using the default ONNX model, the score function is:
def get_score(a, b):
    return evaluation.evaluation(
        {
            'question': a
        },
        {
            'question': b
        }
    )
Case 1:
a = 'What is neural network?'
b = 'Explain neural network and its components.'
c = 'What are the key components of neural network?'
print (get_score(a, b))
print (get_score(a, c))
print (get_score(b, c))
0.7585506439208984
0.02885962650179863
0.0909486636519432
Case 2:
a = 'What is neural network?'
b = 'Explain neural network and its components.'
c = 'What are the key components of neural network?'
print (get_score(b, a))
print (get_score(c, a))
print (get_score(c, b))
0.17746654152870178
0.013074617832899094
0.8378676772117615
I only swapped the arguments from (x, y) to (y, x) when calling get_score; why do the scores change so drastically?
Expected Behavior
No response
Steps To Reproduce
No response
Environment
No response
Anything else?
No response
It seems that you did not actually test swapping x, y to y, x. For a direct comparison you should compare print (get_score(a, b)) with print (get_score(b, a)).
I did; look at these lines:
print (get_score(a, b)) # in case 1
0.7585506439208984
print (get_score(b, a)) # in case 2
0.17746654152870178
It's surprising that such a phenomenon exists!
Is it because the LLM replies differently when the ranking/ordering of the retrieved content differs in a RAG application?
I don't know much about this part. In theory, the score should be computed from the distance between the two embedding vectors, so swapping their positions should not affect it.
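For reference, here is a minimal sketch (not this library's actual code) of the symmetric setup described above: a bi-encoder embeds each question independently and the score is the cosine similarity between the two vectors, which is the same regardless of argument order. The embeddings below are toy values standing in for real encoder outputs.

import numpy as np

def cosine_score(u, v):
    # cosine similarity is symmetric: cosine_score(u, v) == cosine_score(v, u)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy embeddings standing in for the encoder outputs of the two questions
emb_a = np.array([0.1, 0.9, 0.3])
emb_b = np.array([0.2, 0.8, 0.4])

print(cosine_score(emb_a, emb_b) == cosine_score(emb_b, emb_a))  # True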
We trained a cross-encoder model to evaluate similarity; conceptually the score should be independent of the pair's order. However, since it is a lightweight BERT-based transformer with no constraint enforcing that symmetry, some order-dependent behavior can occur.
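To illustrate why a cross-encoder can be order-sensitive: both questions are packed into a single input sequence ([CLS] a [SEP] b [SEP]), so swapping them changes the token sequence the model sees. A rough sketch, assuming a generic BERT tokenizer (the checkpoint name is for illustration only, not the one this project ships):

from transformers import AutoTokenizer

# hypothetical checkpoint, for illustration only
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

a = 'What is neural network?'
b = 'Explain neural network and its components.'

ab = tokenizer(a, b)["input_ids"]   # [CLS] a [SEP] b [SEP]
ba = tokenizer(b, a)["input_ids"]   # [CLS] b [SEP] a [SEP]

# Different token sequences -> the model can assign different scores,
# unless symmetry is enforced, e.g. by training on both orders or by
# averaging score(a, b) and score(b, a) at inference time.
print(ab == ba)  # False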