
[Bug]: order of similarity matters, WHY?

Open dhandhalyabhavik opened this issue 1 year ago • 6 comments

Current Behavior

Using the default onnx model with this score function:

# assuming the default onnx evaluation is set up along these lines:
# from gptcache.similarity_evaluation.onnx import OnnxModelEvaluation
# evaluation = OnnxModelEvaluation()
def get_score(a, b):
    return evaluation.evaluation(
        {
            'question': a
        },
        {
            'question': b
        }
    )

Case 1:

a = 'What is neural network?'
b = 'Explain neural network and its components.'
c = 'What are the key components of neural network?'
print (get_score(a, b))
print (get_score(a, c))
print (get_score(b, c))

0.7585506439208984
0.02885962650179863
0.0909486636519432

Case 2:

a = 'What is neural network?'
b = 'Explain neural network and its components.'
c = 'What are the key components of neural network?'
print (get_score(b, a))
print (get_score(c, a))
print (get_score(c, b))

0.17746654152870178
0.013074617832899094
0.8378676772117615

I just changed the arguments from (x, y) to (y, x) when calling get_score. Why such drastic changes in the scores?

Expected Behavior

No response

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

dhandhalyabhavik avatar Sep 09 '24 08:09 dhandhalyabhavik

It seems that you did not actually compare the swapped arguments. You should compare print (get_score(a, b)) against print (get_score(b, a)).

SimFG avatar Sep 10 '24 03:09 SimFG

I did; look at these lines:

print (get_score(a, b)) # in case 1
0.7585506439208984
print (get_score(b, a)) # in case 2
0.17746654152870178

dhandhalyabhavik avatar Sep 11 '24 05:09 dhandhalyabhavik

It's amazing that there is such a phenomenon!

SimFG avatar Sep 11 '24 09:09 SimFG

Is it because the LLM replies differently when the ranking/ordering of content differs in a RAG application?

Ali-Parandeh avatar Sep 12 '24 18:09 Ali-Parandeh

I don't know much about this part. Theoretically, the score should be computed from the distance between the two vectors, so swapping their positions should not affect it.

SimFG avatar Sep 13 '24 02:09 SimFG
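For context on the intuition above: a score derived purely from a distance between two embedding vectors is symmetric by construction, so argument order cannot matter in that setup. A minimal sketch with cosine similarity (the vectors here are made up for illustration, not real model embeddings):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product over the product of the norms.
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

a_vec = [0.2, 0.7, 0.1]  # hypothetical embedding of sentence a
b_vec = [0.3, 0.6, 0.4]  # hypothetical embedding of sentence b

# Both the dot product and the norms are order-independent,
# so the score is identical either way.
assert cosine(a_vec, b_vec) == cosine(b_vec, a_vec)
```

This is why the observed asymmetry points to something other than a plain vector-distance comparison.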

We trained a cross-encoder model to evaluate similarity; conceptually, the score for a pair should not depend on the order of its elements. However, since it uses BERT, there can be some asymmetric behavior: it's a lightweight transformer with no constraint enforcing order invariance.

wxywb avatar Sep 13 '24 03:09 wxywb
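Given that the cross-encoder has no built-in order invariance, one common workaround (not part of GPTCache's API; a hypothetical wrapper sketched here) is to score the pair in both orders and average:

```python
def symmetric_score(score_fn, a, b):
    # Average the score over both argument orders so the result
    # no longer depends on which sentence comes first.
    return (score_fn(a, b) + score_fn(b, a)) / 2.0

# Toy order-sensitive scorer standing in for the onnx cross-encoder,
# purely to demonstrate that the wrapper removes the asymmetry.
def toy_score(x, y):
    return len(x) / (len(x) + len(y))

s1 = symmetric_score(toy_score, 'What is neural network?',
                     'Explain neural network and its components.')
s2 = symmetric_score(toy_score, 'Explain neural network and its components.',
                     'What is neural network?')
assert s1 == s2  # same score regardless of argument order
```

This doubles inference cost per pair, but guarantees a symmetric score without retraining the model.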