stagehand
Write a custom scorer for the eval.
Thought: The eval should be more detailed for extracts. There is a big difference between completely missing 10 out of 20 commits and one commit starting with a lowercase letter, but we score both as 0. The first should score around 0.5 and the second around 0.9.
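A minimal sketch of such a partial-credit scorer, assuming `output` and `expected` are arrays of commit-message strings and that the scorer plugs into the Braintrust eval as a plain function; the scorer name and the 0.9 weight for case-only mismatches are illustrative choices, not a fixed rule:

```typescript
// Partial-credit scorer: exact matches earn 1, case-only mismatches
// earn 0.9, and missing commits earn 0, so the final score reflects
// how wrong the extraction actually was. Weights are illustrative.
interface ScorerArgs {
  output: string[];
  expected: string[];
}

function commitExtractionScorer({ output, expected }: ScorerArgs) {
  let total = 0;
  for (const want of expected) {
    if (output.includes(want)) {
      total += 1; // exact match
    } else if (output.some((got) => got.toLowerCase() === want.toLowerCase())) {
      total += 0.9; // only capitalization differs
    }
    // completely missing commits add 0
  }
  return {
    name: "CommitExtraction",
    score: expected.length ? total / expected.length : 1,
  };
}
```

With these weights, missing 10 of 20 expected commits lands at 0.5 overall, while a commit that differs only in capitalization earns 0.9 credit instead of 0.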
And implement fuzzy search scoring.
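One way to sketch fuzzy scoring is normalized Levenshtein similarity, hand-rolled below so the example is self-contained; the helper names are hypothetical, and Braintrust's autoevals package also ships a Levenshtein-based scorer that may fit without hand-rolling:

```typescript
// Edit distance between two strings via standard dynamic programming.
function levenshtein(a: string, b: string): number {
  // dp[i][j] = edit distance between a[..i] and b[..j]
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Normalized similarity in [0, 1]; 1 means identical strings.
function similarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}

// Score each expected commit by its best fuzzy match in the output,
// then average; a near-miss like a typo scores close to 1 instead of 0.
function fuzzyScore(output: string[], expected: string[]): number {
  if (expected.length === 0) return 1;
  const perCommit = expected.map((want) =>
    Math.max(0, ...output.map((got) => similarity(got, want)))
  );
  return perCommit.reduce((sum, s) => sum + s, 0) / expected.length;
}
```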
Also, if you get extra time, make more information available on the Braintrust dashboard.
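One way to surface more detail is through the scorer result itself: Braintrust scorer results can include a `metadata` object that appears alongside the score for each row. A sketch, where the `missing` and `caseMismatches` field names are illustrative, not a fixed schema:

```typescript
// Scorer that reports WHICH commits were missed or mis-cased,
// not just the aggregate number, so failures are inspectable per row.
function commitScorerWithMetadata({ output, expected }: { output: string[]; expected: string[] }) {
  const missing = expected.filter(
    (want) => !output.some((got) => got.toLowerCase() === want.toLowerCase())
  );
  const caseMismatches = expected.filter(
    (want) =>
      !output.includes(want) &&
      output.some((got) => got.toLowerCase() === want.toLowerCase())
  );
  // Exact matches earn 1, case mismatches 0.9, missing commits 0.
  const score = expected.length
    ? (expected.length - missing.length - 0.1 * caseMismatches.length) / expected.length
    : 1;
  return {
    name: "CommitExtraction",
    score,
    metadata: { missing, caseMismatches }, // shown per-row in the dashboard
  };
}
```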
Fuzzy search scoring is an option; how about embedding vector similarity?
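A sketch of an embedding-based scorer, assuming the OpenAI embeddings API and the `text-embedding-3-small` model (both assumptions); autoevals' `EmbeddingSimilarity` scorer is likely the off-the-shelf option to try first:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Embed a blob of text with an assumed model choice.
async function embed(text: string): Promise<number[]> {
  const res = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}

// Plain cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compare the extracted commits to the expected ones as whole documents.
async function embeddingScore(output: string[], expected: string[]): Promise<number> {
  const [got, want] = await Promise.all([
    embed(output.join("\n")),
    embed(expected.join("\n")),
  ]);
  return cosine(got, want);
}
```

Embedding similarity is forgiving of rewording but coarse: it won't flag a single missing commit the way the per-commit scorers above do, so it may fit better as a secondary score than a replacement.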