.Net: Summarization and translation evaluation examples with Filters

Open dmytrostruk opened this issue 1 year ago • 0 comments

Motivation and Context

This example demonstrates how to perform quality checks on LLM results for tasks such as text summarization and translation, using Semantic Kernel Filters.

Metrics used in this example:

  • BERTScore - leverages the pre-trained contextual embeddings from BERT and matches words in candidate and reference sentences by cosine similarity.
  • BLEU (BiLingual Evaluation Understudy) - evaluates the quality of text which has been machine-translated from one natural language to another.
  • METEOR (Metric for Evaluation of Translation with Explicit ORdering) - evaluates the similarity between the generated summary and the reference summary, taking into account grammar and semantics.
  • COMET (Crosslingual Optimized Metric for Evaluation of Translation) - an open-source framework for training machine translation metrics that achieve high correlation with different types of human judgments.
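To make concrete what these metrics compute, here is a minimal, dependency-free Python sketch of sentence-level BLEU (modified n-gram precision combined with a brevity penalty). This is an illustration only; real evaluations should rely on a maintained implementation such as the one in the Hugging Face Evaluate library.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, with counts."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) multiplied by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        ref_ngrams = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        if total == 0 or overlap == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

For example, an exact match scores 1.0, while a candidate sharing no bigrams with the reference scores 0.0.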

In this example, SK Filters call a dedicated server that is responsible for evaluating the task using the metrics described above. If the evaluation score for a specific metric doesn't meet the configured threshold, an exception is thrown with the evaluation details.
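The filter-side flow can be sketched as follows. This is hedged Python pseudocode of the pattern, not the actual .NET Semantic Kernel API: the class and method names, the `bertscore` metric choice, and the 0.85 threshold are all illustrative placeholders.

```python
# Hypothetical sketch of the filter flow: score the model output after
# invocation and fail fast if it falls below the configured threshold.
class EvaluationException(Exception):
    def __init__(self, metric, score, threshold):
        super().__init__(
            f"{metric} score {score:.3f} is below threshold {threshold:.3f}")
        self.metric, self.score, self.threshold = metric, score, threshold

class SummarizationEvaluationFilter:
    def __init__(self, evaluate_fn, threshold=0.85, metric="bertscore"):
        # evaluate_fn stands in for the HTTP call to the evaluation server.
        self.evaluate_fn = evaluate_fn
        self.threshold = threshold
        self.metric = metric

    def on_function_invoked(self, source_text, summary):
        # Ask the evaluation server to score the candidate against the source.
        score = self.evaluate_fn(self.metric, source_text, summary)
        if score < self.threshold:
            raise EvaluationException(self.metric, score, self.threshold)
        return summary
```

A caller that wires in a real scoring backend gets the result back unchanged when the score passes, and an `EvaluationException` with the metric, score, and threshold when it does not.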

The Hugging Face Evaluate library is used to evaluate the summarization and translation results.
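The server side of this setup can be sketched with only the standard library. The route, JSON payload shape (`metric`, `reference`, `candidate`), and response fields below are assumptions, and `mock_score` is a dependency-free stand-in for a real `evaluate.load(metric).compute(...)` call from the Hugging Face Evaluate library.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def mock_score(metric, reference, candidate):
    """Stand-in for a real metric: plain unigram overlap, so the
    sketch stays dependency-free. Swap in Hugging Face Evaluate here."""
    ref, cand = set(reference.split()), set(candidate.split())
    return len(ref & cand) / max(len(cand), 1)

class EvalHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and score the candidate text.
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        score = mock_score(body["metric"], body["reference"], body["candidate"])
        payload = json.dumps({"metric": body["metric"], "score": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the sketch quiet
        pass

def start_server():
    # Port 0 lets the OS pick a free port; serve on a daemon thread.
    server = HTTPServer(("127.0.0.1", 0), EvalHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A filter would then POST the source text and the model output to this endpoint and compare the returned score against its threshold.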

Contribution Checklist

dmytrostruk · May 15 '24 05:05