
Need support for Sentence Similarity Pipeline

Open timxieICN opened this issue 2 years ago • 3 comments

Feature request

HuggingFace now has a lot of Sentence Similarity models, but the pipeline does not yet support this: https://huggingface.co/docs/transformers/main_classes/pipelines

Your contribution

I can write a PR, but might need someone else's help.

timxieICN avatar Apr 21 '23 14:04 timxieICN

cc @Narsil

amyeroberts avatar Apr 21 '23 15:04 amyeroberts

Hi @timxieICN ,

Thanks for the suggestion. In general, sentence-similarity models like https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 are served by SentenceTransformers, which is a library built on top of transformers itself.

https://huggingface.co/sentence-transformers

Sentence transformers adds some configuration specifying how to compute similarity with a given model, since there are several ways to do it.

From a user's point of view, it should be relatively easy to do this:

from sentence_transformers import SentenceTransformer, util

# Any sentence-transformers checkpoint works here, e.g. the one linked above
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Same input schema as the Inference API: one source sentence compared
# against a list of candidate sentences
inputs = {
    "source_sentence": "That is a happy person",
    "sentences": ["That is a happy dog", "Today is a sunny day"],
}

embeddings1 = model.encode(inputs["source_sentence"], convert_to_tensor=True)
embeddings2 = model.encode(inputs["sentences"], convert_to_tensor=True)
similarities = util.pytorch_cos_sim(embeddings1, embeddings2)

This is exactly the code currently running on the Hub to compute those similarities: https://github.com/huggingface/api-inference-community/blob/main/docker_images/sentence_transformers/app/pipelines/sentence_similarity.py

Adding this directly to transformers would basically mean incorporating sentence-transformers within transformers, and I'm not sure that's desirable. Maybe @amyeroberts or another core maintainer can confirm or deny this.

Does this help?

Narsil avatar Apr 21 '23 15:04 Narsil

We definitely don't want a circular dependency like that!

Since the example you shared is so simple, @Narsil, I think it's a good replacement for a pipeline. Let's leave this issue open, and if there's a lot of interest or new use cases we can consider other possible options.

amyeroberts avatar Apr 21 '23 17:04 amyeroberts
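For readers who do want pipeline-like ergonomics on top of the snippet above, it can be wrapped in a small helper. This is only a sketch: the function name `sentence_similarity` is made up, and the toy bag-of-letters encoder below is a stand-in so the example runs without downloading a model; with sentence-transformers installed you would pass `model.encode` instead.

```python
import numpy as np

def sentence_similarity(encode, source_sentence, sentences):
    """Pipeline-style helper: `encode` is any text -> vector function
    (e.g. a SentenceTransformer's .encode). Returns one cosine
    similarity score per candidate sentence."""
    src = np.asarray(encode(source_sentence), dtype=float)
    return [
        float(np.dot(src, cand) / (np.linalg.norm(src) * np.linalg.norm(cand)))
        for cand in (np.asarray(encode(s), dtype=float) for s in sentences)
    ]

# Toy encoder (letter counts) so the sketch is self-contained;
# replace with a real model's encode function in practice.
def toy_encode(text):
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec

scores = sentence_similarity(toy_encode, "happy person", ["happy dog", "rainy day"])
```

Because cosine similarity is used, each score falls in [-1, 1] (and in [0, 1] here, since letter counts are non-negative).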

Hi @Narsil, that is the sentence-transformers API, but I want to compute sentence similarity with a T5 model. How can I do that?

Thank you

viethoang303 avatar Nov 06 '23 15:11 viethoang303
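In case it helps: one common recipe for encoder models like T5's (not an official transformers API for similarity) is to run the encoder (e.g. `T5EncoderModel`), mean-pool the token embeddings using the attention mask, and compare the pooled vectors by cosine similarity. A minimal NumPy sketch of just the pooling and similarity arithmetic, with random arrays standing in for the encoder's `last_hidden_state`:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (seq_len, hidden), e.g. an encoder's last_hidden_state
    # attention_mask: (seq_len,) with 1 for real tokens, 0 for padding
    mask = attention_mask[:, None].astype(float)
    return (token_embeddings * mask).sum(axis=0) / max(mask.sum(), 1e-9)

def cos_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Random placeholders standing in for encoder outputs of two sentences
tokens1 = rng.normal(size=(5, 8))
tokens2 = rng.normal(size=(5, 8))
emb1 = mean_pool(tokens1, np.array([1, 1, 1, 1, 0]))  # last position is padding
emb2 = mean_pool(tokens2, np.array([1, 1, 1, 0, 0]))
score = cos_sim(emb1, emb2)
```

Masking before pooling matters: without it, padding positions would drag the sentence embedding toward the padding vectors.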

I think that measuring the distance between embeddings produced by any embedding-generation model would indeed be desirable. I'm open to helping if you want to work on that.

wilmeragsgh avatar Nov 21 '23 20:11 wilmeragsgh