Need support for Sentence Similarity Pipeline
Feature request
The Hugging Face Hub now hosts many sentence-similarity models, but pipelines do not support this task yet: https://huggingface.co/docs/transformers/main_classes/pipelines
Motivation
Many sentence-similarity models are available on the Hub, yet none of them can be used through a `pipeline` today: https://huggingface.co/docs/transformers/main_classes/pipelines
Your contribution
I can write a PR, but I might need someone else's help.
cc @Narsil
Hi @timxieICN,
Thanks for the suggestion.
In general, sentence-similarity models like https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 are served by `sentence-transformers`, a library built on top of `transformers` itself.
https://huggingface.co/sentence-transformers
`sentence-transformers` adds some configuration that specifies how to compute similarity with a given model, since there are several ways to do it.
From a user's point of view, it should be relatively easy to do this:
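To illustrate why "several ways" matters, here is a plain-Python sketch of three common scoring functions: dot product, cosine similarity, and Euclidean distance. It is not how `sentence-transformers` stores its configuration; which function a given model expects is model-specific and should be checked on its model card. The function names and vectors are illustrative only.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot-product score: grows with vector length as well as direction."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product of the length-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance: lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 2-D "embeddings" chosen so that the metrics disagree
query = [1.0, 0.0]
candidates = {"short, same direction": [0.5, 0.0], "long, off-axis": [3.0, 3.0]}

for name, vec in candidates.items():
    print(
        f"{name}: dot={dot(query, vec):.3f} "
        f"cos={cosine(query, vec):.3f} l2={euclidean(query, vec):.3f}"
    )
```

Here the dot product ranks the long vector higher (3.0 vs 0.5), while cosine similarity (about 0.707 vs 1.0) and Euclidean distance both prefer the short one. A model trained with one scoring function can therefore give misleading rankings if scored with another.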
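```python
from sentence_transformers import SentenceTransformer, util

# Placeholder values for illustration
model_id = "sentence-transformers/all-MiniLM-L6-v2"
inputs = {
    "source_sentence": "That is a happy person",
    "sentences": ["That is a happy dog", "Today is a sunny day"],
}

model = SentenceTransformer(model_id)
embeddings1 = model.encode(inputs["source_sentence"], convert_to_tensor=True)
embeddings2 = model.encode(inputs["sentences"], convert_to_tensor=True)
# Cosine similarity of the source sentence against each candidate, shape (1, len(sentences))
similarities = util.pytorch_cos_sim(embeddings1, embeddings2)
```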
This is exactly the code that currently computes these similarities on the Hub: https://github.com/huggingface/api-inference-community/blob/main/docker_images/sentence_transformers/app/pipelines/sentence_similarity.py
Adding this directly to `transformers` would basically mean incorporating `sentence-transformers` into `transformers`, and I'm not sure that's desirable. Maybe @amyeroberts or another core maintainer can confirm or refute this.
Does this help?
We definitely don't want a circular dependency like that!
Since the example you shared, @Narsil, is so simple, I think it's a good replacement for a pipeline. Let's leave this issue open, and if there's a lot of interest or a new use case, we can consider other options.
Hi @Narsil, this uses the `sentence-transformers` API, but I want to compute sentence similarity with a T5 model. How can I do that?
Thank you
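One way to approach the T5 question with `transformers` alone is to run the T5 encoder, mean-pool the token embeddings into one vector per sentence, and compare those vectors with cosine similarity. This is a sketch under assumptions. Mean pooling and cosine similarity are choices made here, not something the thread specifies. `mean_pool` and `t5_similarities` are hypothetical helper names. A plain T5 checkpoint such as `t5-base` is not trained for sentence similarity, so its scores may be only weakly meaningful.

```python
import torch
import torch.nn.functional as F


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over the sequence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # (batch, 1); avoids division by zero
    return summed / counts


def t5_similarities(model_id: str, source_sentence: str, sentences: list[str]) -> list[float]:
    """Cosine similarity between source_sentence and each sentence, using a T5 encoder."""
    # Imported here so mean_pool can be used without transformers installed
    from transformers import AutoTokenizer, T5EncoderModel

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = T5EncoderModel.from_pretrained(model_id).eval()  # encoder only, no decoder needed
    batch = tokenizer(
        [source_sentence] + sentences, padding=True, truncation=True, return_tensors="pt"
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq, hidden)
    embeddings = mean_pool(hidden, batch["attention_mask"])
    # Row 0 is the source sentence; compare it against every other row
    scores = F.cosine_similarity(embeddings[:1], embeddings[1:], dim=-1)
    return scores.tolist()
```

Usage would look like `t5_similarities("t5-base", "A man is eating food.", ["A man is eating pasta.", "A plane is taking off."])`. If a T5-based model already trained for similarity is acceptable, such as `sentence-transformers/sentence-t5-base`, loading it through `SentenceTransformer` as in the earlier snippet is simpler. That route also applies any pooling or projection layers the model defines.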
I think measuring the distance between the embeddings produced by any embedding model would indeed be desirable. I'm open to helping if you want to work on that.
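A model-agnostic version of that idea could take any embedding function as input. The sketch below is hypothetical and is not an existing `transformers` or `sentence-transformers` API. `EmbedFn` stands for any callable that maps a list of texts to one vector per text, `sentence_similarity` scores with cosine similarity (an assumption; other metrics would plug in the same way), and `toy_embed` is a stand-in encoder used only for demonstration.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
# Hypothetical interface: any function mapping a batch of texts to one vector per text
EmbedFn = Callable[[list[str]], Sequence[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sentence_similarity(embed: EmbedFn, source_sentence: str, sentences: list[str]) -> list[float]:
    """Score each sentence against source_sentence using whatever model `embed` wraps."""
    vectors = embed([source_sentence] + sentences)  # encode all texts in one batch
    source, others = vectors[0], vectors[1:]
    return [cosine(source, v) for v in others]


def toy_embed(texts: list[str]) -> list[Vector]:
    """Stand-in encoder for demonstration: counts of the letters a, b, c."""
    return [[float(t.count(ch)) for ch in "abc"] for t in texts]


print(sentence_similarity(toy_embed, "aab", ["ab", "cc", "aab"]))
```

With `sentence-transformers`, `embed` could be `lambda texts: model.encode(texts)`, which returns one NumPy row per text. Keeping the encoder behind a plain callable is what would let the same scoring code work with any embedding model.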