
[WIP] Add ZeroShotObjectDetectionPipeline (#18445)

Open sahamrit opened this pull request 3 years ago • 1 comment

What does this PR do?

This PR adds the ZeroShotObjectDetectionPipeline. It is tested on the OwlViTForObjectDetection model and should enable inference via the following API:

from transformers import pipeline

pipe = pipeline("zero-shot-object-detection")
pipe("cats.png", ["cat", "remote"])

This pipeline could default to the https://huggingface.co/google/owlvit-base-patch32 checkpoint
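
For reference, a minimal usage sketch (an assumption for illustration, not part of the PR) that pins this checkpoint explicitly instead of relying on the pipeline default; it follows the call pattern shown above, and the exact name of the text argument (text_queries vs. candidate_labels) is discussed later in this thread:

from transformers import pipeline

# Sketch only: explicitly select the OwlViT checkpoint that this pipeline could
# default to, instead of relying on the default model resolution.
detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")

# "cats.png" is a placeholder image path, as in the example above.
results = detector("cats.png", ["cat", "remote"])
print(results)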

Fixes #18445

Before submitting

Who can review?

@alaradirik @Narsil

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

sahamrit avatar Sep 07 '22 22:09 sahamrit

The documentation is not available anymore as the PR was closed or merged.

Hi, I just noticed that the merge messed up the commit history. There are 377 changes, which makes it impossible to review and merge this PR into main.

I suggest resetting to the last clean commit locally, then using git rebase main to stay up to date with main (after pulling the latest changes from remote main into your local main). Or any approach works, as I am not sure what caused the current git status.

ydshieh avatar Sep 23 '22 14:09 ydshieh

> Hi, I just noticed that the merge messed up the commit history. There are 377 changes, which makes it impossible to review and merge this PR into main.
>
> I suggest resetting to the last clean commit locally, then using git rebase main to stay up to date with main (after pulling the latest changes from remote main into your local main). Or any approach works, as I am not sure what caused the current git status.

Hi @ydshieh, sorry about that. I was in a hurry to wrap up the PR since I was going on vacation, and I messed up the rebase. I have reverted to a stable commit and will add the correct changes once I am back!

sahamrit avatar Sep 23 '22 17:09 sahamrit

No problem, @sahamrit! I am super happy that you were able to get back to the stable commit 💯. Have a nice vacation!

ydshieh avatar Sep 23 '22 17:09 ydshieh

Hi @alaradirik, could you review the changes?

sahamrit avatar Oct 05 '22 04:10 sahamrit

> Thank you for this PR.
>
>   • I suggest modifying the output of the pipeline to be more "natural" (see the relevant comment).
>   • text_queries should be renamed to candidate_labels to be in line with zero-shot-classification.

Hey @Narsil! I suggested using text_queries instead because this is a multi-modal model where users query images with free-form text. The queried object is either found or not, and the found object's label is not chosen from a selection of candidate labels, so I think it'd make more sense to keep it as it is.

alaradirik avatar Oct 06 '22 10:10 alaradirik

> Hey @Narsil! I suggested using text_queries instead because this is a multi-modal model where users query images with free-form text. The queried object is either found or not, and the found object's label is not chosen from a selection of candidate labels, so I think it'd make more sense to keep it as it is.

Are you sure? I just tried your code, and it seems all the labels stem from the text being sent, meaning I think there is a 1-1 correspondence between labels and text_queries (so candidate_labels would be a fine name).

from transformers import pipeline

object_detector = pipeline(
    "zero-shot-object-detection", model="hf-internal-testing/tiny-random-owlvit-object-detection"
)

outputs = object_detector(
    "./tests/fixtures/tests_samples/COCO/000000039769.png",
    text_queries=["aaa cat", "xx"],
    threshold=0.64,
)
print(outputs)
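# Each returned label is taken verbatim from `text_queries`, so every
# detection's label is one of the strings passed in (here "aaa cat" or "xx").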

Narsil avatar Oct 06 '22 10:10 Narsil

Hi @Narsil, sure, the output labels are taken exactly from the input text_queries. The reason for naming it "text_queries" instead of "candidate_labels", as in zero-shot-image-classification, is that in the zero-shot-image-classification pipeline the [candidate labels are wrapped by the hypothesis template](https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/zero_shot_image_classification.py#:~:text=candidate_labels%20(%60List%5Bstr,logits_per_image), whereas here the text_queries are free-form text queries!

Hope that clarifies things!
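
To illustrate the difference, a minimal side-by-side sketch (assuming the zero-shot-image-classification pipeline's default hypothesis template, "This is a photo of {}.", and the text_queries argument proposed in this PR; the image path is a placeholder):

from transformers import pipeline

# zero-shot-image-classification: each candidate label is inserted into a
# hypothesis template before scoring, and the result is a ranking over the
# candidate set.
classifier = pipeline("zero-shot-image-classification")
classifier(
    "cats.png",
    candidate_labels=["cat", "remote"],
    hypothesis_template="This is a photo of {}.",
)

# zero-shot-object-detection (this PR): the queries are passed to the model as
# free-form text with no template wrapping, and each query is matched against
# the detected objects independently.
detector = pipeline("zero-shot-object-detection")
detector("cats.png", text_queries=["a photo of a cat", "a photo of a remote"])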

sahamrit avatar Oct 06 '22 10:10 sahamrit

> Are you sure? I just tried your code, and it seems all the labels stem from the text being sent, meaning I think there is a 1-1 correspondence between labels and text_queries (so candidate_labels would be a fine name).

Yes, there is a 1-1 correspondence, but I meant that only the query text / a single label is evaluated for each object, whereas in zero-shot-classification the label is selected from among multiple candidate labels.

alaradirik avatar Oct 06 '22 10:10 alaradirik

> Yes, there is a 1-1 correspondence, but I meant that only the query text / a single label is evaluated for each object, whereas in zero-shot-classification the label is selected from among multiple candidate labels.

I still think that the zero-shot -> candidate_labels logic works. If we reuse names, it's easier for users to discover and use pipelines. The fact that they are slightly different doesn't, in my eyes, justify the use of a different name. I would even argue that they are exactly the same, and that the differences in how they are used are caused by classification vs. object detection, not by what candidate_labels are.

I personally think using candidate_labels would be misleading and confusing given the architecture and use case of this model. There have been other zero-shot object detection papers published very recently, and it'd be better to get the naming right in order to avoid future breaking changes.

Narsil avatar Oct 06 '22 11:10 Narsil

Hi @Narsil @alaradirik, could you kindly review the changes?

sahamrit avatar Oct 07 '22 08:10 sahamrit