
SAM example code does not work

Open YubinXie opened this issue 2 years ago • 3 comments

System Info

  • transformers version: 4.29.0.dev0
  • Platform: Linux-3.10.0-957.12.2.el7.x86_64-x86_64-with-glibc2.10
  • Python version: 3.8.3
  • Huggingface_hub version: 0.13.4
  • Safetensors version: not installed
  • PyTorch version (GPU?): 1.5.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png" raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB") input_points = [[[450, 600]]] # 2D location of a window in the image

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device) outputs = model(**inputs)

masks = processor.image_processor.post_process_masks( outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu() ) scores = outputs.iou_scores

Expected behavior


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>
      4 
      5 inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
----> 6 outputs = model(**inputs)
      7 
      8 masks = processor.image_processor.post_process_masks(

~/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/miniconda3/envs/pytorch/lib/python3.8/site-packages/transformers/models/sam/modeling_sam.py in forward(self, pixel_values, input_points, input_labels, input_boxes, input_masks, image_embeddings, multimask_output, output_attentions, output_hidden_states, return_dict, **kwargs)
   1331         )
   1332 
-> 1333         sparse_embeddings, dense_embeddings = self.prompt_encoder(
   1334             input_points=input_points,
   1335             input_labels=input_labels,

~/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/miniconda3/envs/pytorch/lib/python3.8/site-packages/transformers/models/sam/modeling_sam.py in forward(self, input_points, input_labels, input_boxes, input_masks)
    669         if input_labels is None:
    670             raise ValueError("If points are provided, labels must also be provided.")
--> 671         point_embeddings = self._embed_points(input_points, input_labels, pad=(input_boxes is None))
    672         sparse_embeddings = torch.empty((batch_size, point_batch_size, 0, self.hidden_size), device=target_device)
    673         sparse_embeddings = torch.cat([sparse_embeddings, point_embeddings], dim=2)

~/miniconda3/envs/pytorch/lib/python3.8/site-packages/transformers/models/sam/modeling_sam.py in _embed_points(self, points, labels, pad)
    619             padding_point = torch.zeros(target_point_shape, device=points.device)
    620             padding_label = -torch.ones(target_labels_shape, device=labels.device)
--> 621             points = torch.cat([points, padding_point], dim=2)
    622             labels = torch.cat([labels, padding_label], dim=2)
    623         input_shape = (self.input_image_size, self.input_image_size)

RuntimeError: Expected object of scalar type double but got scalar type float for sequence element 1.

YubinXie avatar Apr 21 '23 19:04 YubinXie

Hello @YubinXie Thanks for the issue! I was not able to reproduce your issue with torch==1.13.1; here is the snippet I used:

from PIL import Image
import requests
import torch

from transformers import AutoModel, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModel.from_pretrained("facebook/sam-vit-base").to(device)
processor = AutoProcessor.from_pretrained("facebook/sam-vit-base")

img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
input_points = [[[450, 600]]]  # 2D location of a window in the image

inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
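
For reference, the post-processing step from your issue description can be appended to the snippet above (same calls as in the original example):

# Post-processing, as in the original example:
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
)
scores = outputs.iou_scores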

I can see that you are using torch==1.5.x. Note that transformers requires torch >= 1.9: https://github.com/huggingface/transformers/blob/main/setup.py#L180 - I also ran the script with torch==1.9.1 and did not encounter the issue. I strongly recommend installing a newer version of torch (at least 1.9). Could you try updating torch and let us know if you still face the issue?
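
If you cannot upgrade right away, the error itself points to a dtype mismatch: the points tensor reaches _embed_points as float64 while the padding tensor created there is float32. A minimal workaround sketch (an assumption on my side, not verified on torch 1.5) would be to cast the points before the forward pass:

# Workaround sketch (assumption: inputs["input_points"] comes back as float64 on
# this setup, while the padding tensor inside _embed_points defaults to float32):
inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)
inputs["input_points"] = inputs["input_points"].float()  # cast to float32
outputs = model(**inputs)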

younesbelkada avatar Apr 22 '23 08:04 younesbelkada

Hi @younesbelkada Thank you for your response. I updated torch and now the model works! However, I got another error in the post-processing step:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-abdc2d7068b8> in <module>
      6 outputs = model(**inputs)
      7 
----> 8 masks = processor.image_processor.post_process_masks(
      9     outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu()
     10 )

~/miniconda3/envs/pytorch/lib/python3.8/site-packages/transformers/models/sam/image_processing_sam.py in post_process_masks(self, masks, original_sizes, reshaped_input_sizes, mask_threshold, binarize, pad_size)
    404             interpolated_mask = F.interpolate(masks[i], target_image_size, mode="bilinear", align_corners=False)
    405             interpolated_mask = interpolated_mask[..., : reshaped_input_sizes[i][0], : reshaped_input_sizes[i][1]]
--> 406             interpolated_mask = F.interpolate(interpolated_mask, original_size, mode="bilinear", align_corners=False)
    407             if binarize:
    408                 interpolated_mask = interpolated_mask > mask_threshold

~/miniconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/functional.py in interpolate(input, size, scale_factor, mode, align_corners, recompute_scale_factor, antialias)
   3957         if antialias:
   3958             return torch._C._nn._upsample_bilinear2d_aa(input, output_size, align_corners, scale_factors)
-> 3959         return torch._C._nn.upsample_bilinear2d(input, output_size, align_corners, scale_factors)
   3960     if input.dim() == 5 and mode == "trilinear":
   3961         assert align_corners is not None

TypeError: upsample_bilinear2d() received an invalid combination of arguments - got (Tensor, list, bool, NoneType), but expected one of:
 * (Tensor input, tuple of ints output_size, bool align_corners, tuple of floats scale_factors)
      didn't match because some of the arguments have invalid types: (Tensor, list of [Tensor, Tensor], bool, NoneType)
 * (Tensor input, tuple of ints output_size, bool align_corners, float scales_h, float scales_w, *, Tensor out)

The code is from the Hugging Face SAM page. I wonder whether this is an issue in the code or in another package.

YubinXie avatar Apr 22 '23 19:04 YubinXie

Hi @YubinXie Thanks for iterating. It seems this is a duplicate of https://github.com/huggingface/transformers/issues/22904. Could you try uninstalling transformers and re-installing it from source?
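
In the meantime, a possible workaround sketch: pass plain Python ints for the size arguments so that F.interpolate receives ints rather than 0-d tensors. This assumes post_process_masks accepts lists of ints for original_sizes and reshaped_input_sizes; I have not verified it on your version.

# Workaround sketch (assumption: post_process_masks accepts plain lists of ints
# for the size arguments, so F.interpolate gets ints instead of tensors):
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu().tolist(),
    inputs["reshaped_input_sizes"].cpu().tolist(),
)
scores = outputs.iou_scores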

younesbelkada avatar Apr 22 '23 21:04 younesbelkada

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 22 '23 15:05 github-actions[bot]