document_text_detection API simply returns "Bad image data" for local image file

Open christian-steinmeyer opened this issue 2 years ago • 4 comments

Environment details

  • OS type and version: mac os
  • Python version: 3.10.6
  • pip version: 23.0.1
  • google-cloud-vision version: 3.4.0

Steps to reproduce

  1. Run the script below.

Code example

import cv2 as cv
from google.cloud import vision


path = "/path/to/attached/file.jpg"  # placeholder: path to the attached image
client = vision.ImageAnnotatorClient()
# Re-encode the image as JPEG in memory and send the raw bytes.
_, buffer = cv.imencode('.jpg', cv.imread(str(path)))
image = vision.Image(content=buffer.tobytes())
response = client.document_text_detection(image=image)

# response.error reports code 3: Bad image data

When trying to OCR the attached image, I get "Bad image data" as the result. I thought it was due to the file size, but I have already received successful responses for bigger files. I've also researched the issue quite a bit but found nothing useful in other issues or forums. Since there is no stack trace or anything else, I'm stumped and don't know how best to proceed. Any ideas?

Attached image: TRSTimes_Volume_5_No _1_1992-01_TRSTimes_Publications_US_0001

christian-steinmeyer avatar Mar 13 '23 17:03 christian-steinmeyer

I have now found out that the image's resolution was too high. After reducing the file size to below 4MB, I could upload it to the Vision "try it now in your browser" website, where I got a more useful error message saying the maximum image resolution is 75 megapixels. The documentation actually says:

Image size should not exceed 75M pixels (length x width) for OCR analysis. If the image size exceeds 75M pixels (length x width) , the Vision API resizes the image; otherwise, the Vision API uses the original image.

I totally misunderstood this sentence. I guess what it means is that OCR will not work with images > 75M pixels, but other features (like object detection) might work on a resized image. Perhaps you could rephrase that documentation?
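As a workaround until the docs are clearer, the size can be checked on the client before sending. Below is a minimal sketch of that check, not an official recipe: the helper name `fit_within_pixel_limit` is made up for illustration, and reading "75M pixels" as exactly 75,000,000 is an assumption (the documentation does not spell out the exact number).

```python
import math

# Assumption: "75M pixels" from the documentation is read as 75,000,000.
MAX_OCR_PIXELS = 75_000_000


def fit_within_pixel_limit(width, height, max_pixels=MAX_OCR_PIXELS):
    """Return (width, height) scaled down with the aspect ratio preserved
    so that width * height <= max_pixels.

    Dimensions that are already within the limit are returned unchanged.
    This is a hypothetical helper, not part of google-cloud-vision.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height
    # Scale both sides by the same factor; flooring keeps the product
    # at or below the limit.
    scale = math.sqrt(max_pixels / (width * height))
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height


print(fit_within_pixel_limit(4000, 3000))    # already small enough
print(fit_within_pixel_limit(10000, 10000))  # 100M pixels, gets scaled down
```

With OpenCV, the input dimensions would come from `height, width = img.shape[:2]`, and the result could be passed to `cv.resize(img, (new_width, new_height), interpolation=cv.INTER_AREA)` before encoding. Note that `cv.resize` expects the size as (width, height).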

christian-steinmeyer avatar Mar 14 '23 08:03 christian-steinmeyer

I also found it confusing that the Python client response only says "bad image data", while the website itself gives more details.

Additionally, here's an example formulation for the documentation bit:

"Image size must not exceed 75M pixels (length x width) for OCR analysis. Larger images will result in an error. For other features of the Vision API, images that exceed this limit will be resized internally first."

christian-steinmeyer avatar Mar 15 '23 11:03 christian-steinmeyer

I'm going to transfer this issue to google-cloud-python, as we're planning to move the code for google-cloud-vision there in the next 1-2 weeks.

parthea avatar Oct 21 '23 20:10 parthea

This seems like fundamentally a Vision API/doc issue, not a client library-specific issue, though on the client library side, we should ensure that we are showing the user all the error details provided by the service. We'll look into this and keep you updated.
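In the meantime, a caller can at least make the failure loud instead of silently getting an empty result. The following is a rough sketch under stated assumptions: `raise_on_annotation_error` is a hypothetical helper, not part of the client library, and it relies only on the `error.code` and `error.message` fields of the per-image response. It cannot surface more detail than the service actually returns. The `SimpleNamespace` object is a stand-in so the snippet runs without credentials.

```python
from types import SimpleNamespace


def raise_on_annotation_error(response):
    """Raise RuntimeError if the response carries a non-empty error message.

    Hypothetical helper: works with any object exposing
    `error.code` and `error.message`.
    """
    error = getattr(response, "error", None)
    if error is not None and error.message:
        raise RuntimeError(f"Vision API error {error.code}: {error.message}")
    return response


# Stand-in for the response described in this issue, for illustration only.
fake = SimpleNamespace(error=SimpleNamespace(code=3, message="Bad image data"))
try:
    raise_on_annotation_error(fake)
except RuntimeError as exc:
    print(exc)  # Vision API error 3: Bad image data
```

In the script from the original report, this would be called as `raise_on_annotation_error(response)` right after `client.document_text_detection(image=image)`.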

vchudnov-g avatar Dec 08 '23 21:12 vchudnov-g