document_text_detection API simply returns "Bad image data" for local image file
Environment details
- OS type and version: macOS
- Python version: 3.10.6
- pip version: 23.0.1
- google-cloud-vision version: 3.4.0
Steps to reproduce
- Execute the below script
Code example
import cv2 as cv
from google.cloud import vision
path = "/path/to/attached/file.jpg"  # path to the attached image
client = vision.ImageAnnotatorClient()
_, buffer = cv.imencode('.jpg', cv.imread(str(path)))
image = vision.Image(content=buffer.tobytes())
response = client.document_text_detection(image=image)
# response will be code 3: Bad image data
When trying to OCR the attached image, I'm getting "Bad image data" as a result. I thought it was due to the file size, but I've already successfully received responses for bigger files. I've also researched the issue quite a bit but didn't find anything useful across other issues / forums. Since there is no stacktrace or anything else, I'm really quite stumped and don't know how best to proceed. Any ideas?

I have now found out that the image was too high in resolution. By reducing the file size to below 4 MB, I could upload it to the Vision "try it now in your browser" website, where I then got a more useful error message saying the max image resolution is 75 megapixels. In the documentation it actually says:
Image size should not exceed 75M pixels (length x width) for OCR analysis. If the image size exceeds 75M pixels (length x width) , the Vision API resizes the image; otherwise, the Vision API uses the original image.
I totally misunderstood this sentence. I guess what it means is that OCR will not work with images > 75M pixels, but other features (like object detection) might work on a resized image. Perhaps you could rephrase that documentation?
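For anyone hitting the same limit, here is a rough sketch of how the target dimensions could be computed before sending an image for OCR. It assumes the 75M pixel limit quoted above; the helper name `fit_within_pixel_limit` and the constant `MAX_OCR_PIXELS` are made up for illustration and are not part of the Vision client library.

```python
import math

# Limit taken from the documentation sentence quoted above (assumption:
# it applies to width x height of the decoded image).
MAX_OCR_PIXELS = 75_000_000


def fit_within_pixel_limit(width, height, max_pixels=MAX_OCR_PIXELS):
    """Return (new_width, new_height) with the same aspect ratio and at
    most max_pixels pixels. Dimensions already within the limit are
    returned unchanged."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height
    # Scale both sides by the same factor so the area shrinks to the limit.
    scale = math.sqrt(max_pixels / (width * height))
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height


if __name__ == "__main__":
    # A 12000 x 9000 image has 108M pixels, so it gets scaled down.
    print(fit_within_pixel_limit(12000, 9000))
    # A 4000 x 3000 image is well below the limit and is left alone.
    print(fit_within_pixel_limit(4000, 3000))
```

With OpenCV, the result could then be applied with something like `cv.resize(img, (new_width, new_height), interpolation=cv.INTER_AREA)` before the `cv.imencode` call, where `img.shape[:2]` gives `(height, width)` of the loaded image.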
I also found it confusing to only receive the feedback of "Bad image data" via the Python client response, but to get more details via the website itself.
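As a stopgap on the caller's side, the per-image error can at least be turned into an exception rather than being silently carried in the response. The sketch below assumes the response exposes an `error` field with `code` and `message` attributes, as the Vision `AnnotateImageResponse` does; the helper name `raise_on_vision_error` is hypothetical, and a stand-in object is used instead of a real API call. It cannot recover details the service does not send, so in this case it would still only report "Bad image data".

```python
from types import SimpleNamespace


def raise_on_vision_error(response):
    """Raise RuntimeError if the response carries a non-empty error message;
    otherwise return the response unchanged."""
    error = getattr(response, "error", None)
    message = getattr(error, "message", "")
    if message:
        code = getattr(error, "code", None)
        raise RuntimeError(f"Vision API error (code {code}): {message}")
    return response


if __name__ == "__main__":
    # Stand-in for a failed response, mirroring the error reported above.
    fake_response = SimpleNamespace(
        error=SimpleNamespace(code=3, message="Bad image data")
    )
    try:
        raise_on_vision_error(fake_response)
    except RuntimeError as exc:
        print(exc)
```

In the script above, this would be called as `raise_on_vision_error(client.document_text_detection(image=image))`.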
Additionally, here's an example formulation for the documentation bit:
"Image size must not exceed 75M pixels (length x width) for OCR analysis. Larger images will result in an error. For other features of the Vision API, images that exceed this limit will be resized internally first."
I'm going to transfer this issue to google-cloud-python, as we're planning to move the code for google-cloud-vision there in the next 1-2 weeks.
This seems like fundamentally a Vision API/doc issue, not a client library-specific issue, though on the client library side, we should ensure that we are showing the user all the error details provided by the service. We'll look into this and keep you updated.