[Bug report] Smallest Text Detection elements for Chinese are lines, not characters
Describe the bug
According to the documentation at https://developers.google.com/ml-kit/vision/text-recognition/v2:
"an Element is a contiguous set of alphanumeric characters ("word") on the same axis in most Latin languages, or a character in others"
In my testing, a TextRecognizer created using ChineseTextRecognizerOptions yields Elements that are whole lines, not individual characters.
To Reproduce
GoogleOCRDemo.zip
The attached sample app performs recognition on Chinese text and lists each element found, prefixed by a number.
- Open the app and tap "Recognize".
- After a moment, the recognized elements are listed in a scrolling text view. Observe that each element contains multiple characters, not one as the documentation indicates.
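For readers who cannot open the attachment, the listing step amounts to roughly the following. This is a minimal sketch, not the code from GoogleOCRDemo.zip: the helper name `listElements(in:)` and the "more than one character" note are assumptions added for illustration, and it assumes the `GoogleMLKit/TextRecognitionChinese` pod is installed so that `import MLKit` resolves.

```swift
import MLKit
import UIKit

/// Runs Chinese text recognition on `image` and prints every element,
/// prefixed by a running number. `listElements(in:)` is a hypothetical
/// helper name used for this sketch; it is not part of ML Kit.
func listElements(in image: UIImage) {
    let options = ChineseTextRecognizerOptions()
    let recognizer = TextRecognizer.textRecognizer(options: options)

    let visionImage = VisionImage(image: image)
    visionImage.orientation = image.imageOrientation

    recognizer.process(visionImage) { result, error in
        guard error == nil, let result = result else {
            print("Recognition failed: \(error?.localizedDescription ?? "unknown error")")
            return
        }

        var index = 0
        // Walk the hierarchy: blocks -> lines -> elements.
        for block in result.blocks {
            for line in block.lines {
                for element in line.elements {
                    index += 1
                    // String.count counts user-perceived characters, so a
                    // value above 1 means the element is not a single character.
                    let note = element.text.count > 1 ? "  <- more than one character" : ""
                    print("\(index). \(element.text)\(note)")
                }
            }
        }
    }
}
```

If the documentation held for Chinese, no printed line would carry the note; in the behaviour described here, most of them would.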
Expected behavior
I expect each element to represent a single Chinese character. Per-character elements are very useful for applications where it's desirable to enable text selection atop recognised text. This would also match the behaviour of the Tesseract API and of Apple's OCR frameworks.
SDK Info:
pod 'GoogleMLKit/TextRecognitionChinese', '2.6.0'
Smartphone: iPhone 12
Development Environment:
- Xcode 13.4.1
- macOS 12.4
Hello, is this the right place to raise issues like this? If not, I'm happy to file a duplicate elsewhere.
This is a known issue and fixable, but we haven't planned a release for it yet.
I'm afraid that, with the shift in our team's priorities, there is no plan to address this in the near future.