[Question] OCR
Context / Scenario
I followed this example and wrote an OCR implementation. However, processing scanned PDFs and PDFs containing images did not trigger it. I'm not sure whether I did something wrong.
Looks like this is currently not possible, see code: https://github.com/microsoft/kernel-memory/blob/main/service/Core/DataFormats/Pdf/PdfDecoder.cs
This is despite IOcrEngine (https://github.com/microsoft/kernel-memory/blob/main/service/Abstractions/DataFormats/IOcrEngine.cs) already being in place, which would be enough for simple text extraction, and UglyToad.PdfPig being able to extract images as an experimental feature.
@dluc Wouldn't it be possible to extend "FileContent" with an array of the images found in the PDF, described by the GPT-4 Vision API if enabled?
I think you will be able to support this scenario once issue https://github.com/microsoft/kernel-memory/issues/379 is completed (there is currently a PR in preview).
With that, you will be able to inject a custom decoder for PDF files.
Given that custom content decoders can now be injected, I would first try creating one that replaces the default PDF decoder and internally does all the work of extracting both the regular text and the text from images. E.g. you can create a decoder that depends on the existing image decoder to parse images, and returns all the text at the end, without the need to revisit the FileContent class (for now).
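To illustrate the shape of such a decoder, here is a rough, language-neutral sketch written in Python. It does not use the actual Kernel Memory or PdfPig APIs: PdfPage, Section, OcrEngine and PdfWithOcrDecoder are all hypothetical names made up for this example, and the PDF parsing step is assumed to have already produced each page's text layer and raw image bytes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PdfPage:
    """Hypothetical stand-in for a parsed PDF page: text layer plus embedded images."""
    text: str
    images: List[bytes] = field(default_factory=list)


@dataclass
class Section:
    """Hypothetical output unit: one block of extracted text per page."""
    number: int
    text: str


# Hypothetical OCR dependency: takes raw image bytes, returns the recognized text.
OcrEngine = Callable[[bytes], str]


class PdfWithOcrDecoder:
    """Merges each page's text layer with the OCR text of the images on that page."""

    def __init__(self, ocr: OcrEngine):
        self._ocr = ocr

    def supports_mime_type(self, mime_type: str) -> bool:
        return mime_type == "application/pdf"

    def decode(self, pages: List[PdfPage]) -> List[Section]:
        sections = []
        for number, page in enumerate(pages, start=1):
            # Text layer first, then the text recognized in each embedded image.
            parts = [page.text.strip()]
            parts.extend(self._ocr(image).strip() for image in page.images)
            # Skip empty parts so image-only (scanned) pages contain just the OCR text.
            text = "\n".join(part for part in parts if part)
            sections.append(Section(number, text))
        return sections


def fake_ocr(image: bytes) -> str:
    """Placeholder OCR for the demo: pretends the image bytes are the text itself."""
    return image.decode("utf-8")


pages = [
    PdfPage("Invoice 42", [b"Total: 10 EUR"]),  # page with text and an image
    PdfPage("", [b"scanned page"]),             # scanned page, no text layer
]
sections = PdfWithOcrDecoder(fake_ocr).decode(pages)
```

The point of the design is that the OCR engine is injected into the decoder, so all the text comes back through the normal decoder output and the FileContent class does not need to change. In the real service the equivalent class would implement the injectable content decoder interface and call the registered OCR engine instead of fake_ocr.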