docstrange
Extract and convert data from any document, image, PDF, Word document, PowerPoint file, or URL into multiple formats (Markdown, JSON, CSV, HTML), with intelligent structured data extraction and advanced OCR.
Trying to run DocStrange on a local GPU to get output in a specific JSON format. I used the code below but got an unexpected keyword argument error for json_schema. Not sure if I am missing anything. ```python from...
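An "unexpected keyword" error like the one reported above usually means the function being called does not declare that parameter in the installed version. A minimal sketch for checking this before making the call, using only the standard library: `accepts_keyword`, `extract_old`, and `extract_new` are hypothetical names invented for illustration and are not part of DocStrange.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for two versions of an extraction method.
def extract_old(specified_fields=None):
    return {}


def extract_new(specified_fields=None, json_schema=None):
    return {}


for func in (extract_old, extract_new):
    supported = accepts_keyword(func, "json_schema")
    print(f"{func.__name__}: json_schema supported = {supported}")
```

The same check can be pointed at the real method on the installed package to see whether the running version matches the documentation being followed.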
Neither the API, the online version, nor the downloaded model returns JSON data; each fails by reporting "No content available".
Version: 1.1.6. Environment: Mac M1, CPU. Description: According to the documentation, calling json_data = result.extract_data() and then print(json_data) should return all important data as flat JSON. However, in practice, the returned json_data is...
Does it support passing a Pydantic model for structured output? Or any plans to add it?
Happy to be educated here. Just noticed that the processing with gpu-mode is quite slow. This could be fully expected. Any thoughts on the processing times? I can provide some...
Hi, do you have plans to open-source Nanonets-OCR2-Plus?
First of all, thank you for your amazing work on Nanonets. It's truly impressive to see how well it performs, both in efficiency and output quality. I have a...
When using docstrange in GPU local processing mode (via the DocumentExtractor), it appears to default to a specific large model nanonets/Nanonets-OCR-s. For users with limited compute resources (e.g., less VRAM),...
This pull request adds support for specifying a custom root path for the DocStrange web interface, improving flexibility for deployments under subpaths (such as behind a reverse proxy). The changes...
I tested the same document images using both the hosted OCR model on https://docstrange.nanonets.com/ and the locally downloaded version of the same model. Surprisingly, the online model gave excellent OCR...