pdf-to-json topic

List of pdf-to-json repositories

unstructured

8.6k stars, 702 forks

Open-source libraries and APIs for building custom preprocessing pipelines for labeling, training, or production machine learning.

statement-parser

30 stars, 5 forks

Parse bank and credit card statements

ocr-python

74 stars, 11 forks

OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.

PDF-Verse

186 stars, 52 forks

PDF Verse is a powerful web-based PDF editor with tools for editing, converting, and manipulating PDFs. Merge, compress, add or remove pages, or extract text using OCR technology. Convert PDF to DOC,...

docling

22.8k stars, 1.3k forks

Get your documents ready for gen AI

docstrange

1.1k stars, 105 forks, 1.1k watchers

Extract and convert data from any document, image, PDF, Word document, PowerPoint file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

llama_cloud_services

4.2k stars, 467 forks, 4.2k watchers

Knowledge Agents and Management in the Cloud

opendataloader-pdf

814 stars, 43 forks, 814 watchers

PDF Parsing for RAG — Convert to Markdown & JSON, Fast, Local, No GPU

graphlit

26 stars, 2 forks, 26 watchers

Graphlit Platform

graphlit-client-python

18 stars, 2 forks, 18 watchers

Python client library for Graphlit Platform