pdf-processing topic
PDFs-TextExtract
Text extraction from multiple and large PDF documents.
document-processing-pipeline-for-regulated-industries
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
doc-chatbot
Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.
papermage
A library supporting NLP and CV research on scientific papers.
pdf-to-text-chroma-search
Python scripts that convert PDF files to text, split them into chunks, and store their vector representations using GPT4All embeddings in a Chroma DB. It also provides a script to query the Chroma...
masquerade
The Privacy Firewall for LLMs