open-parse
Improved file parsing for LLMs
🚀 Roadmap
## Description This is a tentative roadmap; I will update it as things evolve. ## Roadmap **High Priority:** - [x] Implement unitable - [x] Enable OCR support - [ ]...
Given this code:
```
import openparse

basic_doc_path = "mydoc.pdf"
parser = openparse.DocumentParser(
    table_args={
        "parsing_algorithm": "unitable",
        "min_table_confidence": 0.8,
    }
)
parsed_basic_doc = parser.parse(basic_doc_path)

for node in parsed_basic_doc.nodes:
    print(node.json())
```
I'm getting...
### Initial Checks - [X] I confirm that I'm on the latest version ### Description Hi there, First, thanks for open-parse; it looks cool in most of...
### Description Can you integrate PyMuPDF's pdf4llm.to_markdown() to make the parsed PDF more hierarchical (for example, use ("##", "Header 1") to represent the first-level heading, ("###", "Header 2") to represent the...
### Description A PDF is a document that mixes graphics and text. When doing RAG, the pictures in a PDF often contain important information, so we generally need to...
### Initial Checks - [X] I confirm that I'm on the latest version ### Description File "C:\Users\amanv\Desktop\new-env\lib\site-packages\openparse\config.py", line 16, in __init__ if torch.cuda.is_available(): AttributeError: module 'torch' has no attribute 'cuda'...
### Initial Checks - [X] I confirm that I'm on the latest version ### Description Why is my Basic Extraction not running properly? The prompt message is as follows: ...
## description: Fixed a bug where, when parsing a PDF that was converted from a PPT file, the layout of the content came out reversed....
### Initial Checks - [X] I confirm that I'm on the latest version ### Description I'm trying to use the unitable support (https://filimoa.github.io/open-parse/processing/parsing-tables/unitable/) to extract content from a UB-04 document...