open-parse

Improved file parsing for LLMs

Results 33 open-parse issues

## Description
This is a tentative roadmap; I will update it as things evolve.

## Roadmap
**High Priority:**
- [x] Implement unitable
- [x] Enable OCR support
- [ ]...

Given this code:

```
import openparse

basic_doc_path = "mydoc.pdf"
parser = openparse.DocumentParser(
    table_args={
        "parsing_algorithm": "unitable",
        "min_table_confidence": 0.8,
    }
)
parsed_basic_doc = parser.parse(basic_doc_path)

for node in parsed_basic_doc.nodes:
    print(node.json())
```

I'm getting...

bug
good first issue

### Initial Checks
- [X] I confirm that I'm on the latest version

### Description
Hi there,

First of all, thanks for Open Parse; it looks cool in most of...

bug

### Description
Can you integrate pymupdf's pdf4llm.to_markdown() to make the parsed PDF more hierarchical (for example, use ("##", "Header 1") to represent the first-level heading, ("###", "Header 2") to represent the...

enhancement
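The `("##", "Header 1")` pairs in the request above suggest grouping text under the nearest heading at each level. Below is a minimal sketch of that idea in plain Python, assuming the Markdown text has already been produced by some converter. `split_by_headers` is a hypothetical helper written for illustration; it does not call pdf4llm or open-parse and is not part of either library.

```python
def split_by_headers(markdown: str, headers: list[tuple[str, str]]) -> list[dict]:
    """Group Markdown lines under the most recent heading at each configured level.

    Hypothetical helper for illustration only; not an open-parse or pdf4llm API.
    """
    sections: list[dict] = []
    current_meta: dict[str, str] = {}
    body: list[str] = []

    def flush() -> None:
        # Emit the text collected so far, tagged with the headings in effect.
        text = "\n".join(body).strip()
        if text:
            sections.append({"metadata": dict(current_meta), "text": text})
        body.clear()

    for line in markdown.splitlines():
        for prefix, name in headers:
            # The trailing space keeps "##" from matching a "### ..." line.
            if line.startswith(prefix + " "):
                flush()
                # A new heading invalidates headings at the same or deeper level.
                for other_prefix, other_name in headers:
                    if len(other_prefix) >= len(prefix):
                        current_meta.pop(other_name, None)
                current_meta[name] = line[len(prefix):].strip()
                break
        else:
            body.append(line)
    flush()
    return sections


doc = "## Intro\nhello\n### Details\nmore\n## Next\nbye"
sections = split_by_headers(doc, [("##", "Header 1"), ("###", "Header 2")])
for section in sections:
    print(section["metadata"], "->", section["text"])
```

Each section carries its heading path as metadata, so a downstream chunker could keep the hierarchy without re-parsing the Markdown.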

### Description
A PDF is a document that mixes graphics and text. When we are doing RAG, the pictures in the PDF often contain important information, so we generally need to...

enhancement

### Initial Checks
- [X] I confirm that I'm on the latest version

### Description
File "C:\Users\amanv\Desktop\new-env\lib\site-packages\openparse\config.py", line 16, in __init__
if torch.cuda.is_available():
AttributeError: module 'torch' has no attribute 'cuda'...

bug
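The traceback above fails on `torch.cuda.is_available()` because the imported `torch` module has no `cuda` attribute, which often points to a broken or shadowed install. As a rough sketch of a defensive check, the hypothetical helper below falls back to CPU instead of raising. `pick_device` is an illustrative name, not open-parse's actual code, and the `torch_module` parameter exists only so the behaviour can be shown without a real PyTorch install.

```python
import importlib
import types


def pick_device(torch_module=None) -> str:
    """Return "cuda" only when the torch module exposes a working CUDA check.

    Hypothetical helper for illustration; not part of open-parse.
    """
    if torch_module is None:
        try:
            torch_module = importlib.import_module("torch")
        except ImportError:
            # No PyTorch installed at all: CPU is the only option.
            return "cpu"
    # Look attributes up defensively instead of assuming torch.cuda exists.
    cuda = getattr(torch_module, "cuda", None)
    is_available = getattr(cuda, "is_available", None)
    if not callable(is_available):
        return "cpu"
    try:
        return "cuda" if is_available() else "cpu"
    except Exception:
        # Any failure while probing CUDA is treated as "not available".
        return "cpu"


# A stand-in module with no `cuda` attribute reproduces the reported situation.
broken_torch = types.ModuleType("torch")
print(pick_device(broken_torch))
```

The fallback only hides the symptom; if `torch` really lacks `cuda`, checking for a local file named `torch.py` or reinstalling PyTorch is likely still needed.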

### Initial Checks
- [X] I confirm that I'm on the latest version

### Description
Why is my Basic Extraction not running properly? The prompt message is as follows:

![image](https://github.com/Filimoa/open-parse/assets/33474854/237d6479-6f3e-4f2e-9b52-8f58210fa88a)...

bug

## Description
Fixes a bug where, when parsing a PDF that was converted from a PPT file, the layout of the content comes out reversed....

Your code example seems to imply that.

enhancement

### Initial Checks
- [X] I confirm that I'm on the latest version

### Description
I'm trying to use the unitable support (https://filimoa.github.io/open-parse/processing/parsing-tables/unitable/) to extract content out of a UB-04 document...

bug