ERROR:root:Failed to parse JSON even after cleanup
Traceback (most recent call last):
File "G:\agent_service\outside_tools\PageIndex\run_pageindex.py", line 67, in <module>
toc_with_page_number = page_index_main(args.pdf_path, opt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\agent_service\outside_tools\PageIndex\pageindex\page_index.py", line 1102, in page_index_main
return asyncio.run(page_index_builder())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python312\Lib\asyncio\runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "C:\Program Files\Python312\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python312\Lib\asyncio\base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "G:\agent_service\outside_tools\PageIndex\pageindex\page_index.py", line 1077, in page_index_builder
structure = await tree_parser(page_list, opt, doc=doc, logger=logger)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\agent_service\outside_tools\PageIndex\pageindex\page_index.py", line 1037, in tree_parser
toc_with_page_number = await meta_processor(
^^^^^^^^^^^^^^^^^^^^^
File "G:\agent_service\outside_tools\PageIndex\pageindex\page_index.py", line 991, in meta_processor
raise Exception('Processing failed')
Exception: Processing failed
Hi, thanks for reporting this and sorry for the delayed response. We’re currently a bit short on manpower and catching up on issues.
From the traceback you shared, the error seems to come from dirty JSON being produced by the LLM during the parsing step, which then fails even after our cleanup attempts. This typically happens when the model outputs extra text or formatting artifacts (e.g. markdown code fences or surrounding prose) that break JSON parsing.
To help us reproduce and fix this, could you please share (if it’s not private or sensitive):
- The document (or a minimal excerpt of it) that triggered the error
- Any specific options/flags you used when running PageIndex
With that, we can better trace where the JSON output goes wrong and strengthen the cleanup logic.
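In the meantime, a more forgiving extraction pass along these lines might help. This is only a minimal sketch of the kind of cleanup that could be strengthened, not PageIndex's actual cleanup code; the `extract_json` helper is hypothetical:

```python
import json
import re

def extract_json(raw: str):
    """Best-effort extraction of a JSON value from LLM output.

    Hypothetical helper illustrating a more forgiving cleanup pass,
    not the cleanup logic PageIndex currently uses.
    """
    # Strip markdown code fences like ```json ... ``` if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        raw = fenced.group(1)

    # Fall back to the first '{' or '[' as the start of the JSON value.
    start = min((i for i in (raw.find("{"), raw.find("[")) if i != -1),
                default=-1)
    if start == -1:
        raise ValueError("no JSON found in model output")

    # Shrink the candidate span from the right until it parses,
    # which discards trailing prose the model appended.
    for end in range(len(raw), start, -1):
        try:
            return json.loads(raw[start:end])
        except json.JSONDecodeError:
            continue
    raise ValueError("could not parse JSON from model output")
```

The right-to-left shrink is quadratic in the worst case, so it only makes sense as a last-resort fallback after a plain `json.loads` fails, but it tolerates both leading prose and trailing commentary around the JSON payload.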
Thanks again for your patience and for bringing this up!