If the file name contains spaces, the command simply fails
(.venv) skype@192 PageIndex % python3 run_pageindex.py --pdf_path /Users/skype/Documents/GitHub/PageIndex/docs/Websocket vs SSE- OpenAI.pdf
usage: run_pageindex.py [-h] [--pdf_path PDF_PATH] [--model MODEL] [--toc-check-pages TOC_CHECK_PAGES] [--max-pages-per-node MAX_PAGES_PER_NODE] [--max-tokens-per-node MAX_TOKENS_PER_NODE] [--if-add-node-id IF_ADD_NODE_ID] [--if-add-node-summary IF_ADD_NODE_SUMMARY] [--if-add-doc-description IF_ADD_DOC_DESCRIPTION]
run_pageindex.py: error: unrecognized arguments: vs SSE- OpenAI.pdf
(.venv) skype@192 PageIndex % python3 run_pageindex.py --pdf_path /Users/skype/Documents/GitHub/PageIndex/docs/Regulation Best Interest_proposed rule.pdf
usage: run_pageindex.py [-h] [--pdf_path PDF_PATH] [--model MODEL] [--toc-check-pages TOC_CHECK_PAGES] [--max-pages-per-node MAX_PAGES_PER_NODE] [--max-tokens-per-node MAX_TOKENS_PER_NODE] [--if-add-node-id IF_ADD_NODE_ID] [--if-add-node-summary IF_ADD_NODE_SUMMARY] [--if-add-doc-description IF_ADD_DOC_DESCRIPTION]
run_pageindex.py: error: unrecognized arguments: Best Interest_proposed rule.pdf
Hi @sliderss, thanks for raising this.
If the file name includes a space, the shell splits the path into several arguments, so either quote the whole path or escape each space.
For example:
python3 run_pageindex.py --pdf_path "./example report.pdf"
# or
python3 run_pageindex.py --pdf_path ./example\ report.pdf
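To show why the unquoted form fails, here is a minimal sketch that splits each command line the way a POSIX shell would and feeds the result to `argparse`. The parser below is a hypothetical stand-in that models only `--pdf_path`; it is not the actual parser from `run_pageindex.py`, and `parse_like_shell` is an illustrative helper name.

```python
import argparse
import shlex

# Hypothetical stand-in for the script's parser; only --pdf_path is modeled.
parser = argparse.ArgumentParser(prog="run_pageindex.py")
parser.add_argument("--pdf_path")


def parse_like_shell(command_line):
    """Split a command line with POSIX shell rules, then parse the pieces."""
    argv = shlex.split(command_line)
    # parse_known_args returns unrecognized arguments instead of exiting,
    # which makes the leftover path fragments easy to inspect.
    return parser.parse_known_args(argv)


# Unquoted: the path is cut at the space, and the rest is left over.
unquoted = parse_like_shell("--pdf_path ./example report.pdf")

# Quoted or escaped: the whole path reaches --pdf_path as one argument.
quoted = parse_like_shell('--pdf_path "./example report.pdf"')
escaped = parse_like_shell("--pdf_path ./example\\ report.pdf")

for label, (args, leftover) in [
    ("unquoted", unquoted),
    ("quoted", quoted),
    ("escaped", escaped),
]:
    print(label, repr(args.pdf_path), leftover)
```

In the unquoted case the leftover fragment is what `argparse` reports as "unrecognized arguments" in the output above.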
Hope this helps.
I had previously resolved the issue, but after pulling the latest code, I encountered an error when running the following command:
python3 run_pageindex.py --pdf_path '/Users/skype/Documents/GitHub/PageIndex/docs/2023-annual-report.pdf'
No corresponding JSON file was generated, and an error occurred during execution.
(.venv) skype@192 PageIndex % python3 run_pageindex.py --pdf_path '/Users/skype/Documents/GitHub/PageIndex/docs/2023-annual-report.pdf'
Parsing PDF...
start find_toc_pages
toc found
start detect_page_index
index found
process_toc_with_page_numbers
start_index: 1
start toc_transformer
start toc_index_extractor
Traceback (most recent call last):
File "/Users/skype/Documents/GitHub/PageIndex/run_pageindex.py", line 35, in
Hi @sliderss, thanks so much for reporting this and for the detailed traceback!
The issue was introduced in one of the previous commits (on April 18) due to start_index being passed incorrectly. It’s now fixed — please pull the latest code and try again.
We really appreciate your feedback and support. Let us know if anything else comes up! 🙏