Document search method scalability
If we have thousands or millions of documents, using LLMs to pick the documents will take a long time and be quite expensive as well. Is there any other methodology planned to be integrated, for example vector search? This could probably be implemented outside of PageIndex, but I am curious about the recall numbers of vector search on document summaries alone.
Hi, thanks for your interest. I’m wondering what kinds of documents you have. Can they be distinguished by specific labels? For example, for financial reports from different companies, we use the company name and date to select the target documents. In such cases, I recommend using query-to-SQL to retrieve the relevant documents.
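As a rough illustration of that query-to-SQL step, here is a minimal sketch; the `documents` table schema, the column names, and the model choice are placeholders for illustration, not part of PageIndex:

```python
# Query-to-SQL sketch: let an LLM turn a natural-language question into a
# SQL filter over label columns (hypothetical schema, not PageIndex's).
import sqlite3
from openai import OpenAI

client = OpenAI()

SCHEMA = "documents(doc_id TEXT, company TEXT, report_date TEXT)"

def select_documents(question: str, db_path: str = "docs.db") -> list[str]:
    prompt = (
        f"Table schema: {SCHEMA}\n"
        f"Write a SQLite query that returns doc_id for documents relevant to:\n"
        f"{question}\n"
        "Return only the SQL, with no explanation or formatting."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    sql = resp.choices[0].message.content.strip().strip("`")
    # Sketch only: in production, validate or whitelist the generated SQL
    # before executing it.
    with sqlite3.connect(db_path) as conn:
        return [row[0] for row in conn.execute(sql)]

# Example: select_documents("Apple's FY2023 annual report, revenue section")
```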
If your documents can’t be distinguished by predefined labels, e.g. they are of different types, then a vector-based search can be used to select documents and nodes. We’ll be publishing an article this week on how to integrate vector search into our reasoning-based RAG system.
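For reference, a minimal sketch of vector search over document summaries (the embedding model, the `summaries` input, and the helper names here are assumptions for illustration, not the hybrid approach described in the upcoming article):

```python
# Vector-search sketch: embed document summaries, rank them by cosine
# similarity to the query, and pass only the top-k docs to LLM-based selection.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def top_k_docs(query: str, summaries: dict[str, str], k: int = 5) -> list[str]:
    doc_ids = list(summaries)
    doc_vecs = embed([summaries[d] for d in doc_ids])  # shape (n_docs, dim)
    q_vec = embed([query])[0]                          # shape (dim,)
    # Cosine similarity between the query and each summary embedding.
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    return [doc_ids[i] for i in np.argsort(sims)[::-1][:k]]
```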
Keen to learn more about your use cases!
Hi, we released a new article about how to integrate vector-based search, see: https://pageindex.vectify.ai/examples/hybrid-rag. Please let me know if anything is unclear or missing in the article.
We are also running some large-scale experiments and preparing our formal technical paper on this. We will keep you updated.
Thanks!
Hi, the page at https://pageindex.vectify.ai/examples/hybrid-rag can't be opened; it returns a 404 error. Could you tell us where we can find the article about how to integrate vector-based search?
Hi Zhanli, thank you for your interest. Sorry, the website is still under reconstruction; we have just moved the article to https://docs.pageindex.ai/tree-search/hybrid.