Advanced RAG implementations
For better evaluation results on the HF QA dataset.
- Reranking: BCE, BGE-M3 scoring
- Query rewriting
- to generate a SQL filter for a given prompt
- to generate hypothetical queries
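The hypothetical-query rewrite above can be sketched as a HyDE-style prompt template. The wording of the template is an illustrative assumption, and the actual LLM call is left out; only the prompt construction is shown:

```python
# HyDE-style query rewrite sketch: instead of embedding the raw query,
# ask an LLM to write a hypothetical answer passage and embed that.
# The template text here is an assumption, not a tested prompt.
def build_hyde_prompt(query: str) -> str:
    """Build a prompt asking an LLM for a hypothetical answer passage.

    The generated passage usually lands closer to relevant documents
    in embedding space than the short raw query does.
    """
    return (
        "Write a short passage that would answer the question below, "
        "as if it were taken from a relevant document.\n"
        f"Question: {query}\n"
        "Passage:"
    )

prompt = build_hyde_prompt("How does ColBERT score documents?")
print(prompt)
```

The returned string would then be sent to the LLM, and the LLM's passage (not the original query) is embedded for retrieval.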
A more complex RAG pipeline may involve agent frameworks (#18).
https://github.com/langchain-ai/langgraph/blob/70c1c996a4c9fe8df518bcd849b3c6453dd0d58b/examples/rag/langgraph_adaptive_rag.ipynb
Reranking: ColBERT, ...
Related work
Methods
- Keyword-based - BM25
- Cross-encoders - BCE reranker
- Late-interaction - ColBERT, BGE-M3
- ColBERT paper: arXiv:2004.12832v2
- https://zhuanlan.zhihu.com/p/683483778
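The late-interaction scoring used by ColBERT and BGE-M3 can be sketched with the MaxSim operator: for each query token, take the maximum similarity against all document tokens, then sum over query tokens. A minimal pure-Python sketch with toy 2-d "embeddings" (real ones come from the model and are L2-normalized):

```python
# Minimal late-interaction (ColBERT-style MaxSim) scoring sketch.
# Token embeddings are plain lists of floats here for illustration.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vecs, doc_vecs):
    """Sum over query tokens of the max similarity to any doc token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy example: doc_a has a close match for both query tokens, doc_b only one.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[1.0, 0.0], [1.0, 0.0]]
print(maxsim_score(query, doc_a))  # → 2.0
print(maxsim_score(query, doc_b))  # → 1.0
```

Because document token embeddings can be precomputed offline, inference only needs the query encoding plus this cheap max/sum, which is why late interaction is faster than a cross-encoder.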
Open-source projects
- RAGatouille (library built on the official ColBERT implementation): https://github.com/bclavie/RAGatouille
- BCE family: https://github.com/netease-youdao/BCEmbedding
- BAAI BGE family: https://github.com/FlagOpen/FlagEmbedding
- BCE & BGE in C++: https://github.com/li-plus/chatglm.cpp
Conclusion
- Both the BCE and BGE families can be regarded as SOTA.
- For multilingual use cases, vanilla RAGatouille lags behind.
- Since late-interaction models are faster at inference, bge-m3 is preferred as the first-stage ranking method in the RAG pipeline.
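The two-stage ranking concluded above (fast first stage over all docs, slower reranker over the survivors) can be sketched in outline. Both scorers below are toy stand-ins (word overlap and length-weighted overlap); in the real pipeline they would be bge-m3 and a BCE/BGE cross-encoder:

```python
# Two-stage retrieval sketch: cheap first-stage scoring over the whole
# corpus, then a "heavier" reranker over only the top-k shortlist.
# Both scorers are toy placeholders, not real models.

def first_stage(query, doc):
    # cheap proxy for bge-m3: count of shared words
    return len(set(query.split()) & set(doc.split()))

def rerank(query, doc):
    # proxy for a cross-encoder: overlap weighted by doc brevity
    overlap = len(set(query.split()) & set(doc.split()))
    return overlap / (1 + len(doc.split()))

def two_stage_search(query, docs, k=2):
    shortlist = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k]
    return sorted(shortlist, key=lambda d: rerank(query, d), reverse=True)

docs = [
    "bge m3 late interaction model",
    "bm25 keyword baseline",
    "bge m3 reranker notes and extras",
]
print(two_stage_search("bge m3 model", docs, k=2))
```

The point of the structure: the expensive reranker runs on only `k` documents, so total latency stays close to the first stage's.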
Timeline
- #21
- [x] experiments with BERT + the BGE-M3 model for a better understanding of the forward pass, ~~3 days~~ 1 day.
- [x] bge-m3 inference code; a standalone repo may be needed. ~~7 days~~ 3 days.
- #20
- [x] file and filebatch controllers, services, mappers. 3 days.
- [x] DuckDB vector-DB operator, retriever factory, 2 days
- [x] DuckDB: search with filter, delete with filter, 1 day
- [x] summarization pipeline for files, 2 days
- [x] file object handler, 2 days
- [x] file batch background jobs to update their progress, 1 day
- [x] search tool 3 days.
- [ ] (optional) parallel react agent executor, 2 days
- [x] integration test 3 days.
- [ ] eval test on the HF QA dataset. 3 days.
OpenAI's official parameters for RAG: https://platform.openai.com/docs/assistants/tools/file-search/how-it-works
By default, the file_search tool uses the following settings:
- Chunk size: 800 tokens
- Chunk overlap: 400 tokens
- Embedding model: text-embedding-3-large at 256 dimensions
- Maximum number of chunks added to context: 20 (could be fewer)
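The default chunking above (800-token windows with 400-token overlap) is a simple sliding window. A sketch using whitespace "tokens" for illustration (the real tool counts model tokens, not words):

```python
# Sliding-window chunker matching the OpenAI file_search defaults:
# 800-token chunks with 400-token overlap, i.e. a step of 400.

def chunk_tokens(tokens, size=800, overlap=400):
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already covered the tail
    return chunks

tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
print(len(chunks))  # → 2 (windows [0:800] and [400:1000])
```

With 400 tokens of overlap, every token (except at the edges) appears in two chunks, which trades index size for recall at chunk boundaries.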
Supported file formats: https://platform.openai.com/docs/assistants/tools/file-search/supported-files