Semantic cache issue on complex RAG chain

Open • pmosconi opened this issue 1 year ago • 1 comment

Hi, I am following notebooks/rag/mongodb-langchain-cache-memory.ipynb and I run into an issue when I activate the semantic cache for the complete RAG chain: the system always returns the answer to the first question that was cached. Why is this happening? The notebook in fact applies the cache only to the simplified RAG chain, but it doesn't explain why the full chain won't work with it. Thanks

May 05 '24 14:05 pmosconi

The simplified RAG chain passes only the user query to the LLM, whereas the complete RAG chain builds its prompt from the retrieved documents plus the question. If you apply the cache to the outermost chain, the first run combines the retrieved documents with the query, and the response is cached against that combined input. On the next run with a different query, the lookup might still match the same cache entry if the combined inputs are not semantically distinct enough (especially if the questions are similar). This leads to stale or incorrect cache hits. I'd like to work on this issue and propose a fix (I'll try my best).
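To make the failure mode concrete, here is a minimal toy sketch in plain Python. It is not the notebook's code and does not use LangChain or MongoDB: `ToySemanticCache`, `embed`, `full_prompt`, the 0.8 threshold, and the sample context are all hypothetical names and values I made up for illustration, and the bag-of-words "embedding" is only a stand-in for a real embedding model. It also assumes both questions end up with the same context text, which is a simplification. The point is just that when a long shared context dominates the cached text, two different questions can look nearly identical to a similarity lookup.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySemanticCache:
    """Hypothetical cache: returns the stored answer of the most similar
    prompt if its similarity reaches the threshold, otherwise None."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def lookup(self, prompt):
        query = embed(prompt)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def update(self, prompt, answer):
        self.entries.append((embed(prompt), answer))


# Made-up prompt template and context, shared by both questions.
CONTEXT = (
    "Answer the question based only on the following context: "
    "The movie follows a detective who investigates a series of crimes "
    "in a small coastal town while struggling with memories of an "
    "earlier case that was never solved. "
)


def full_prompt(question):
    """Mimics the complete chain: context plus question in one string."""
    return f"{CONTEXT}Question: {question}"


q1 = "Who is the main character?"
q2 = "Where does the story take place?"

# Cache keyed on the full prompt: the shared context dominates the vector.
full_cache = ToySemanticCache()
full_cache.update(full_prompt(q1), "answer to q1")
print("full prompt similarity:",
      round(cosine(embed(full_prompt(q1)), embed(full_prompt(q2))), 3))
print("full prompt lookup for q2:", full_cache.lookup(full_prompt(q2)))

# Cache keyed on the bare question: the two questions stay far apart.
question_cache = ToySemanticCache()
question_cache.update(q1, "answer to q1")
print("question similarity:", round(cosine(embed(q1), embed(q2)), 3))
print("question-only lookup for q2:", question_cache.lookup(q2))
```

In this toy setup the full-prompt cache hands back the first answer for the second question, while the question-only cache misses as it should. Whether the real semantic cache behaves the same way depends on the actual embedding model, score threshold, and prompt template, so treat this only as an illustration of the mechanism.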

Jun 21 '25 08:06 khushipy