Proposal: Enhancing the Cadet Chatbot with Retrieval-Augmented Generation (RAG)
This issue proposes upgrading the Cadet Chatbot with a Retrieval-Augmented Generation (RAG) pipeline backed by pgvector in Postgres. The goal is to enable dynamic retrieval of relevant passages from the full SICP textbook, improve memory beyond single conversations, and enhance pedagogical quality. A detailed proposal is attached in the PDF.
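To illustrate the core retrieval step of such a pipeline, here is a minimal, self-contained sketch using toy in-memory embeddings and cosine similarity. In the actual proposal this ranking would be done by a pgvector index in Postgres, and the vectors would come from a real embedding model; the chunk titles and 3-dimensional vectors below are purely illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunks, k=2):
    """chunks: list of (text, embedding) pairs; returns the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 3-dimensional "embeddings" standing in for real model output.
chunks = [
    ("1.1 The Elements of Programming", [1.0, 0.1, 0.0]),
    ("2.2 Hierarchical Data", [0.0, 1.0, 0.2]),
    ("3.5 Streams", [0.1, 0.0, 1.0]),
]
print(top_k([0.9, 0.2, 0.1], chunks, k=1))  # nearest chunk by cosine similarity
```

The retrieved chunk texts would then be prepended to the user's query before it reaches the LLM, which is the "augmentation" half of RAG.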
This is indeed a great idea for Louis. The current prompt design is already based on chapter- or paragraph-level queries, and RAG is very effective at handling this kind of targeted retrieval. If applied to longer texts, such as an entire chapter or even a full book, RAG can analyze and surface the most relevant sections efficiently, which would be of great help to students.
@Isha-Sovasaria Thank you for working on Louis and the proposal.
Your understanding of the current status and of the proposed retrieval pipeline is excellent.
Identifying the "Knowledge Coverage" limitation of being limited to "pre-written summaries and user-provided paragraphs" is insightful. The new approach could, in principle, allow knowledge from the entire textbook to be retrieved to augment the query.
Your proposal of "Optional Persistent Memory" is very ambitious. Done right, it could serve as an advanced, high-level (query -> answer) cache, delivering a significantly better user experience by reducing latency (while saving token costs on our end).
However, due to the internal complexity of a QA cache, I recommend attempting RAG in phase 1 and the QA cache in phase 2.
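As a rough sketch of what such a phase-2 QA cache could look like (the class and the whitespace/case normalization scheme here are my own illustration, not part of the proposal; a production version would likely match semantically similar queries via embeddings rather than exact strings):

```python
class QACache:
    """Minimal (query -> answer) cache sketch. Real versions would also
    handle semantic near-duplicates and cache invalidation/expiry."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _normalize(query):
        # Collapse case and whitespace so trivially different phrasings still hit.
        return " ".join(query.lower().split())

    def get(self, query):
        return self._store.get(self._normalize(query))

    def put(self, query, answer):
        self._store[self._normalize(query)] = answer

cache = QACache()
cache.put("What is a  closure?", "A closure is a function together with its environment.")
print(cache.get("what is a closure?"))  # cache hit despite different casing/spacing
```

On a hit, the cached answer is returned immediately, skipping both retrieval and the LLM call, which is where the latency and token savings come from.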
My primary concern is that we currently lack a systematic, automated way to evaluate our bot.
Benchmarking our chatbot is not easy, as it may fail to deliver full value to users for reasons other than gaps in its SICP knowledge. For example, the model may lack context about the user's ability and fail to explain things in words the user understands. It may lack information about the scope of our course and become digressive. It may also lack context about the problem the user is referring to.
Without a benchmark, it may also be difficult for you to experiment with and evaluate different RAG methods and designs.
The spreadsheet below contains some (context, query) pairs that may be helpful for your work: https://docs.google.com/spreadsheets/d/1Ge7OzYMwdrHYbOkXfJ0LuSfnPYHujABJAMCa4qFzYsA/edit?usp=sharing
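If such (context, query) pairs were additionally labelled with the textbook section each query should retrieve, a simple recall@k harness could drive the RAG experiments. The pair format, section IDs, and stand-in retriever below are all hypothetical:

```python
def recall_at_k(eval_pairs, retrieve, k=3):
    """eval_pairs: list of (query, expected_chunk_id) pairs.
    retrieve(query, k) -> ranked list of chunk ids.
    Returns the fraction of queries whose expected chunk appears in the top k."""
    hits = sum(1 for query, expected in eval_pairs
               if expected in retrieve(query, k))
    return hits / len(eval_pairs)

# Hypothetical stand-in retriever; a real one would query the pgvector index.
def fake_retrieve(query, k):
    return ["1.1", "1.2", "2.1"][:k]

pairs = [("what is an expression?", "1.1"), ("what are pairs?", "2.2")]
print(recall_at_k(pairs, fake_retrieve, k=3))  # 0.5: one of two expected chunks retrieved
```

A metric like this makes different chunking schemes, embedding models, and k values directly comparable, which addresses the evaluation concern above.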
Thank you again for working on improving Louis, and all the best for your work.
RAG definitely sounds like a good idea. Using RAG to retrieve the few most relevant pages of the textbook can definitely increase the accuracy of the chatbot. However, one thing to consider is cost: I implemented RAG in a past project and it was quite costly, especially using the ChatGPT API, though things may have changed and I may be wrong. A benefit of RAG is that there are many ways to optimize it, which would make the chatbot more effective and relevant to Source Academy.
I think we can safely assume that money isn't going to be a problem as long as we have a sound plan.