Proposal: Enhancing the Cadet Chatbot with Retrieval-Augmented Generation (RAG)

Open Isha-Sovasaria opened this issue 7 months ago • 4 comments

This issue proposes upgrading the Cadet Chatbot with a Retrieval-Augmented Generation (RAG) pipeline backed by pgvector in Postgres. The goal is to enable dynamic retrieval of relevant passages from the full SICP textbook, improve memory beyond single conversations, and enhance pedagogical quality. A detailed proposal is attached in the PDF.
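As a rough illustration of the retrieval step described above, here is a minimal in-memory sketch of top-k passage lookup by cosine distance. It is not taken from the attached proposal: the passage records, section labels, and three-dimensional embeddings are placeholders, and a real pipeline would store embeddings produced by an embedding model in a pgvector column and let Postgres do the ranking.

```python
import math
from typing import Dict, List

def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 - cosine similarity, the quantity pgvector's <=> operator ranks by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vec: List[float], passages: List[Dict], k: int = 2) -> List[Dict]:
    """Rank passages by distance to the query and keep the k closest.

    Roughly what a query of this shape would do in Postgres
    (table and column names are hypothetical):
        SELECT section FROM passages ORDER BY embedding <=> :query LIMIT :k
    """
    ranked = sorted(passages, key=lambda p: cosine_distance(query_vec, p["embedding"]))
    return ranked[:k]

# Placeholder data: toy vectors standing in for real passage embeddings.
PASSAGES = [
    {"section": "1.1.1", "embedding": [0.9, 0.1, 0.0]},
    {"section": "1.2.1", "embedding": [0.1, 0.9, 0.1]},
    {"section": "3.3.1", "embedding": [0.0, 0.1, 0.9]},
]

hits = top_k([0.2, 0.8, 0.1], PASSAGES, k=2)
print([h["section"] for h in hits])
```

The retrieved passages would then be added to the prompt sent to the model, in place of the pre-written summaries the chatbot relies on today.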

Proposal-Revamping Louis with RAG.pdf

Isha-Sovasaria avatar Sep 12 '25 17:09 Isha-Sovasaria

This is indeed a great idea for Louis. The current prompt design is already based on chapter- or paragraph-level queries, and RAG is very effective at handling this kind of targeted retrieval. If applied to longer texts, such as an entire chapter or even a full book, RAG can analyze and surface the most relevant sections efficiently, which would be of great help to students.

Song-Mengfei avatar Sep 20 '25 05:09 Song-Mengfei

@Isha-Sovasaria Thank you for working on Louis and the proposal.

Your understanding of the current status and your proposed retrieval pipeline are excellent.

Identifying the limitation of "Knowledge Coverage" being limited to "pre-written summaries and user-provided paragraphs" is insightful. The new approach can theoretically allow knowledge from the entire textbook to be retrieved to augment the query.

Your proposal of "Optional Persistent Memory" is very ambitious. When done right, it could serve as an advanced, high-level (query -> answer) cache and deliver a significantly better user experience by reducing latency (while saving token costs on our end).

However, due to the internal complexity of a QA cache, I recommend attempting RAG in phase 1 and the KV cache in phase 2.
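One way to picture the (query -> answer) cache is as a similarity lookup placed in front of the model call. The sketch below is only an assumption about how such a cache might work, not the design in the proposal: the `QACache` class, the 0.95 threshold, and the `generate` callback are all hypothetical, and the two-dimensional vectors stand in for real query embeddings.

```python
import math
from typing import Callable, List, Optional, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class QACache:
    """Hypothetical query -> answer cache keyed by query embedding."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold  # assumed cut-off; would need tuning in practice
        self.entries: List[Tuple[List[float], str]] = []

    def lookup(self, query_vec: List[float]) -> Optional[str]:
        """Return the answer of the most similar stored query, if similar enough."""
        best_answer, best_score = None, self.threshold
        for vec, answer in self.entries:
            score = cosine_similarity(query_vec, vec)
            if score >= best_score:
                best_answer, best_score = answer, score
        return best_answer

    def store(self, query_vec: List[float], answer: str) -> None:
        self.entries.append((query_vec, answer))

def answer_query(query_vec: List[float], cache: QACache,
                 generate: Callable[[], str]) -> Tuple[str, bool]:
    """Serve from the cache when possible; otherwise call the model and store the result."""
    cached = cache.lookup(query_vec)
    if cached is not None:
        return cached, True
    fresh = generate()
    cache.store(query_vec, fresh)
    return fresh, False

# Demo with a stub in place of the real model call.
model_calls: List[int] = []

def fake_generate() -> str:
    model_calls.append(1)
    return "generated answer"

cache = QACache()
first = answer_query([1.0, 0.0], cache, fake_generate)    # miss: the model is called
second = answer_query([0.99, 0.05], cache, fake_generate)  # near-duplicate: served from cache
third = answer_query([0.0, 1.0], cache, fake_generate)     # unrelated: the model is called again
```

Much of the complexity mentioned above sits outside this sketch: choosing a threshold that does not return stale or mismatched answers, invalidating entries when the textbook or prompts change, and accounting for per-user context.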

My primary concern is that we currently lack a systematic, automated way to evaluate our bot.

Benchmarking our chatbot is not easy, as the chatbot may fail to deliver full value to the user for reasons other than a knowledge gap in SICP. For example, the model may lack context on the user's ability and fail to explain in words the user understands. The model may lack information about the scope of our course and digress. The model may also lack context on the problem the user is referring to.

Without a benchmark, it may also be difficult for you to experiment with and evaluate different RAG methods and designs.

The spreadsheet below contains some (context, query) pairs that may be helpful to your work: https://docs.google.com/spreadsheets/d/1Ge7OzYMwdrHYbOkXfJ0LuSfnPYHujABJAMCa4qFzYsA/edit?usp=sharing
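To make the benchmarking idea concrete, here is a minimal sketch of a retrieval hit-rate check over (context, query) pairs. It assumes each case is also labelled with the textbook section that should be retrieved, which is an assumption and not something stated about the spreadsheet. The cases, section labels, and both stub retrievers are hypothetical placeholders, and the metric covers retrieval only, not the other failure modes listed above.

```python
from typing import Callable, Dict, List

Retriever = Callable[[str, str], List[str]]  # (context, query) -> ranked section labels

def hit_rate_at_k(cases: List[Dict[str, str]], retrieve: Retriever, k: int = 3) -> float:
    """Fraction of cases whose expected section appears in the top-k retrieved sections."""
    if not cases:
        raise ValueError("need at least one benchmark case")
    hits = 0
    for case in cases:
        retrieved = retrieve(case["context"], case["query"])[:k]
        if case["expected_section"] in retrieved:
            hits += 1
    return hits / len(cases)

# Placeholder benchmark cases; "expected_section" is an assumed extra label.
CASES = [
    {"context": "ctx-a", "query": "q1", "expected_section": "1.1"},
    {"context": "ctx-a", "query": "q2", "expected_section": "1.2"},
    {"context": "ctx-b", "query": "q3", "expected_section": "2.1"},
    {"context": "ctx-b", "query": "q4", "expected_section": "3.3"},
]

def retriever_ignoring_context(context: str, query: str) -> List[str]:
    # Stub design A: returns the same ranking regardless of input.
    return ["1.1", "1.2", "2.1"]

def retriever_using_context(context: str, query: str) -> List[str]:
    # Stub design B: picks candidates based on the user's current context.
    return {"ctx-a": ["1.2", "1.1"], "ctx-b": ["3.3", "2.1"]}[context]

score_a = hit_rate_at_k(CASES, retriever_ignoring_context, k=3)
score_b = hit_rate_at_k(CASES, retriever_using_context, k=3)
print(f"design A: {score_a:.2f}, design B: {score_b:.2f}")
```

Running the same cases against each candidate design gives a single number per design, which is enough to compare retrieval variants before investing in a fuller end-to-end evaluation.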

Thank you again for working on improving Louis, and all the best for your work.

yiwen101 avatar Sep 20 '25 07:09 yiwen101

RAG definitely sounds like a good idea. Using RAG to retrieve the few most relevant pages of the textbook can definitely increase the accuracy of the Chatbot. However, one thing to consider is the cost: I implemented RAG in a past project and it was quite costly, especially using the ChatGPT API, but things may have changed and I may be wrong. A benefit of RAG is that there are many ways to optimize it, which would make the Chatbot more effective and relevant to Source Academy.

meowzz28 avatar Sep 20 '25 08:09 meowzz28

> RAG definitely sounds like a good idea. Using RAG to retrieve the few most relevant pages of the textbook can definitely increase the accuracy of the Chatbot. However, one thing to consider is the cost: I implemented RAG in a past project and it was quite costly, especially using the ChatGPT API, but things may have changed and I may be wrong. A benefit of RAG is that there are many ways to optimize it, which would make the Chatbot more effective and relevant to Source Academy.

I think we can safely assume that money isn't going to be a problem as long as we have a sound plan.

martin-henz avatar Sep 20 '25 12:09 martin-henz