GPT-RAG
Enable Response Caching to reduce costs
As part of the roadmap, there is a request to add response caching to the solution in order to reduce the number of requests sent to OpenAI.
We are currently working on it.
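A semantic response cache along these lines could be sketched as follows. This is a minimal in-memory sketch, not the project's actual implementation: the `embed` callable is assumed (in practice it would be an Azure OpenAI embeddings call), and a production version would persist entries in a vector store such as Cosmos DB rather than a Python list.

```python
import math
from typing import Callable, List, Optional, Tuple


class SemanticCache:
    """In-memory semantic cache: returns a stored response when a new
    query's embedding is close enough to a previously cached query,
    avoiding a repeated completion request to OpenAI."""

    def __init__(self, embed: Callable[[str], List[float]], threshold: float = 0.9):
        self.embed = embed          # assumed embedding function, e.g. Azure OpenAI embeddings
        self.threshold = threshold  # minimum cosine similarity to count as a hit
        self.entries: List[Tuple[List[float], str]] = []

    @staticmethod
    def _cosine(a: List[float], b: List[float]) -> float:
        # Standard cosine similarity; 0.0 if either vector is all zeros.
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, query: str) -> Optional[str]:
        # Find the most similar cached query; return its response on a hit.
        q = self.embed(query)
        best = max(self.entries, key=lambda e: self._cosine(q, e[0]), default=None)
        if best and self._cosine(q, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip the OpenAI completion call
        return None  # cache miss: caller proceeds with a real request

    def put(self, query: str, response: str) -> None:
        self.entries.append((self.embed(query), response))
```

The threshold trades cost savings against answer freshness: a higher value only reuses responses for near-identical queries, while a lower value caches more aggressively at the risk of returning a stale or mismatched answer.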
Hello, has this been addressed already?
References: https://www.linkedin.com/feed/update/urn:li:activity:7177084885977718785/
https://stochasticcoder.com/2024/03/22/improve-llm-performance-using-semantic-cache-with-cosmos-db/
Additional research links:
https://github.com/microsoft/kernel-memory