
Reformat and improve RAG module and agents

Open ZiTao-Li opened this issue 1 year ago • 1 comment

Description

Updates

Changes to code structure

  • migrate and reformat the RAG/knowledge module(s) and RAG agent(s) from examples into a module under src
  • add llama-index to rag_requires in setup.py
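Listing llama-index in rag_requires makes it an optional dependency that only RAG users need to install. Below is a minimal sketch of how such an optional group is typically wired into setuptools. The extra's name "rag" and the install command are assumptions, not taken from the actual setup.py.

```python
# Hypothetical setup.py excerpt. Only rag_requires and llama-index come from
# the PR description; the extra's name "rag" is an assumption.
rag_requires = ["llama-index"]

# These are passed to setuptools.setup(..., extras_require=extras_require),
# so the RAG dependencies are installed only on request, e.g.
#   pip install "agentscope[rag]"   (assumed extra name)
extras_require = {
    "rag": rag_requires,
}
```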

Changes to the RAG agent module

  • make it compatible with the new KnowledgeBank feature
  • move the configurations for RAG-related functionality back into the knowledge modules
  • the retrieve method now merges the retrievers of the knowledge objects the agent loads from the KnowledgeBank
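The merging behaviour of the retrieve method can be sketched as follows. This is a hypothetical illustration, not AgentScope's implementation. The retriever signature (a query in, a list of (text, score) pairs out), the class SketchRAGAgent, and ranking by raw score are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumed retriever shape: query -> list of (text, score) pairs.
Retriever = Callable[[str], List[Tuple[str, float]]]


@dataclass
class SketchRAGAgent:
    """Hypothetical agent holding one retriever per knowledge_id."""

    retrievers: Dict[str, Retriever]

    def retrieve(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:
        # Query every retriever, pool the hits, keep the top_k by score.
        merged: List[Tuple[str, float]] = []
        for retriever in self.retrievers.values():
            merged.extend(retriever(query))
        merged.sort(key=lambda hit: hit[1], reverse=True)
        return merged[:top_k]


agent = SketchRAGAgent(retrievers={
    "tutorial": lambda q: [("tutorial chunk", 0.9), ("intro chunk", 0.4)],
    "code": lambda q: [("code chunk", 0.7)],
})
print(agent.retrieve("how do I add knowledge?", top_k=2))
# [('tutorial chunk', 0.9), ('code chunk', 0.7)]
```

Ranking by raw score assumes that scores from different knowledge objects are comparable (for example, the same embedding model and similarity metric). If they are not, a rank-based merge would be safer.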

Changes to the RAG/knowledge module

  • rename the RAG modules to Knowledge modules (e.g., LlamaIndexRAG -> LlamaIndexKnowledge)
  • store and persist processed embeddings, indices, and documents
  • support loading multiple document types and directories into one index
  • support document management in the obtained (persisted) index
  • add a refresh function to update the index when needed
  • enable agents to reset their retrievers or add new ones
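The persistence and refresh items can be illustrated with a dependency-free sketch. It is not the LlamaIndexKnowledge implementation. SketchKnowledge, toy_embed (a stand-in for a real embedding model), the index.json layout, and hash-based change detection are all assumptions. The sketch shows why persisting embeddings pays off: a refresh re-embeds only the files whose content changed.

```python
import hashlib
import json
import os
import tempfile
from typing import Dict, List, Tuple


def toy_embed(text: str) -> List[float]:
    # Stand-in for a real embedding model: letter counts folded into 4 buckets.
    vec = [0.0] * 4
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) % 4] += 1.0
    return vec


class SketchKnowledge:
    """Hypothetical knowledge object that persists embeddings to disk."""

    def __init__(self, data_dir: str, persist_dir: str,
                 exts: Tuple[str, ...] = (".md",)) -> None:
        self.data_dir, self.persist_dir, self.exts = data_dir, persist_dir, exts
        self.index_path = os.path.join(persist_dir, "index.json")
        self.index: Dict[str, dict] = {}
        self.embed_calls = 0
        if os.path.exists(self.index_path):
            # Reuse embeddings persisted by an earlier run.
            with open(self.index_path, encoding="utf-8") as f:
                self.index = json.load(f)
        self.refresh()

    def _load_docs(self) -> Dict[str, str]:
        # Collect files in data_dir whose extension is in self.exts.
        docs = {}
        for name in sorted(os.listdir(self.data_dir)):
            if name.endswith(self.exts):
                with open(os.path.join(self.data_dir, name), encoding="utf-8") as f:
                    docs[name] = f.read()
        return docs

    def refresh(self) -> None:
        # Re-embed new or changed files only; entries of deleted files are dropped.
        new_index = {}
        for name, text in self._load_docs().items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            cached = self.index.get(name)
            if cached is not None and cached["hash"] == digest:
                new_index[name] = cached
            else:
                self.embed_calls += 1
                new_index[name] = {"hash": digest, "embedding": toy_embed(text)}
        self.index = new_index
        os.makedirs(self.persist_dir, exist_ok=True)
        with open(self.index_path, "w", encoding="utf-8") as f:
            json.dump(self.index, f)


# Usage: the second load re-embeds only the edited file.
with tempfile.TemporaryDirectory() as root:
    data_dir, store = os.path.join(root, "docs"), os.path.join(root, "store")
    os.makedirs(data_dir)
    for name, text in {"a.md": "hello", "b.md": "world", "c.txt": "skip"}.items():
        with open(os.path.join(data_dir, name), "w", encoding="utf-8") as f:
            f.write(text)
    first = SketchKnowledge(data_dir, store)
    with open(os.path.join(data_dir, "b.md"), "w", encoding="utf-8") as f:
        f.write("world, edited")
    second = SketchKnowledge(data_dir, store)
    print(first.embed_calls, second.embed_calls)  # 2 1
```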

Improving the usability of the knowledge module

  • simplify the knowledge module config: the new format only needs to configure the KnowledgeBank
  • introduce KnowledgeBank:
    • KnowledgeBank provides an easier way to initialize a knowledge object: just call add_data_as_knowledge with knowledge_id (a string identifying the knowledge object), emb_model_name (the name of the embedding model config), and data_dirs_and_types (a dictionary mapping data directories to the file extensions to load), as shown in rag_example.py:
       knowledge_bank.add_data_as_knowledge(
           knowledge_id="agentscope_tutorial_rag",
           emb_model_name="qwen_emb_config",
           data_dirs_and_types={
               "../../docs/sphinx_doc/en/source/tutorial": [".md"],
           },
       )
    • Knowledge objects in the KnowledgeBank can be shared by multiple agents or duplicated for them, which avoids embedding the same documents more than once.
    • RAG agents can load multiple Knowledge objects (selected by the "knowledge_id" entries in knowledge_config.json), together with their retrievers, to perform multi-source information retrieval. To do so, just pass the agent to the KnowledgeBank.equip function.
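The sharing behaviour of the KnowledgeBank can be sketched with a toy registry. Only the names add_data_as_knowledge, knowledge_id, emb_model_name, data_dirs_and_types, and equip come from the description above. The classes, the knowledge_id_list and duplicate parameters of equip, and the deep-copy semantics are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KnowledgeRecord:
    # Placeholder for a loaded-and-embedded knowledge object.
    knowledge_id: str
    emb_model_name: str
    data_dirs_and_types: Dict[str, List[str]]


@dataclass
class SketchAgent:
    name: str
    knowledge_list: List[KnowledgeRecord] = field(default_factory=list)


class SketchKnowledgeBank:
    """Hypothetical registry: each knowledge_id is built once, then handed out."""

    def __init__(self) -> None:
        self.stored: Dict[str, KnowledgeRecord] = {}

    def add_data_as_knowledge(self, knowledge_id: str, emb_model_name: str,
                              data_dirs_and_types: Dict[str, List[str]]) -> None:
        if knowledge_id in self.stored:
            raise ValueError(f"knowledge_id {knowledge_id!r} already exists")
        # A real implementation would load and embed the documents here, once.
        self.stored[knowledge_id] = KnowledgeRecord(
            knowledge_id, emb_model_name, data_dirs_and_types)

    def equip(self, agent: SketchAgent, knowledge_id_list: List[str],
              duplicate: bool = False) -> None:
        for knowledge_id in knowledge_id_list:
            knowledge = self.stored[knowledge_id]
            # Share the same object, or hand out a deep copy; neither re-embeds.
            agent.knowledge_list.append(
                copy.deepcopy(knowledge) if duplicate else knowledge)


bank = SketchKnowledgeBank()
bank.add_data_as_knowledge(
    knowledge_id="agentscope_tutorial_rag",
    emb_model_name="qwen_emb_config",
    data_dirs_and_types={"../../docs/sphinx_doc/en/source/tutorial": [".md"]},
)
alice, bob = SketchAgent("alice"), SketchAgent("bob")
bank.equip(alice, ["agentscope_tutorial_rag"])                # shared
bank.equip(bob, ["agentscope_tutorial_rag"], duplicate=True)  # copied
print(alice.knowledge_list[0] is bank.stored["agentscope_tutorial_rag"])  # True
print(bob.knowledge_list[0] is bank.stored["agentscope_tutorial_rag"])    # False
```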

Tutorial

Both English and Chinese tutorials are added as 209-rag.md.


Checklist

Please check the following items before the code is ready for review.

  • [x] Code has passed all tests
  • [x] Docstrings have been added/updated in Google Style
  • [x] Documentation has been updated
  • [x] Code is ready for review

ZiTao-Li • Apr 28 '24 03:04

@ZiTao-Li Is this PR ready for review?

DavdGao • May 09 '24 13:05

The ImportError from the LlamaIndex library is still exposed to users who don't use the RAG module.
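One common way to keep that ImportError away from non-RAG users is to import LlamaIndex lazily, inside the code paths that need it, instead of at package import time. The following is a sketch of that general pattern, not the fix this PR adopted. lazy_import and SketchLlamaIndexKnowledge are hypothetical names.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency when called, not at package import time."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Only users who reach a RAG code path ever see this error.
        raise ImportError(
            f"{module_name!r} is required for this feature; {install_hint}"
        ) from err


class SketchLlamaIndexKnowledge:
    """Hypothetical knowledge class that defers the llama_index import."""

    def __init__(self) -> None:
        self._llama_index = lazy_import(
            "llama_index", "install it with `pip install llama-index`."
        )
```

With this pattern, importing the package that defines the knowledge class succeeds even without LlamaIndex installed. The error is raised only when a RAG knowledge object is actually instantiated.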

DavdGao • Jun 09 '24 09:06