Make the dictionary parser much more flexible, and decouple task-specific information from DictDialogAgent

DavdGao opened this issue.

Description

Motivation

- Support changing the required fields of the returned dictionary dynamically. For example, the same agent should respond with a dictionary containing an "agreement" field during a discussion, but a "vote" field during the voting process.
- Decouple DictDialogAgent from specific fields of the returned dictionary. Currently, DictDialogAgent assumes that the dictionary generated by the LLM always has a "speak" field, which is task-specific.
- Allow filtering the generated dictionary when storing it into memory, returning it to other agents, and controlling the application workflow. For example, some fields of the response dictionary should (not) be stored into memory, some should (not) be returned to other agents, and some are used to control the application workflow:

```python
fake_parsed_response = {
    "thought": "xxx",
    "speak": "xxx",
    "agreement": True,  # True or False
}

self.speak(fake_parsed_response["speak"])  # only the "speak" field
self.memory.add(fake_parsed_response)  # all fields
return Msg(
    self.name,
    content=fake_parsed_response["speak"],  # only the "speak" field in content
    role="assistant",
    metadata=fake_parsed_response["agreement"],  # only the "agreement" field
)
```

Design

DictFilterMixin class: for parsers that return a dictionary, we add a parent class DictFilterMixin, which has keys_to_memory/keys_to_content/keys_to_metadata attributes and to_memory, to_content and to_metadata methods to filter the given dictionary.
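
A minimal sketch of such a mixin in plain Python, independent of the framework. Following the parser examples in this issue, a single key returns the bare value and a list of keys returns a sub-dictionary; the extra `True` ("keep everything") and `False` ("keep nothing") options and their defaults are an assumption added here, not part of the design above. The mixin is instantiated directly only for demonstration.

```python
from typing import Any, Sequence, Union

# Assumed key specification: a str returns that single value, a sequence of
# str returns a sub-dict, True keeps the whole dict and False yields None.
KeySpec = Union[bool, str, Sequence[str]]


class DictFilterMixin:
    """Sketch of a mixin that filters a parsed dict for memory, content and metadata."""

    def __init__(
        self,
        keys_to_memory: KeySpec = True,
        keys_to_content: KeySpec = True,
        keys_to_metadata: KeySpec = False,
    ) -> None:
        self.keys_to_memory = keys_to_memory
        self.keys_to_content = keys_to_content
        self.keys_to_metadata = keys_to_metadata

    @staticmethod
    def _filter_by_keys(parsed: dict, keys: KeySpec) -> Any:
        if keys is True:
            return parsed  # keep every field
        if keys is False:
            return None  # expose nothing
        if isinstance(keys, str):
            return parsed[keys]  # a single key returns the bare value
        return {key: parsed[key] for key in keys}  # several keys return a sub-dict

    def to_memory(self, parsed: dict) -> Any:
        return self._filter_by_keys(parsed, self.keys_to_memory)

    def to_content(self, parsed: dict) -> Any:
        return self._filter_by_keys(parsed, self.keys_to_content)

    def to_metadata(self, parsed: dict) -> Any:
        return self._filter_by_keys(parsed, self.keys_to_metadata)


parser = DictFilterMixin(
    keys_to_memory=["speak", "thought"],
    keys_to_content="speak",
    keys_to_metadata=["end_discussion"],
)
parsed = {"thought": "We agree.", "speak": "Let's stop here.", "end_discussion": True}
print(parser.to_content(parsed))   # Let's stop here.
print(parser.to_metadata(parsed))  # {'end_discussion': True}
```

Because each consumer (memory, content, metadata) has its own key specification, the same parsed dictionary can be exposed differently to each one without the agent knowing any field names.
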
In DictDialogAgent, the parser is responsible for:
- generating the format instruction ("You should respond in the following format ...")
- parsing the LLM response into a dictionary
- filtering the parsed dictionary for self.speak, self.memory.add and the returned message
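
The first two responsibilities can be sketched for a parser that expects the LLM to answer with a JSON object inside a fenced json code block. `MarkdownJsonParserSketch` is a hypothetical name, not the framework's MarkdownJsonDictParser, and the instruction wording and the "all hinted keys are required" rule are assumptions; the filtering step is left out here.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # triple backtick, built indirectly to keep this listing readable


class MarkdownJsonParserSketch:
    """Hypothetical parser: not the framework's MarkdownJsonDictParser."""

    def __init__(self, content_hint: Dict[str, Any]) -> None:
        self.content_hint = content_hint

    @property
    def format_instruction(self) -> str:
        # Ask the LLM to wrap a JSON object shaped like content_hint in a json fence.
        hint = json.dumps(self.content_hint, indent=4)
        return (
            "Respond with a JSON object in a fenced json code block, as follows:\n"
            f"{FENCE}json\n{hint}\n{FENCE}"
        )

    def parse(self, response: str) -> Dict[str, Any]:
        # Extract the first fenced json block; DOTALL lets "." span newlines.
        match = re.search(FENCE + r"json\s*(.*?)\s*" + FENCE, response, re.DOTALL)
        if match is None:
            raise ValueError("No fenced json block found in the response.")
        parsed = json.loads(match.group(1))
        # Every key promised in content_hint must be present.
        missing = [key for key in self.content_hint if key not in parsed]
        if missing:
            raise ValueError(f"Missing required keys: {missing}")
        return parsed


parser = MarkdownJsonParserSketch(
    {"thought": "what you think", "vote": "player1 or player2"}
)
response = (
    "Here is my answer.\n"
    f'{FENCE}json\n{{"thought": "player2 is suspicious", "vote": "player2"}}\n{FENCE}'
)
print(parser.parse(response)["vote"])  # player2
```

Raising `ValueError` on a malformed reply gives a caller such as a model wrapper a clear signal to retry the request.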

The DictDialogAgent works as follows:

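```python
class DictDialogAgent(AgentBase):
    def __init__(self):
        # ...
        self.parser = None

    def set_parser(self, parser):
        """Switch the parser, and thus the required and filtered fields."""
        self.parser = parser

    def reply(self, x=None):
        # Record the incoming message, if any
        if x is not None:
            self.memory.add(x)

        # The parser supplies the format instruction for the prompt
        prompt = self.model.format(
            self.memory.get_memory(),
            self.parser.format_instruction,
        )

        # The parser also turns the raw LLM response into a dictionary
        res = self.model(prompt, parse_func=self.parser.parse)

        # Only the memory-relevant fields are stored
        self.memory.add(Msg(self.name, self.parser.to_memory(res.parsed), "assistant"))

        # Content and metadata are filtered separately for other agents
        msg = Msg(
            self.name,
            content=self.parser.to_content(res.parsed),
            role="assistant",
            metadata=self.parser.to_metadata(res.parsed),
        )
        self.speak(msg)

        return msg
```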
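
To see the reply flow above run without an LLM, here is a self-contained sketch. `Msg`, `ListKeyParser`, `DictDialogAgentSketch` and `fake_model` are hypothetical stand-ins, not the framework's classes; prompt formatting and `self.speak` are omitted, memory is a plain list, and the "model" is a stub that returns a fixed JSON string.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Msg:
    """Simplified stand-in for the framework's message class."""
    name: str
    content: Any
    role: str
    metadata: Any = None


@dataclass
class ListKeyParser:
    """Hypothetical parser: plain JSON in, key lists select each view."""
    format_instruction: str
    keys_to_memory: List[str]
    keys_to_content: str
    keys_to_metadata: List[str] = field(default_factory=list)

    def parse(self, text: str) -> dict:
        return json.loads(text)

    def to_memory(self, parsed: dict) -> dict:
        return {key: parsed[key] for key in self.keys_to_memory}

    def to_content(self, parsed: dict) -> Any:
        return parsed[self.keys_to_content]

    def to_metadata(self, parsed: dict) -> dict:
        return {key: parsed[key] for key in self.keys_to_metadata}


# A fake "model": takes the history and format instruction, returns raw text.
ModelFn = Callable[[List[Msg], str], str]


class DictDialogAgentSketch:
    def __init__(self, name: str, model: ModelFn) -> None:
        self.name = name
        self.model = model
        self.memory: List[Msg] = []
        self.parser: Optional[ListKeyParser] = None

    def set_parser(self, parser: ListKeyParser) -> None:
        self.parser = parser

    def reply(self, x: Optional[Msg] = None) -> Msg:
        if self.parser is None:
            raise RuntimeError("call set_parser() before reply()")
        if x is not None:
            self.memory.append(x)
        raw = self.model(self.memory, self.parser.format_instruction)
        parsed = self.parser.parse(raw)
        # Memory only keeps the memory view of the parsed dictionary.
        self.memory.append(Msg(self.name, self.parser.to_memory(parsed), "assistant"))
        # Other agents receive the content view, plus the metadata view for control flow.
        return Msg(
            self.name,
            self.parser.to_content(parsed),
            "assistant",
            metadata=self.parser.to_metadata(parsed),
        )


def fake_model(history: List[Msg], instruction: str) -> str:
    return json.dumps(
        {"thought": "We agree.", "speak": "Let's vote.", "end_discussion": True}
    )


agent = DictDialogAgentSketch("assistant", fake_model)
agent.set_parser(
    ListKeyParser("Respond in JSON.", ["speak", "thought"], "speak", ["end_discussion"])
)
msg = agent.reply()
print(msg.content)                     # Let's vote.
print(msg.metadata["end_discussion"])  # True
```

Calling `set_parser` with a different parser changes which fields reach memory, content and metadata, while `reply()` itself stays untouched.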

In this way, when an agent needs to return different fields, developers only need to switch its parser, as follows:

```python
agent = DictDialogAgent("assistant", "gpt-4")

# parser for discussion
discussion_parser = MarkdownJsonDictParser(
    content_hint={
        "speak": "xxx",
        "thought": "xxx",
        "end_discussion": "true/false",
    },
    keys_to_memory=["speak", "thought"],
    keys_to_content="speak",
    keys_to_metadata=["end_discussion"],
)

# parser for vote
vote_parser = MarkdownJsonDictParser(
    content_hint={
        "thought": "xxx",
        "vote": "player1 or player2",
    },
    keys_to_memory=["thought", "vote"],
    keys_to_content="vote",
)

# discussion
agent.set_parser(discussion_parser)

x = None
while True:
    x = agent(x)
    if x.metadata["end_discussion"]:
        break

# vote
agent.set_parser(vote_parser)

while True:
    # vote ...
    ...
```

Checklist

Please check the following items before the code is ready to be reviewed.

[x] Code has passed all tests
[x] Docstrings have been added/updated in Google Style
[x] Documentation has been updated
[x] Code is ready for review

May 09 '24 06:05 DavdGao

@qbc2016 please check if the DictDialogAgent can handle the werewolf game.

May 09 '24 07:05 DavdGao