Open-Assistant
Open-Assistant copied to clipboard
Export message tree dataset from the database as .jsonl.gz
- Define a JSON file format with one message-tree per line and nested children (see example below)
- find python code for writing jsonl.gz files (streamed compression could be made optional), e.g. something like this
- Create an export function which takes a list of
message_tree_idsto export as parameter and a file object to write to. By default export only messages that are reviewed (review_result=True) and aren't marked as deleted (deleted=False), e.g. define parameters with default values likeinclude_deleted=False,only_reviewed=True. - Write a function to export all message-trees that are in
READY_FOR_EXPORTstate - Write a function to export all messages of a user
- add a cli command (e.g. add to the backend's main.py) to export all
READY_FOR_EXPORTmessage trees into a specified file (or STDOUT if no file name is specified)
Example for possible JSON format (shown formatted for readability):
{
"message_tree_id": "02193369-eccc-453b-b47f-1a5f9ec35fce",
"prompt": {
"message_id": "02193369-eccc-453b-b47f-1a5f9ec35fce",
"text": "Hey assistant, how are you?",
"role": "prompter",
"rank": null,
"review_count": 4,
"replies": [
{
"message_id": "6a3aa62c-2309-4489-a605-6f164b0edfaa",
"text": "Hi didily ho, here you go.",
"role": "assistant",
"rank": 0,
"review_count": 3,
"replies": [
{
"message_id": "0c31f96f-4c19-4dc2-9ed2-ac067068cf7a",
"text": "Should I buy GME stocks?",
"role": "prompter",
"rank": null,
"replies": []
}
]
},
{
"message_id": "6960a835-a303-4c69-a45d-32dd93e8ea3d",
"text": "Fine, how are you?",
"role": "assistant",
"rank": 1,
"review_count": 3,
"replies": [
{
"message_id": "ddcfef23-c735-4645-9126-a4c38f5bfec9",
"text": "Can you list all presidents of the United States in chronological order please?",
"role": "prompter",
"rank": null,
"replies": []
}
]
}
]
}
}
@andreaskoepf Happy to pick this up 🙏