Export message tree dataset from the database as .jsonl.gz

Open andreaskoepf opened this issue 3 years ago • 1 comment

  • Define a JSON file format with one message tree per line and nested children (see example below)
  • Find Python code for writing .jsonl.gz files (streamed compression could be made optional), e.g. something like this
  • Create an export function that takes a list of message_tree_ids to export and a file object to write to. By default, export only messages that have been reviewed (review_result=True) and are not marked as deleted (deleted=False), e.g. define parameters with default values such as include_deleted=False and only_reviewed=True.
  • Write a function to export all message trees that are in the READY_FOR_EXPORT state
  • Write a function to export all messages of a user
  • Add a CLI command (e.g. in the backend's main.py) that exports all READY_FOR_EXPORT message trees to a specified file (or to STDOUT if no file name is given)
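
As a rough starting point for the writing step, here is a minimal sketch that streams message trees to a binary file object as JSON Lines, with gzip compression that can be switched off. The names `write_trees_jsonl` and `use_compression` are made up for illustration and are not part of the existing backend; the trees are assumed to be plain dicts that are already JSON-serializable.

```python
import gzip
import io
import json
from typing import BinaryIO, Iterable


def _write_lines(trees: Iterable[dict], out) -> int:
    """Write each tree as one UTF-8 encoded JSON line; return the line count."""
    count = 0
    for tree in trees:
        line = json.dumps(tree, ensure_ascii=False) + "\n"
        out.write(line.encode("utf-8"))
        count += 1
    return count


def write_trees_jsonl(
    trees: Iterable[dict], file: BinaryIO, use_compression: bool = True
) -> int:
    """Hypothetical helper: export trees to `file` as .jsonl or .jsonl.gz."""
    if use_compression:
        # GzipFile compresses while streaming into the caller's file object.
        # Leaving the with-block writes the gzip trailer but does not close `file`.
        with gzip.GzipFile(fileobj=file, mode="wb") as gz:
            return _write_lines(trees, gz)
    return _write_lines(trees, file)
```

Because the function only needs a binary file object, a CLI command could pass either `open(path, "wb")` or `sys.stdout.buffer` when no file name is given.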

Example of a possible JSON format (formatted here for readability; in the exported file each tree occupies a single line):

{
    "message_tree_id": "02193369-eccc-453b-b47f-1a5f9ec35fce",
    "prompt": {
        "message_id": "02193369-eccc-453b-b47f-1a5f9ec35fce",
        "text": "Hey assistant, how are you?",
        "role": "prompter",
        "rank": null,
        "review_count": 4,
        "replies": [
            {
                "message_id": "6a3aa62c-2309-4489-a605-6f164b0edfaa",
                "text": "Hi didily ho, here you go.",
                "role": "assistant",
                "rank": 0,
                "review_count": 3,
                "replies": [
                    {
                        "message_id": "0c31f96f-4c19-4dc2-9ed2-ac067068cf7a",
                        "text": "Should I buy GME stocks?",
                        "role": "prompter",
                        "rank": null,
                        "replies": []
                    }
                ]
            },
            {
                "message_id": "6960a835-a303-4c69-a45d-32dd93e8ea3d",
                "text": "Fine, how are you?",
                "role": "assistant",
                "rank": 1,
                "review_count": 3,
                "replies": [
                    {
                        "message_id": "ddcfef23-c735-4645-9126-a4c38f5bfec9",
                        "text": "Can you list all presidents of the United States in chronological order please?",
                        "role": "prompter",
                        "rank": null,
                        "replies": []
                    }
                ]
            }
        ]
    }
}
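
One way to produce this nested shape is sketched below, assuming the messages of a tree are available as flat dicts that carry a `parent_id` key (a hypothetical field name, not confirmed by this issue) alongside `review_result` and `deleted`. The function name `build_tree` is likewise illustrative. It also assumes, as in the example above, that the tree id equals the root message's id.

```python
from collections import defaultdict
from typing import Optional


def build_tree(
    messages: list, include_deleted: bool = False, only_reviewed: bool = True
) -> Optional[dict]:
    """Nest flat message dicts into the export format; None if no root survives."""

    def keep(m: dict) -> bool:
        if m.get("deleted") and not include_deleted:
            return False
        if only_reviewed and not m.get("review_result"):
            return False
        return True

    children = defaultdict(list)  # parent_id -> list of child messages
    root = None
    for m in filter(keep, messages):
        if m.get("parent_id") is None:
            root = m
        else:
            children[m["parent_id"]].append(m)
    if root is None:
        return None

    def to_node(m: dict) -> dict:
        # Ranked replies come first in rank order; unranked ones follow.
        replies = sorted(
            children[m["message_id"]],
            key=lambda r: (r.get("rank") is None, r.get("rank") or 0),
        )
        return {
            "message_id": m["message_id"],
            "text": m["text"],
            "role": m["role"],
            "rank": m.get("rank"),
            "review_count": m.get("review_count"),
            "replies": [to_node(r) for r in replies],
        }

    return {"message_tree_id": root["message_id"], "prompt": to_node(root)}
```

Only messages reachable from the root are emitted, so the replies of a filtered-out message are dropped together with it. The recursion is fine for conversation trees of ordinary depth; very deep trees would need an iterative traversal.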

andreaskoepf, Jan 13 '23 08:01

@andreaskoepf Happy to pick this up 🙏

GraemeHarris, Jan 13 '23 08:01