Hosting the Dolly dataset on the Hugging Face Hub
Hi Databricks team, this is a really cool project, and great job creating a high-quality instruction dataset with a permissive license!
Would you be interested in hosting the Dolly dataset on the Hugging Face Hub (https://huggingface.co/datasets)? The Alpaca dataset is also hosted there (link), and your version would be of wide interest to the community, especially since many people have noted the ToS issues with training on OpenAI model outputs.
If that sounds interesting, you just need to create a new dataset repo under the Databricks org and upload your databricks-dolly-15k.jsonl file through the UI. For more details, you can check out our docs here.
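If you'd rather script the upload than click through the UI, here's a rough sketch using the `huggingface_hub` Python library (`HfApi.create_repo` and `HfApi.upload_file`). It assumes you're already authenticated (e.g. via `huggingface-cli login` or an `HF_TOKEN` environment variable) and have write access to the target org. The helper name `upload_jsonl_to_hub` is hypothetical, and the optional `api` argument exists only so the function can be exercised without network access.

```python
import os


def upload_jsonl_to_hub(local_path, repo_id, api=None):
    """Create a dataset repo (if needed) and upload one JSONL file to it.

    Hypothetical helper; assumes you are already logged in to the Hub.
    """
    if api is None:
        # Imported lazily so the module can load without huggingface_hub installed.
        from huggingface_hub import HfApi

        api = HfApi()

    # exist_ok=True makes re-running the script safe if the repo already exists.
    api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)

    # Upload the file under its own name at the root of the dataset repo.
    return api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=os.path.basename(local_path),
        repo_id=repo_id,
        repo_type="dataset",
    )
```

For example, `upload_jsonl_to_hub("databricks-dolly-15k.jsonl", "databricks/databricks-dolly-15k")` would push the file into a dataset repo under the Databricks org, which is the same end result as the UI route.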
@matthayes I see the dataset is already there, just not made public yet. Is it meant to be private?
I asked, and it looks like the plan is to just host it here on GitHub for now.
Scratch that, it's up now: https://huggingface.co/datasets/databricks/databricks-dolly-15k