
Hosting the Dolly dataset on the Hugging Face Hub

Open lewtun opened this issue 2 years ago • 2 comments

Hi Databricks team, this is a really cool project and great job creating a high quality instruction dataset with a permissive license!

Would you be interested in hosting the Dolly dataset on the Hugging Face Hub (https://huggingface.co/datasets)? The Alpaca dataset is also hosted there (link), and your version would be of wide interest to the community, especially since many people have noted the ToS issues with training on OpenAI model outputs.

If that sounds interesting, you just need to create a new dataset repo under the Databricks org and upload your databricks-dolly-15k.jsonl file through the UI. For more details, you can check out our docs here.
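If you'd rather script the upload than use the UI, here's a minimal sketch. It assumes the `huggingface_hub` client library is installed and that you're already authenticated (e.g. via `huggingface-cli login`). It first checks that every line of the JSONL file parses and carries the `instruction`, `context`, `response`, and `category` fields (an assumption about the released file's schema), then creates the dataset repo and uploads the file with `HfApi.create_repo` and `HfApi.upload_file`. The helper names `validate_jsonl` and `upload` are hypothetical.

```python
import json
from pathlib import Path

# Assumed schema of each record in databricks-dolly-15k.jsonl.
EXPECTED_FIELDS = {"instruction", "context", "response", "category"}


def validate_jsonl(path):
    """Hypothetical helper: count records, raising ValueError on a bad line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # JSONDecodeError is a ValueError subclass
            if not isinstance(record, dict):
                raise ValueError(f"line {lineno}: expected a JSON object")
            missing = EXPECTED_FIELDS - set(record)
            if missing:
                raise ValueError(f"line {lineno}: missing fields {sorted(missing)}")
            count += 1
    return count


def upload(path, repo_id="databricks/databricks-dolly-15k"):
    """Hypothetical helper: create the dataset repo (if needed) and push the file."""
    # Imported lazily so validation works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the token saved by `huggingface-cli login`
    api.create_repo(repo_id, repo_type="dataset", exist_ok=True)
    api.upload_file(
        path_or_fileobj=str(path),
        path_in_repo=Path(path).name,
        repo_id=repo_id,
        repo_type="dataset",
    )
```

Usage would be something like `validate_jsonl("databricks-dolly-15k.jsonl")` followed by `upload("databricks-dolly-15k.jsonl")`. Validating first just avoids pushing a file that the Hub's dataset viewer can't parse; pass your own `repo_id` if you're trying this under a different org.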

lewtun, Apr 15 '23 14:04

@matthayes I see the dataset is already there, just not made public yet. Is it meant to be private?

srowen, Apr 15 '23 15:04

I asked, and it looks like the plan is to just host it on GitHub here for now.

srowen, Apr 18 '23 23:04

Scratch that, it's up now: https://huggingface.co/datasets/databricks/databricks-dolly-15k

srowen, Apr 21 '23 15:04