
Incorrect url in SFT prompt_dialogue dataset

Open • maw501 opened this issue 3 years ago • 6 comments

As mentioned in #1368, the URL to the prompt_dialogue dataset is broken. AFAICT the new URL is here, so we just need to change the link and check that we can load the data using the new link(s).
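For the "check we can load the data" step, a quick reachability test on each candidate link catches dead URLs before they reach the data loader. This is only a minimal sketch, assuming the links are plain HTTP(S) file URLs. `url_is_reachable` and `broken_links` are hypothetical helpers, not part of the Open-Assistant codebase, and the URLs in the usage note are placeholders because the new link isn't reproduced here.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional


def url_is_reachable(url: str, timeout: float = 10.0,
                     opener: Optional[Callable] = None) -> bool:
    """Return True if a HEAD request to `url` ends with a 2xx status.

    `opener` defaults to urllib.request.urlopen; tests can pass a stub.
    Redirects are followed by urlopen, so a moved file still counts as OK.
    """
    open_fn = opener or urllib.request.urlopen
    try:
        # Request() raises ValueError for malformed URLs, so build it inside try.
        request = urllib.request.Request(url, method="HEAD")
        with open_fn(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # URLError/HTTPError (404, 410, ...) and socket timeouts are OSErrors.
        return False


def broken_links(urls: Iterable[str], **kwargs) -> List[str]:
    """Return the URLs from `urls` that did not respond with a 2xx status."""
    return [u for u in urls if not url_is_reachable(u, **kwargs)]
```

Usage would look like `broken_links(["https://example.com/prompt_dialogue.jsonl"])`, where an empty list means every link responded. Some hosts reject HEAD with a 405, in which case a small ranged GET would be the fallback; a passing check also only shows the file exists, not that the loader can parse it.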

maw501 • Feb 20 '23 17:02

Hi! I am new here and would like to contribute, can I be assigned this issue?

bethanyconnolly • Feb 20 '23 17:02

Hi, the links for the datasets were not only wrong but the datasets had been deleted. I have retrieved them through the commit history and I would like to upload them as datasets to the Open-Assistant organisation on HuggingFace to avoid this issue in future. Please can someone approve my request to join on HuggingFace?

bethanyconnolly • Feb 21 '23 15:02


I am not sure who is in charge of the HF org. Maybe one of the ML leads @sanagno @theblackcat102?

olliestanley • Feb 21 '23 23:02

The datasets with incorrect URLs will be removed in PR#1793.

theblackcat102 • Feb 22 '23 03:02


So we don't need these datasets at all any more and we can close this issue?

bethanyconnolly • Feb 23 '23 12:02

Rallio moved his data here: https://github.com/LAION-AI/Open-Instruction-Generalist/tree/main/small_instruction_set

Now it's an official LAION-AI dataset. We can pull from here as you wish and have his blessing as well :) @theblackcat102

huu4ontocord • Feb 24 '23 06:02

We can close this issue because the code containing the URLs is removed in PR#1793.

bethanyconnolly • Feb 27 '23 16:02