[GH->HF] Part 2: Remove all dataset scripts from github

Open lhoestq opened this issue 3 years ago • 3 comments

Now that all the datasets live on the Hub, we can remove the /datasets directory, which contains all the dataset scripts of this repository.

This needs https://github.com/huggingface/datasets/pull/4973 to be merged first, and PRs to be enabled on the Hub for non-namespaced datasets.

lhoestq avatar Sep 13 '22 16:09 lhoestq

So this means metrics will be deleted from this repo in favor of the "evaluate" library? Maybe you guys could just redirect metrics to that library.

osbm avatar Sep 18 '22 18:09 osbm

We are indeed deprecating the metrics in datasets and suggest that users switch to evaluate (via a warning message).

We'll keep the current metrics as they are for now, but they'll be completely removed at some point.

lhoestq avatar Sep 19 '22 10:09 lhoestq
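
To illustrate the kind of warning-based deprecation described in the comment above, here is a minimal sketch. It is not the actual datasets code: the function body, the message wording, and the returned placeholder are assumptions made for illustration, and only the standard-library `warnings` module is used.

```python
import warnings


def load_metric(name):
    """Hypothetical stand-in for a deprecated metric loader (not the real datasets code)."""
    warnings.warn(
        f"load_metric({name!r}) is deprecated and will be removed at some point. "
        "Please use the 'evaluate' library instead.",
        FutureWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    # The old behaviour is kept for now, so existing code keeps working.
    return {"name": name}  # placeholder for the loaded metric object
```

Callers still get a result, but see the warning each time they go through the deprecated path, which gives them time to migrate before the function is removed.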

I guess this is ready to merge?

It should break nothing except one rare case:

Someone uses an old version of datasets to load a recent dataset. In that case, datasets fetches the main branch on GitHub to see if the dataset exists there. But since we're removing all the datasets, this forward fetching won't work anymore.

E.g. someone uses "imagenet-1k" with a version of datasets that was released before that dataset was added. I checked on Kibana, and one single user would be affected, with 4k downloads/month. It should still work for them, though, thanks to the datasets cache.

But if they delete their cache, the workaround is... 🥁 update datasets 😅

lhoestq avatar Sep 22 '22 13:09 lhoestq
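
The rare case described in the comment above can be sketched as a small lookup function. This is a simplified model, not the real datasets loader: the function name, its parameters, the return values, and the lookup order (GitHub main branch first, then the local cache) are all assumptions made for illustration.

```python
def resolve_script(name, github_main, local_cache):
    """Hypothetical model of where an old datasets version finds a dataset script.

    github_main: set of dataset names still present on the GitHub main branch
    local_cache: set of dataset names already downloaded on this machine
    """
    if name in github_main:
        return "github-main"  # forward fetching from the main branch
    if name in local_cache:
        return "cache"  # still works after the scripts are removed
    raise FileNotFoundError(
        f"{name!r} was not found on GitHub or in the cache; update datasets"
    )


# Before the removal, forward fetching finds the script on main.
before = resolve_script("imagenet-1k", {"imagenet-1k"}, set())
# After the removal, only a populated cache keeps the old version working.
after = resolve_script("imagenet-1k", set(), {"imagenet-1k"})
```

With both sets empty (scripts removed and cache deleted), the function raises, which corresponds to the situation where updating datasets is the only workaround.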

Let's merge this on Monday if we can, so that contributors who wanted to merge their dataset PRs here still have time to do so.

lhoestq avatar Sep 30 '22 08:09 lhoestq

Alright, merging!

lhoestq avatar Oct 03 '22 17:10 lhoestq