
Load from / to Hugging Face ?

Open lhoestq opened this issue 1 year ago • 2 comments

Hi ! I'm Quentin from HF :)

Congrats on the release ! The API is concise and easy, it will be useful to many people

I was wondering if you had plans to support reading / writing from HF datasets ?

If you use fsspec it might work out of the box though, using hf:// paths (and if you have the huggingface_hub lib installed)
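As a rough illustration of the fsspec route mentioned above, here is a minimal sketch. It assumes `fsspec` and `huggingface_hub` are both installed (the latter registers the `hf://` protocol with fsspec). The helper names and the repo `user/my-dataset` are hypothetical, not part of datachain or the Hub API, and `revision` is assumed to be a simple branch or tag name without slashes.

```python
from typing import Optional


def hf_dataset_url(repo_id: str, path_in_repo: str, revision: Optional[str] = None) -> str:
    """Build an hf:// URL for a file in a dataset repo on the Hugging Face Hub.

    Hypothetical helper: it only formats the string, it does not check that
    the repo or file exists.
    """
    repo = f"{repo_id}@{revision}" if revision else repo_id
    return f"hf://datasets/{repo}/{path_in_repo.lstrip('/')}"


def read_head(url: str, n_bytes: int = 200) -> bytes:
    """Read the first bytes of a remote file through fsspec (needs network access)."""
    # Imported lazily so the URL helper above works without fsspec installed.
    import fsspec

    with fsspec.open(url, "rb") as f:
        return f.read(n_bytes)


# Hypothetical repo and file names, for illustration only.
url = hf_dataset_url("user/my-dataset", "data/train.csv")
print(url)
```

Calling `read_head(url)` on a real dataset file would fetch its first bytes over the network; private repos would additionally need a Hub token configured.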

lhoestq avatar Aug 05 '24 10:08 lhoestq

@lhoestq thank you for asking this! Do you have a specific use case in mind? Which dataset exactly, and what do you want to do with it next?

Let us evaluate this. It seems straightforward and our other project DVC already supports this. We will get back soon.

dmpetrov avatar Aug 05 '24 15:08 dmpetrov

Cool ! The main use cases I imagine are transforming the rows of existing datasets, or generating more rows from them with an LLM

lhoestq avatar Aug 12 '24 10:08 lhoestq

This article may be helpful for a future structured export function: https://huggingface.co/docs/datasets/en/repository_structure

dtulga avatar Oct 29 '24 18:10 dtulga

I think this can be closed for now. We have basic hf:// support. Not sure if we want more complicated integration with Datasets. Can be a separate item.

shcheklein avatar Jan 05 '25 20:01 shcheklein