
Adding support for netCDF (*.nc) files

Open shermansiu opened this issue 2 years ago • 3 comments

Feature request

netCDF (*.nc) is a file format for storing multi-dimensional scientific data. It is used by packages such as xarray, which provides labelled multi-dimensional arrays in Python. It would be nice to have native support for netCDF in datasets.

Motivation

When uploading *.nc files to the Hugging Face Hub through the datasets API, I would like to be able to preview the dataset without converting it to another format.

Your contribution

I can submit a PR, provided I have the time.

shermansiu, Dec 27 '23 09:12

Related to #3113

shermansiu, Dec 27 '23 20:12

Conceptually, we can use xarray to load the netCDF file, then convert along the chain xarray -> pandas -> pyarrow.

shermansiu, Dec 27 '23 20:12

I'd still need to verify that such a conversion would be lossless, especially for multi-dimensional data.

shermansiu, Dec 27 '23 20:12