Discuss data management solutions

Open: patrickmineault opened this issue 4 years ago • 1 comment

From a reader:

One issue that pops up frequently is data "versioning": you go through a series of data manipulations and cleaning steps until you reach a final clean set that you use for manuscript analysis. Sometimes the path from the very raw acquired data to that clean data set is very poorly documented.

Some solutions:

  • datalad
  • dvc
  • git-annex
  • git-lfs

Also discuss sharing these versions, e.g. through dryad, figshare, OSF, etc.
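
To make the reader's problem concrete, here is a minimal sketch of documenting the path from raw to clean data by logging a content hash after each cleaning step. It uses only the Python standard library and is not how any of the tools listed above work internally. The function names (`fingerprint`, `run_pipeline`, `drop_missing`, `drop_outliers`), the records, and the 2.0 threshold are all hypothetical, made up for illustration.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a SHA-256 hex digest of a list of JSON-serializable records."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_pipeline(raw, steps):
    """Apply each (name, function) step in order and log a digest per stage."""
    data = raw
    log = [{"step": "raw", "n_rows": len(data), "sha256": fingerprint(data)}]
    for name, func in steps:
        data = func(data)
        log.append({"step": name, "n_rows": len(data), "sha256": fingerprint(data)})
    return data, log


# Hypothetical cleaning steps applied to made-up records.
def drop_missing(rows):
    return [r for r in rows if r["rt"] is not None]


def drop_outliers(rows):
    return [r for r in rows if r["rt"] < 2.0]


raw = [
    {"subject": 1, "rt": 0.41},
    {"subject": 2, "rt": None},
    {"subject": 3, "rt": 5.20},
]
clean, log = run_pipeline(
    raw, [("drop_missing", drop_missing), ("drop_outliers", drop_outliers)]
)
print(json.dumps(log, indent=2))
```

Committing a log like this next to the analysis code lets a reader check which intermediate data set a result came from. The dedicated tools above go further by also storing and retrieving the data versions themselves.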

patrickmineault, Jan 15 '22 01:01

A similar, modern approach could also include Icechunk, which versions cloud-optimized raster datasets stored in Zarr: https://github.com/earth-mover/icechunk

CC: @TomNicholas

alxmrs, Jul 25 '25 16:07