PyPRECIS

Rationalise & organise data for notebooks

Status: Open. hsteptoe opened this issue 7 years ago; 2 comments.

The data directory associated with the notebooks (/project/precis/worksheets/data) needs rationalising. The aim is to eventually add it to the GitHub repo via Git Large File Storage (Git LFS), so the smaller we can make it, the better.
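Because the goal is to shrink the directory before it goes into Git LFS, a per-entry size report is a quick way to confirm where the space actually goes (for example, whether CRU really is the largest single dataset). This is a minimal sketch using only the Python standard library; `tree_size` and `directory_sizes` are hypothetical helper names, not part of PyPRECIS, and the sizes are apparent file sizes (st_size) rather than disk blocks used.

```python
"""Report the size of each top-level entry in the notebook data directory.

Sketch only: sizes are apparent file sizes (st_size), not disk blocks used.
"""
from pathlib import Path


def tree_size(path: Path) -> int:
    """Size in bytes of a single file, or the total of all files below a directory."""
    if path.is_file():
        return path.stat().st_size
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def directory_sizes(root: Path) -> list[tuple[str, int]]:
    """Return (name, bytes) for each immediate child of root, largest first."""
    sizes = [(child.name, tree_size(child)) for child in root.iterdir()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


# Example (not run here; path taken from the issue text):
#   for name, nbytes in directory_sizes(Path("/project/precis/worksheets/data")):
#       print(f"{nbytes / 1e6:10.1f} MB  {name}")
```

Summing whole subtrees per top-level entry keeps the report short enough to compare APHRODITE, CRU, climatology, netcdf and pp at a glance.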

Some things to check:

  • [x] Do we need pmsl model data? (Double-check that it isn't used in any of the exercises.)
  • [ ] CRU data is the largest single dataset. Can we reduce the size? Do we need the entire dataset? Can we reduce the area?
  • [ ] Would separate directories for variables be better than separate directories for experiments?
  • [ ] Improve the data processing workflow so that we don't save rim-removed NetCDF (.nc) versions of each individual PP file. I can't see any good reason for doing this (see also the comments in #23).

hsteptoe, Nov 07 '18 09:11
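On the CRU question, one rough way to judge whether cropping the area is worth it is to compare the uncompressed size of the full grid with a cropped one. This is a minimal sketch assuming a regular lat/lon grid, 1-D coordinates and float32 values; `Grid`, `regular_axis` and `crop` are hypothetical names, and the 0.5-degree global grid, bounding box and 360 monthly time steps in the example are placeholders, not the actual properties of cru.pm.6190.03236.nc. Real NetCDF file sizes will differ if the file is compressed or stored as packed integers.

```python
"""Estimate how much cropping a regular lat/lon grid would save (uncompressed).

Sketch only: assumes float32 values and 1-D latitude/longitude coordinates.
"""
from dataclasses import dataclass


@dataclass
class Grid:
    lats: list[float]   # latitude coordinate values (degrees north)
    lons: list[float]   # longitude coordinate values (degrees east)
    ntime: int          # number of time steps
    itemsize: int = 4   # bytes per value (float32 assumed)

    def nbytes(self) -> int:
        """Uncompressed size of one variable on this grid, in bytes."""
        return len(self.lats) * len(self.lons) * self.ntime * self.itemsize


def regular_axis(start: float, stop: float, step: float) -> list[float]:
    """Evenly spaced coordinate values from start to stop, inclusive."""
    n = round((stop - start) / step) + 1
    return [start + i * step for i in range(n)]


def crop(grid: Grid, lat_range: tuple[float, float],
         lon_range: tuple[float, float]) -> Grid:
    """Keep only the points inside an inclusive lat/lon bounding box.

    Assumes the box does not cross the longitude wrap-around (e.g. 180/-180).
    """
    lat_lo, lat_hi = lat_range
    lon_lo, lon_hi = lon_range
    lats = [y for y in grid.lats if lat_lo <= y <= lat_hi]
    lons = [x for x in grid.lons if lon_lo <= x <= lon_hi]
    return Grid(lats, lons, grid.ntime, grid.itemsize)


if __name__ == "__main__":
    # Hypothetical numbers: a global 0.5-degree grid of cell centres and
    # 30 years of monthly means, cropped to an arbitrary example box.
    full = Grid(regular_axis(-89.75, 89.75, 0.5),
                regular_axis(-179.75, 179.75, 0.5), ntime=30 * 12)
    small = crop(full, lat_range=(0.0, 40.0), lon_range=(60.0, 110.0))
    print(f"full:    {full.nbytes() / 1e6:.1f} MB")
    print(f"cropped: {small.nbytes() / 1e6:.1f} MB "
          f"({100 * small.nbytes() / full.nbytes():.1f}% of full)")
```

With these placeholder numbers the cropped grid keeps 80 x 100 of the 360 x 720 points, about 3% of the full size; the same comparison with the real CRU grid and the notebooks' region of interest would show whether cropping is worthwhile.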

Current data directory structure is:

data/
|-- APHRODITE
|   |-- aphro.mon.6190.nc
|   `-- aphro.pm.6190.KL.nc
|-- CRU
|   `-- cru.pm.6190.03236.nc
|-- climatology
|   |-- (climatology NetCDF files, e.g.:)
|   |-- aphro.a.OND.mean.baseline.pr.mmday-1.nc
|   |-- cahpb.monmean.1981_1983.pr.norim.mmday-1.KL.nc
|   `-- ...
|-- netcdf
|   |-- cahpa
|   |   |-- 03236
|   |   |-- 05216
|   |   |-- 16222
|   |   `-- (+ merged non-climatology NetCDF files)
|   `-- cahpb
|       |-- 03236
|       |-- 05216
|       |-- 16222
|       `-- (+ merged non-climatology NetCDF files)
|-- pp
|   |-- cahpa
|   |   `-- (raw PP files)
|   `-- cahpb
|       `-- (raw PP files)
`-- sample_data.nc

hsteptoe, Nov 08 '18 08:11
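On the "directories per variable vs per experiment" question: if each 5-digit subdirectory under netcdf/cahpa and netcdf/cahpb holds a single variable (03236, 05216 and 16222 look like Unified Model STASH codes, but that is an assumption here), swapping the nesting is a mechanical move. This is a minimal standard-library sketch; `regroup_by_variable` is a hypothetical helper, the output name netcdf_by_variable is only an example, and because it moves files it should be tried on a copy first.

```python
"""Regroup <experiment>/<variable>/ directories as <variable>/<experiment>/.

Sketch only: it MOVES files, so run it on a copy of the data first.
"""
import shutil
from pathlib import Path


def regroup_by_variable(src: Path, dst: Path) -> list[Path]:
    """Move src/<expt>/<var>/* into dst/<var>/<expt>/ and return the new dirs.

    dst should be outside src. Loose files directly under src/<expt>/
    (e.g. the merged NetCDF files) are left where they are.
    """
    created = []
    # sorted() materialises each listing before anything is moved.
    for expt_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        for var_dir in sorted(p for p in expt_dir.iterdir() if p.is_dir()):
            target = dst / var_dir.name / expt_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for item in sorted(var_dir.iterdir()):
                shutil.move(str(item), str(target / item.name))
            var_dir.rmdir()  # empty now that its contents have moved
            created.append(target)
    return created


# Example (not run here; paths are illustrative):
#   regroup_by_variable(Path("data/netcdf"), Path("data/netcdf_by_variable"))
```

Leaving the merged files in place keeps the sketch to a pure swap of the two directory levels; where those files should live in a variable-first layout is a separate decision.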

I have now removed the pmsl field from the *pmi*.pp files; these files now contain only air temperature and precipitation flux.

hsteptoe, Jan 09 '19 14:01