kerchunk

added the reinflate api

Open Anu-Ra-g opened this issue 1 year ago • 6 comments

Added the reinflate function to reinflate the index.

Anu-Ra-g avatar Aug 30 '24 16:08 Anu-Ra-g

Separate module and only exposing public functions sounds good to me.

emfdavid avatar Sep 04 '24 13:09 emfdavid

Is there a better name for this function? I am terrible at names...

emfdavid avatar Sep 04 '24 13:09 emfdavid

Is there a better name for this function?

You mean "reinflate"? Maybe if you write a 2-sentence description, the name will present itself :)

martindurant avatar Sep 04 '24 14:09 martindurant

@martindurant what should the module name be? Should the module go in a sub-package like kerchunk.grib.<module>, or directly under kerchunk as kerchunk.<module name>?

Anu-Ra-g avatar Sep 04 '24 14:09 Anu-Ra-g

I suggested _grib_idx above. Making kerchunk.grib into a package sounds like a bad idea.

martindurant avatar Sep 04 '24 14:09 martindurant

Comments on naming (not my strong suit)... the Camus team had suggested "outbasing" as the name for this general approach of ingesting an index of chunks to a database and constructing the dataset/tree on the fly.

The "reinflate" method in particular builds the refspec dictionary for a zarr/xarray tree representing a set of grib data. The inputs are the static metadata for the refspec (everything but the chunks for variables with a time dimension), a set of chunk indexes for specific variables (usually stored in a DB), the aggregation type, and the set of axes that define how to arrange those chunks.

I wanted it to be general, supporting all the FMRC slices that are represented in the AggregationType enum. The API accepts a list of axes so that you can express a set of run times, a set of valid times, a set of horizons, or the 'best available' as of a certain time.
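To make the four slices concrete, here is a minimal, self-contained sketch of the idea. It is not the kerchunk implementation. Only the name AggregationType comes from the discussion above; the enum members, the ChunkRow record, and the select_chunks and build_refs helpers are hypothetical names invented for illustration. It also assumes the chunk index is a plain list of records rather than a database table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class AggregationType(Enum):
    # Member names are assumptions inferred from the four slices listed above.
    RUN_TIME = "run_time"
    VALID_TIME = "valid_time"
    HORIZON = "horizon"
    BEST_AVAILABLE = "best_available"


@dataclass(frozen=True)
class ChunkRow:
    """One row of a hypothetical chunk index (for example a DB record)."""
    varname: str
    run_time: datetime   # when the forecast was issued
    step: timedelta      # forecast horizon
    uri: str
    offset: int
    length: int

    @property
    def valid_time(self) -> datetime:
        return self.run_time + self.step


def select_chunks(rows, aggregation, wanted, as_of=None):
    """Pick the index rows matching the requested axis values."""
    wanted = set(wanted)
    if aggregation is AggregationType.RUN_TIME:
        return [r for r in rows if r.run_time in wanted]
    if aggregation is AggregationType.VALID_TIME:
        return [r for r in rows if r.valid_time in wanted]
    if aggregation is AggregationType.HORIZON:
        return [r for r in rows if r.step in wanted]
    # BEST_AVAILABLE: per valid time, keep the newest run issued by `as_of`.
    best = {}
    for r in rows:
        if r.valid_time not in wanted:
            continue
        if as_of is not None and r.run_time > as_of:
            continue
        key = (r.varname, r.valid_time)
        if key not in best or r.run_time > best[key].run_time:
            best[key] = r
    return sorted(best.values(), key=lambda r: (r.varname, r.valid_time))


def build_refs(static_metadata, rows):
    """Merge static metadata with [uri, offset, length] chunk references."""
    refs = dict(static_metadata)
    counters = {}
    ordered = sorted(rows, key=lambda r: (r.varname, r.valid_time, r.run_time))
    for r in ordered:
        i = counters.get(r.varname, 0)
        # The "{i}.0.0" key assumes one chunk per time step on a
        # (time, y, x) array; this layout is illustrative only.
        refs[f"{r.varname}/{i}.0.0"] = [r.uri, r.offset, r.length]
        counters[r.varname] = i + 1
    return refs


# Made-up example data: two runs, six hours apart, with overlapping valid times.
t0 = datetime(2024, 9, 1, 0)
t6 = t0 + timedelta(hours=6)
t12 = t0 + timedelta(hours=12)
rows = [
    ChunkRow("t2m", t0, timedelta(hours=6), "a.grib2", 0, 10),
    ChunkRow("t2m", t0, timedelta(hours=12), "a.grib2", 10, 10),
    ChunkRow("t2m", t6, timedelta(hours=6), "b.grib2", 0, 10),
]
best = select_chunks(rows, AggregationType.BEST_AVAILABLE, [t12])
refs = build_refs({".zgroup": '{"zarr_format": 2}'}, best)
```

In this toy data both runs cover the 12:00 valid time, so the "best available" slice keeps the chunk from the later run unless as_of is set to a time before that run was issued.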

I hope there is some simpler expression of these concepts that might use this method as a backend, but I suspect you would have to make assumptions about the backend database/schema to wrap it up nicely.

emfdavid avatar Sep 04 '24 17:09 emfdavid

  • This code should be in _grib_idx.py, as discussed
  • There should be tests
  • There needs to be far more documentation on how to use the code (again), else no one will ever touch these functions.

martindurant avatar Oct 01 '24 14:10 martindurant

Thank you for getting the process started. I will plan to make progress this month, but it will be a nights and weekends effort. If that is not progressing by the end of October we can regroup.

emfdavid avatar Oct 01 '24 14:10 emfdavid

Sure, I was simply book-keeping :)

martindurant avatar Oct 01 '24 14:10 martindurant

@emfdavid I was trying to figure out where the 2024 GSoC work stands, and it looks like this is the blocker? Is that correct?

rsignell avatar Oct 28 '24 13:10 rsignell

Check out #523 - I think that's intended to finish off the topic.

martindurant avatar Oct 28 '24 13:10 martindurant

@martindurant ooh, yes indeed, that looks like it! Thanks!

rsignell avatar Oct 28 '24 13:10 rsignell

Can we close this now that #523 is merged?

emfdavid avatar Nov 27 '24 21:11 emfdavid