intake-xarray

set automatically coerce_shape for xarray_image

Open acocac opened this issue 4 years ago • 3 comments

I have a local directory of GeoTIFF files with different shapes. I've used the coerce_shape parameter to set a target shape manually. I'm wondering if there's a workaround to coerce all images to the largest shape in the directory instead of defining it by hand. The following lines show how I define the catalog:

[...]
sources:
  test1:
    driver: xarray_image
    args:
      urlpath: '{{ CATALOG_DIR }}/data/*.tif'
      coerce_shape: [400,400]
[...]

acocac avatar Jun 09 '21 17:06 acocac

Intake is not currently able to automatically inspect a set of data sources in order to derive a value for use in other data sources.

Two possible future routes that could implement the idea:

  • a new catalog type that has the ability to introspect a set of data prescriptions and update those prescriptions dynamically
  • extend the transforms idea to accept multiple data sources and go from there

martindurant avatar Jun 09 '21 17:06 martindurant

Thanks @martindurant for pointing out the possible future routes. Either would work for me.

When you say a new catalog able to introspect a set of data, do you have any specific example?

It would be great to implement a lazy operation to retrieve the image size, e.g. with PIL's Image.open (see here). However, I am not sure how well this would scale to a catalog with millions of images.
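As a rough illustration of that idea, here is a minimal sketch of a pre-pass that reads only the image headers and returns the largest shape. The helper names `read_size_with_pil` and `largest_shape` are hypothetical, not part of intake or intake-xarray. It assumes Pillow can open the GeoTIFFs in question and that coerce_shape expects array order (height, width), whereas PIL reports (width, height).

```python
from typing import Callable, Iterable, Tuple


def read_size_with_pil(path: str) -> Tuple[int, int]:
    """Return (width, height) for one image, reading only the file header."""
    from PIL import Image  # imported lazily so Pillow is only needed when used

    # Image.open is lazy: pixel data is not decoded until it is accessed.
    with Image.open(path) as img:
        return img.size


def largest_shape(
    paths: Iterable[str],
    read_size: Callable[[str], Tuple[int, int]] = read_size_with_pil,
) -> Tuple[int, int]:
    """Return (height, width) large enough to hold every image in `paths`."""
    max_w = max_h = 0
    seen = False
    for path in paths:
        width, height = read_size(path)
        max_w = max(max_w, width)
        max_h = max(max_h, height)
        seen = True
    if not seen:
        raise ValueError("no image paths given")
    return (max_h, max_w)


# Hypothetical usage, assuming the catalog above is saved as catalog.yml:
#   from glob import glob
#   import intake
#   shape = largest_shape(glob("data/*.tif"))
#   source = intake.open_catalog("catalog.yml").test1(coerce_shape=list(shape))
```

This still opens every file once, so the cost grows linearly with the number of images; for very large directories the result would probably need to be cached or stored in the catalog itself.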

acocac avatar Jun 09 '21 19:06 acocac

When you say a new catalog able to introspect a set of data

Not really, this would be a new model. Catalogues have access to their child data sources, of course, but trying to access those sources' internal metadata is not the normal pattern. As you say, this might be expensive. There are, however, lazy catalogues, where entries (the objects that make sources) are only created on request.

martindurant avatar Jun 09 '21 20:06 martindurant