xarray icon indicating copy to clipboard operation
xarray copied to clipboard

Public API for setting new indexes: add a set_xindex method?

Open benbovy opened this issue 3 years ago • 5 comments

What is your issue?

xref https://github.com/pydata/xarray/pull/6795#discussion_r932665544 and #6293 (Public API section).

The scipy22 branch contains the addition of a .set_xindex() method to DataArray and Dataset so that participants at the SciPy 2022 Xarray sprint could experiment with custom indexes. After thinking more about it, I'm wondering if it couldn't actually be part of Xarray's public API alongside .set_index() (at least for a while).

  • Having two methods .set_xindex() vs. .set_index() would be quite consistent with the .xindexes vs. .indexes properties that are already there.

  • I actually like the .set_xindex() API proposed in the scipy22, i.e., setting one index at a time from one or more coordinates, possibly with build options. While it could be possible to support both that and .set_index()'s current API (quite specific to pandas multi-indexes) all in one method, it would certainly result in a much more confusing API and internal implementation.

  • In the long term we could progressively get rid of .indexes and .set_index() and/or rename .xindexes to .indexes and .set_xindex() to .set_index().

Thoughts @pydata/xarray?

benbovy avatar Jul 29 '22 12:07 benbovy

sounds good!

It would probably be good to figure out a way to extend it to accept multiple indexes at once, though, or have another method that allows that (if we think that set_xindex might support setting multiple indexes we should make sure the API can be extended).

keewis avatar Jul 29 '22 16:07 keewis

This sounds great to me!

I don't think we need support for setting multiple indexes at once in a single method call. You can call set_xindex multiple times for that if needed.

shoyer avatar Jul 29 '22 17:07 shoyer

You can call set_xindex multiple times for that if needed.

I guess that depends on how many indexes you'd want to set at once? If that number is constant and small (like, 1-3) then that would be fine for me, but for anything else a bulk API would be great.

keewis avatar Jul 29 '22 17:07 keewis

Hmm I'd rather expect that in most cases max. 2-3 (meta-)indexes are set, each from possibly a larger number of coordinates (e.g., 2-d staggered grid), but I could be wrong.

One exception could be setting (many) PandasIndex instances, one for each 1-d non-dimension coordinate of a Dataset. Maybe we could adapt .set_index() to support that case? Or later add a convenient method that "extends" default indexes to all 1-d coordinates if it turns out to be useful in many cases?

Another example would be to replace all default (pandas) indexes by Pint indexes. I guess that a convenient method accessible via the pint accessor would do the job well in that case?

benbovy avatar Jul 29 '22 19:07 benbovy

I agree, I think only setting a few indexes at a time would be normal. If we eventually need convenience methods for setting multiple indexes we can add those later.

shoyer avatar Jul 29 '22 20:07 shoyer