Public API for setting new indexes: add a set_xindex method?
What is your issue?
xref https://github.com/pydata/xarray/pull/6795#discussion_r932665544 and #6293 (Public API section).
The scipy22 branch contains the addition of a .set_xindex() method to DataArray and Dataset so that participants at the SciPy 2022 Xarray sprint could experiment with custom indexes. After thinking more about it, I'm wondering if it couldn't actually be part of Xarray's public API alongside .set_index() (at least for a while).
-
Having two methods
.set_xindex()vs..set_index()would be quite consistent with the.xindexesvs..indexesproperties that are already there. -
I actually like the
.set_xindex()API proposed in thescipy22, i.e., setting one index at a time from one or more coordinates, possibly with build options. While it could be possible to support both that and.set_index()'s current API (quite specific to pandas multi-indexes) all in one method, it would certainly result in a much more confusing API and internal implementation. -
In the long term we could progressively get rid of
.indexesand.set_index()and/or rename.xindexesto.indexesand.set_xindex()to.set_index().
Thoughts @pydata/xarray?
sounds good!
It would probably be good to figure out a way to extend it to accept multiple indexes at once, though, or have another method that allows that (if we think that set_xindex might support setting multiple indexes we should make sure the API can be extended).
This sounds great to me!
I don't think we need support for setting multiple indexes at once in a single method call. You can call set_xindex multiple times for that if needed.
You can call set_xindex multiple times for that if needed.
I guess that depends on how many indexes you'd want to set at once? If that number is constant and small (like, 1-3) then that would be fine for me, but for anything else a bulk API would be great.
Hmm I'd rather expect that in most cases max. 2-3 (meta-)indexes are set, each from possibly a larger number of coordinates (e.g., 2-d staggered grid), but I could be wrong.
One exception could be setting (many) PandasIndex instances, one for each 1-d non-dimension coordinate of a Dataset. Maybe we could adapt .set_index() to support that case? Or later add a convenient method that "extends" default indexes to all 1-d coordinates if it turns out to be useful in many cases?
Another example would be to replace all default (pandas) indexes by Pint indexes. I guess that a convenient method accessible via the pint accessor would do the job well in that case?
I agree, I think only setting a few indexes at a time would be normal. If we eventually need convenience methods for setting multiple indexes we can add those later.