xarray Add "unique()" method, mimicking pandas

Would it be good to add a unique() method that mimics pandas?

import pandas as pd
import xarray as xr
pd.Series([0, 1, 1, 2]).unique()
xr.DataArray([0, 1, 1, 2]).unique()  # not implemented

Output:

array([0, 1, 2])
AttributeError: 'DataArray' object has no attribute 'unique'

Feb 28 '19 18:02 ahuang11

What would .unique() return on xarray.DataArray? For consistency with pandas, I guess it would return a 1D numpy or dask array?

I don't see a lot of value in adding this to xarray, given that all the xarray metadata gets lost by the unique() operation. You might as well just write np.unique(my_data_array.data).

Mar 04 '19 07:03 shoyer
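
As a minimal sketch of the np.unique(my_data_array.data) workaround mentioned above: the array contents, dimension names, and variable names below are made up for illustration. It shows that the result is a plain, flattened NumPy array with no xarray dimensions or coordinates attached.

```python
import numpy as np
import xarray as xr

# A small 2-D DataArray with named dimensions (illustrative data only).
da = xr.DataArray([[0, 1], [1, 2]], dims=("x", "y"), name="example")

# np.unique flattens its input and returns the sorted unique values.
unique_values = np.unique(da.data)

print(type(unique_values))  # <class 'numpy.ndarray'>
print(unique_values)        # [0 1 2]
```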

Right, it would return a 1D numpy or dask array.

I suppose I'm used to simply typing pd.Series().unique() rather than np.unique(pd.Series()).

I use it primarily in for loops, comparing

for season in da['time.season'].unique():

with

for season in np.unique(da['time.season'].data):

Mar 05 '19 00:03 ahuang11
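
The for-loop pattern described above might look like the following sketch with the np.unique form. The monthly time axis, the data values, and the season_means dictionary are assumptions made up for this example; they are not from the thread.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Twelve monthly values for one year (illustrative data only).
time = pd.date_range("2000-01-01", periods=12, freq="MS")
da = xr.DataArray(np.arange(12.0), coords={"time": time}, dims="time")

# Season label ("DJF", "MAM", "JJA", "SON") for each time step.
seasons = da["time.season"]

season_means = {}
for season in np.unique(seasons.data):
    # Keep only the time steps that belong to this season.
    subset = da.where(seasons == season, drop=True)
    season_means[str(season)] = float(subset.mean())

print(season_means)
```

Note that np.unique returns the labels in sorted order, not in calendar order.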

Hi, I also vote for this function. Here is my typical use case.

There is some structure in 3D space and I need to "flatten" it to 2D. Say it is axially symmetric, so I assign R and Z coordinates to the points (or r and theta in polar coordinates). I want to simplify this using interp; however, interp requires unique coordinates.

I have some solution here: https://stackoverflow.com/questions/51058379/drop-duplicate-times-in-xarray

and adapted this into an actual function:

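import numpy as np

def distribure_uniform(ds, N_points=512):
    # Sort by theta and make it the indexing dimension.
    ds_theta = ds.sortby("theta").swap_dims({"idx": "theta"})
    # Indices of the first occurrence of each unique theta value.
    _, index = np.unique(ds_theta['theta'], return_index=True)
    # Keep only those entries, since interp requires unique coordinates.
    ds_theta = ds_theta.isel(theta=index)
    # Interpolate onto N_points evenly spaced theta values.
    ds_theta = ds_theta.interp(
        theta=np.linspace(ds.theta.min(), ds.theta.max(), N_points))
    # Switch the dimension back to idx.
    ds_theta = ds_theta.swap_dims({"theta": "idx"})
    return ds_theta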

In an ideal case I would like to write something like this:

def distribure_uniform(ds, N_points=512):

    ds_theta = ds.unique("theta", sorted=False, sort=True)

    ds_theta = ds_theta.swap_dims({"idx": "theta"})
    ds_theta = ds_theta.interp(
        theta=np.linspace(ds.theta.min(), ds.theta.max(), N_points))
    ds_theta = ds_theta.swap_dims({"theta": "idx"})
    return ds_theta

Oct 16 '20 11:10 kripnerl
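
The de-duplication step from the function above can be isolated into a small runnable sketch. The dataset, its values, and the variable name val are made up for illustration; only the np.unique(..., return_index=True) plus isel combination comes from the comment above, and the interpolation step is left out.

```python
import numpy as np
import xarray as xr

# A dataset whose theta coordinate contains a duplicate value (0.5).
ds = xr.Dataset(
    {"val": ("theta", [10.0, 11.0, 12.0, 13.0])},
    coords={"theta": [0.0, 0.5, 0.5, 1.0]},
)

# return_index=True gives the position of the first occurrence
# of each unique theta value, in sorted order.
_, index = np.unique(ds["theta"].values, return_index=True)

# Select only those positions, dropping the later duplicates.
deduped = ds.isel(theta=index)

print(deduped["theta"].values)  # [0.  0.5 1. ]
print(deduped["val"].values)    # [10. 11. 13.]
```

Because np.unique reports the first occurrence, the value 12.0 attached to the second theta=0.5 entry is discarded here; whether that is the right choice depends on the data.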

A case I ran into where supporting .unique() in the pandas sense would be helpful is when an object dtype is used to support nullable strings:

>>> ar = xr.DataArray(np.array(['foo', np.nan], dtype='object'), coords={'bar': range(2)}, name='foo')
>>> np.unique(ar.data)
TypeError: '<' not supported between instances of 'float' and 'str'
>>> ar.to_dataframe().foo.unique()
array(['foo', nan], dtype=object)

Jan 08 '24 16:01 aaronsarna

Actually, pd.unique(ar) also works fine here, so maybe there's no need to add it to xarray.

Jan 08 '24 16:01 aaronsarna

I guess the limitation on using pd.unique() is that it requires 1D data. pd.unique(ar.data.flatten()) isn't so painful, but that feels like the kind of thing xarray should do for you.

Jan 08 '24 17:01 aaronsarna
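
A minimal sketch of the pd.unique(ar.data.flatten()) workaround for multi-dimensional data follows. The 2D array contents and the dimension names are assumptions made up for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A 2-D object-dtype array holding strings and a missing value.
ar = xr.DataArray(
    np.array([["foo", np.nan], ["bar", "foo"]], dtype=object),
    dims=("x", "y"),
    name="foo",
)

# pd.unique needs 1-D input, so flatten the underlying array first.
# Unlike np.unique, it does not sort, so mixed str/float values are fine
# and the result keeps the order of first appearance.
unique_values = pd.unique(ar.data.flatten())

print(unique_values)
```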