Add "unique()" method, mimicking pandas
Would it be good to add a unique() method that mimics pandas?
import pandas as pd
import xarray as xr
pd.Series([0, 1, 1, 2]).unique()
xr.DataArray([0, 1, 1, 2]).unique() # not implemented
Output:
array([0, 1, 2])
AttributeError: 'DataArray' object has no attribute 'unique'
What would .unique() return on xarray.DataArray? For consistency with pandas, I guess it would return a 1D numpy or dask array?
I don't see a lot of value in adding this to xarray, given that all the xarray metadata gets lost by the unique() operation. You might as well just write np.unique(my_data_array.data).
Right, it would return a 1D numpy or dask array.
I suppose I'm used to simply typing pd.Series().unique() rather than np.unique(pd.Series()).
I use it in for loops primarily.
for season in da['time.season'].unique():
vs
for season in np.unique(da['time.season'].data):
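For reference, here is a minimal runnable sketch of the np.unique loop. The monthly `da` and the `means` dictionary are made up for illustration, and the `.dt.season` accessor stands in for the `'time.season'` shorthand:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative data: one value per month of the year 2000.
time = pd.date_range("2000-01-01", periods=12, freq="MS")
da = xr.DataArray(np.arange(12.0), coords={"time": time}, dims="time")

# Season label for every time step, e.g. "DJF", "MAM", ...
season_labels = da["time"].dt.season

# np.unique returns a sorted 1D numpy array; xarray metadata is dropped.
seasons = np.unique(season_labels.data)

means = {}
for season in seasons:
    # Boolean mask selecting the time steps that belong to this season.
    mask = (season_labels == season).data
    means[str(season)] = float(da.isel(time=mask).mean())
```

Note that np.unique sorts its result, so the seasons come back in alphabetical order rather than calendar order.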
Hi, I also vote for this function. My typical use case:
There is some structure in 3D space and I need to "flatten" it to 2D. Let us say it is axially symmetric, so I assign R and Z coordinates to the points (or r and theta in polar). I want to simplify this using interp; however, it requires unique coordinates.
I have some solution here: https://stackoverflow.com/questions/51058379/drop-duplicate-times-in-xarray
and adapted it into an actual function:
import numpy as np

def distribure_uniform(ds, N_points=512):
    # Sort by theta and make it the dimension coordinate
    ds_theta = ds.sortby("theta").swap_dims({"idx": "theta"})
    # Keep only the first occurrence of each theta value
    _, index = np.unique(ds_theta['theta'], return_index=True)
    ds_theta = ds_theta.isel(theta=index)
    # Interpolate onto a uniform theta grid
    ds_theta = ds_theta.interp(
        theta=np.linspace(ds.theta.min(), ds.theta.max(), N_points))
    ds_theta = ds_theta.swap_dims({"theta": "idx"})
    return ds_theta
In an ideal case I would like to write something like this:
def distribure_uniform(ds, N_points=512):
    # Wished-for API: unique() is not implemented in xarray
    ds_theta = ds.unique("theta", sorted=False, sort=True)
    ds_theta = ds_theta.swap_dims({"idx": "theta"})
    ds_theta = ds_theta.interp(
        theta=np.linspace(ds.theta.min(), ds.theta.max(), N_points))
    ds_theta = ds_theta.swap_dims({"theta": "idx"})
    return ds_theta
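To make the de-duplication step of the workaround concrete, here is a small sketch on a toy dataset. The variable name `value` and the numbers are assumptions for illustration only; the interp step is left out so that only the np.unique part is shown:

```python
import numpy as np
import xarray as xr

# Toy dataset: four points along "idx", with theta = 0.5 appearing twice.
ds = xr.Dataset(
    {"value": ("idx", [10.0, 20.0, 30.0, 40.0])},
    coords={"theta": ("idx", [0.5, 0.1, 0.5, 0.3])},
)

# Sort by theta, then make theta the dimension coordinate.
ds_theta = ds.sortby("theta").swap_dims({"idx": "theta"})

# return_index gives the position of the first occurrence of each value.
_, index = np.unique(ds_theta["theta"], return_index=True)

# Keep one point per distinct theta.
deduped = ds_theta.isel(theta=index)
```

Which of the duplicated points survives depends on the order after sorting, so this approach silently discards data rather than averaging it.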
A case I ran into where supporting .unique() in the pandas sense would be helpful is when an object dtype is used to support nullable strings:
>>> ar = xr.DataArray(np.array(['foo', np.nan], dtype='object'), coords={'bar': range(2)}, name='foo')
>>> np.unique(ar.data)
TypeError: '<' not supported between instances of 'float' and 'str'
>>> ar.to_dataframe().foo.unique()
array(['foo', nan], dtype=object)
Actually, pd.unique(ar) also works fine here, so maybe there's no need to add it to xarray.
I guess the limitation on using pd.unique() is that it requires 1D data. pd.unique(ar.data.flatten()) isn't so painful, but that feels like the kind of thing xarray should do for you.
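A possible stopgap is a small wrapper that flattens the data before calling pd.unique(). The helper name `unique_values` is hypothetical and not part of xarray; this is only a sketch, and it loads dask-backed data into memory:

```python
import numpy as np
import pandas as pd
import xarray as xr


def unique_values(da):
    """Hypothetical helper (not an xarray API): unique values of a DataArray.

    Flattens the underlying data to 1D and defers to pd.unique, which
    handles object arrays that mix strings and NaN. Values are returned
    in order of first appearance, not sorted.
    """
    flat = np.asarray(da.data).ravel()
    return pd.unique(flat)


# 2D object array with a missing value, similar to the nullable-string case.
ar = xr.DataArray(
    np.array([["foo", np.nan], ["foo", "bar"]], dtype="object"),
    dims=("x", "y"),
    name="foo",
)

result = unique_values(ar)
```

As noted earlier in the thread, the result is a plain numpy array, so all xarray metadata is lost.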