
Error when using `apply_ufunc` with `datetime64` as output dtype

Open gcaria opened this issue 2 years ago • 4 comments

What happened?

When using apply_ufunc with datetime64[ns] as the output dtype, the code throws an error about converting from specific units to generic datetime units.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

def _fn(arr: np.ndarray, time: np.ndarray) -> np.ndarray:
    return time[:10]

def fn(da: xr.DataArray) -> xr.DataArray:
    dim_out = "time_cp"

    return xr.apply_ufunc(
        _fn,
        da,
        da.time,
        input_core_dims=[["time"], ["time"]],
        output_core_dims=[[dim_out]],
        vectorize=True,
        dask="parallelized",
        output_dtypes=["datetime64[ns]"],
        dask_gufunc_kwargs={"allow_rechunk": True,
                            "output_sizes": {dim_out: 10}},
        exclude_dims={"time"},
    )

da_fake = xr.DataArray(
    np.random.rand(5, 5, 5),
    coords=dict(
        x=range(5),
        y=range(5),
        time=np.array(
            ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
            dtype='datetime64[ns]',
        ),
    ),
).chunk(dict(x=2, y=2))

fn(da_fake.compute()).compute() # ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas

fn(da_fake).compute() # same errors as above

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[211], line 1
----> 1 fn(da_fake).compute()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1163, in DataArray.compute(self, **kwargs)
   1144 """Manually trigger loading of this array's data from disk or a
   1145 remote source into memory and return a new array. The original is
   1146 left unaltered.
   (...)
   1160 dask.compute
   1161 """
   1162 new = self.copy(deep=False)
-> 1163 return new.load(**kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1137, in DataArray.load(self, **kwargs)
   1119 def load(self, **kwargs) -> Self:
   1120     """Manually trigger loading of this array's data from disk or a
   1121     remote source into memory and return this array.
   1122 
   (...)
   1135     dask.compute
   1136     """
-> 1137     ds = self._to_temp_dataset().load(**kwargs)
   1138     new = self._from_temp_dataset(ds)
   1139     self._variable = new._variable

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataset.py:853, in Dataset.load(self, **kwargs)
    850 chunkmanager = get_chunked_array_type(*lazy_data.values())
    852 # evaluate all the chunked arrays simultaneously
--> 853 evaluated_data = chunkmanager.compute(*lazy_data.values(), **kwargs)
    855 for k, data in zip(lazy_data, evaluated_data):
    856     self.variables[k].data = data

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/daskmanager.py:70, in DaskManager.compute(self, *data, **kwargs)
     67 def compute(self, *data: DaskArray, **kwargs) -> tuple[np.ndarray, ...]:
     68     from dask.array import compute
---> 70     return compute(*data, **kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/base.py:628, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    625     postcomputes.append(x.__dask_postcompute__())
    627 with shorten_traceback():
--> 628     results = schedule(dsk, keys, **kwargs)
    630 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2372, in vectorize.__call__(self, *args, **kwargs)
   2369     self._init_stage_2(*args, **kwargs)
   2370     return self
-> 2372 return self._call_as_normal(*args, **kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2365, in vectorize._call_as_normal(self, *args, **kwargs)
   2362     vargs = [args[_i] for _i in inds]
   2363     vargs.extend([kwargs[_n] for _n in names])
-> 2365 return self._vectorize_call(func=func, args=vargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2446, in vectorize._vectorize_call(self, func, args)
   2444 """Vectorized call to `func` over positional `args`."""
   2445 if self.signature is not None:
-> 2446     res = self._vectorize_call_with_signature(func, args)
   2447 elif not args:
   2448     res = func()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2506, in vectorize._vectorize_call_with_signature(self, func, args)
   2502         outputs = _create_arrays(broadcast_shape, dim_sizes,
   2503                                  output_core_dims, otypes, results)
   2505     for output, result in zip(outputs, results):
-> 2506         output[index] = result
   2508 if outputs is None:
   2509     # did not call the function even once
   2510     if otypes is None:

ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas

Anything else we need to know?

No response

Environment

gcaria avatar Mar 01 '24 15:03 gcaria

For me the first line (fn(da_fake.compute()).compute()) already throws the error. What numpy version are you using?

mathause avatar Mar 03 '24 08:03 mathause

My bad, I was using a slightly old version of numpy. With a fresh, upgraded environment I can confirm the error also occurs with non-chunked arrays. I'll edit the issue's description.

gcaria avatar Mar 03 '24 10:03 gcaria

No worries. This might be a numpy bug. This is a pure numpy repro:

import numpy as np
otype = "datetime64[ns]"
arr = np.array(['2024-01-01', '2024-01-02', '2024-01-03'], dtype='datetime64[ns]')
np.vectorize(lambda x: x, signature="(i)->(j)", otypes=[otype])(arr)

Internally numpy creates a target array with dtype=np.dtype(otype).char:

out = np.empty(3, dtype="M")
out[:] = arr

See https://github.com/numpy/numpy/blob/8f22d5aea1516c7228232988e015ff217a6c7c4a/numpy/lib/_function_base_impl.py#L2333
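
A small check of the step pointed to above (a sketch using only public numpy behavior): np.dtype(otype).char collapses a unit-specific datetime dtype to the generic type code "M", so the intermediate array numpy allocates cannot hold unit-specific values.

```python
import numpy as np

# The unit is part of a datetime64 dtype, but .char keeps only the
# generic type code "M", discarding the "[ns]" unit.
specific = np.dtype("datetime64[ns]")
assert specific.char == "M"

# Rebuilding a dtype from that char gives the *generic* datetime64,
# which is not the dtype we started from.
generic = np.dtype(specific.char)
assert generic == np.dtype("datetime64")
assert generic != specific
```

Assigning unit-specific datetimes into an array of this generic dtype is what raises the "Cannot convert from specific units to generic units" ValueError shown in the traceback.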


I assume your example is simplified, but would one of these options work for you?

  • pass vectorize=False?
  • not pass output_dtypes?
  • pass dask="allowed"?
  • convert to datetime after the computation (np.vectorize(lambda x: x, signature="(i)->(j)", otypes=[int])(arr).astype("datetime64[ns]")) (make sure to avoid overflows)?
  • passing meta does not seem to work either (i.e. dask_gufunc_kwargs={"meta": np.array([], dtype='datetime64[ns]')})
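
The int-then-convert option can be spelled out against the pure numpy repro above; this is only a sketch of the workaround, not a general recommendation:

```python
import numpy as np

arr = np.array(['2024-01-01', '2024-01-02', '2024-01-03'], dtype='datetime64[ns]')

# Vectorize with an int64 output dtype; the intermediate array numpy
# allocates then has a concrete dtype, so the assignment succeeds.
as_int = np.vectorize(lambda x: x, signature="(i)->(j)", otypes=[np.int64])(arr)

# The int64 values are nanoseconds since the epoch; cast back afterwards
# (be careful about overflow if you do arithmetic on the integers first).
out = as_int.astype("datetime64[ns]")
assert (out == arr).all()
```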

mathause avatar Mar 03 '24 21:03 mathause

Thanks for digging into this! In my case the easiest solution would be using another dtype and then converting to datetime after, as you suggested. I've opened an issue in the numpy repository for this bug.

gcaria avatar Mar 05 '24 09:03 gcaria

Seems like this is fixed upstream. At least @mathause's pure numpy reproducer works with the latest numpy. Please reopen if still relevant.

kmuehlbauer avatar Jul 31 '24 07:07 kmuehlbauer