Error when using `apply_ufunc` with `datetime64` as output dtype
### What happened?
When using `apply_ufunc` with `datetime64[ns]` as the output dtype, the code throws an error about converting from specific units to generic datetime units.
### What did you expect to happen?
No response
### Minimal Complete Verifiable Example
```python
import xarray as xr
import numpy as np

def _fn(arr: np.ndarray, time: np.ndarray) -> np.ndarray:
    return time[:10]

def fn(da: xr.DataArray) -> xr.DataArray:
    dim_out = "time_cp"
    return xr.apply_ufunc(
        _fn,
        da,
        da.time,
        input_core_dims=[["time"], ["time"]],
        output_core_dims=[[dim_out]],
        vectorize=True,
        dask="parallelized",
        output_dtypes=["datetime64[ns]"],
        dask_gufunc_kwargs={
            "allow_rechunk": True,
            "output_sizes": {dim_out: 10},
        },
        exclude_dims=set(("time",)),
    )

da_fake = xr.DataArray(
    np.random.rand(5, 5, 5),
    coords=dict(
        x=range(5),
        y=range(5),
        time=np.array(
            ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
            dtype='datetime64[ns]',
        ),
    ),
).chunk(dict(x=2, y=2))

fn(da_fake.compute()).compute()  # ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas
fn(da_fake).compute()  # same error as above
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[211], line 1
----> 1 fn(da_fake).compute()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1163, in DataArray.compute(self, **kwargs)
   1144 """Manually trigger loading of this array's data from disk or a
   1145 remote source into memory and return a new array. The original is
   1146 left unaltered.
   (...)
   1160 dask.compute
   1161 """
   1162 new = self.copy(deep=False)
-> 1163 return new.load(**kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataarray.py:1137, in DataArray.load(self, **kwargs)
   1119 def load(self, **kwargs) -> Self:
   1120     """Manually trigger loading of this array's data from disk or a
   1121     remote source into memory and return this array.
   1122
   (...)
   1135     dask.compute
   1136     """
-> 1137     ds = self._to_temp_dataset().load(**kwargs)
   1138     new = self._from_temp_dataset(ds)
   1139     self._variable = new._variable

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/dataset.py:853, in Dataset.load(self, **kwargs)
    850 chunkmanager = get_chunked_array_type(*lazy_data.values())
    852 # evaluate all the chunked arrays simultaneously
--> 853 evaluated_data = chunkmanager.compute(*lazy_data.values(), **kwargs)
    855 for k, data in zip(lazy_data, evaluated_data):
    856     self.variables[k].data = data

File /srv/conda/envs/notebook/lib/python3.10/site-packages/xarray/core/daskmanager.py:70, in DaskManager.compute(self, *data, **kwargs)
     67 def compute(self, *data: DaskArray, **kwargs) -> tuple[np.ndarray, ...]:
     68     from dask.array import compute
---> 70     return compute(*data, **kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/dask/base.py:628, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    625     postcomputes.append(x.__dask_postcompute__())
    627 with shorten_traceback():
--> 628     results = schedule(dsk, keys, **kwargs)
    630 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2372, in vectorize.__call__(self, *args, **kwargs)
   2369     self._init_stage_2(*args, **kwargs)
   2370     return self
-> 2372 return self._call_as_normal(*args, **kwargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2365, in vectorize._call_as_normal(self, *args, **kwargs)
   2362     vargs = [args[_i] for _i in inds]
   2363     vargs.extend([kwargs[_n] for _n in names])
-> 2365 return self._vectorize_call(func=func, args=vargs)

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2446, in vectorize._vectorize_call(self, func, args)
   2444 """Vectorized call to `func` over positional `args`."""
   2445 if self.signature is not None:
-> 2446     res = self._vectorize_call_with_signature(func, args)
   2447 elif not args:
   2448     res = func()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/numpy/lib/function_base.py:2506, in vectorize._vectorize_call_with_signature(self, func, args)
   2502     outputs = _create_arrays(broadcast_shape, dim_sizes,
   2503                              output_core_dims, otypes, results)
   2505 for output, result in zip(outputs, results):
-> 2506     output[index] = result
   2508 if outputs is None:
   2509     # did not call the function even once
   2510     if otypes is None:

ValueError: Cannot convert from specific units to generic units in NumPy datetimes or timedeltas
```
### Anything else we need to know?
No response
### Environment
For me the first line (`fn(da_fake.compute()).compute()`) already throws the error. What numpy version are you using?
My bad, I was using a slightly old version of numpy, with a fresh upgraded environment I can confirm the error occurs also with non-chunked arrays. I'll edit the issue's description.
No worries. This might be a numpy bug. This is a pure numpy repro:
```python
import numpy as np

otype = "datetime64[ns]"
arr = np.array(['2024-01-01', '2024-01-02', '2024-01-03'], dtype='datetime64[ns]')
np.vectorize(lambda x: x, signature="(i)->(j)", otypes=[otype])(arr)
```
Internally numpy creates a target array with `dtype=np.dtype(otype).char`:

```python
out = np.empty(3, dtype="M")
out[:] = arr
```
See https://github.com/numpy/numpy/blob/8f22d5aea1516c7228232988e015ff217a6c7c4a/numpy/lib/_function_base_impl.py#L2333
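To see why that internal step loses the unit, note that `.char` of a unit-specific datetime dtype is just the generic `'M'` type code, so rebuilding a dtype from it drops the `[ns]`:

```python
import numpy as np

# The single-character code of a unit-specific datetime dtype is 'M'...
specific = np.dtype("datetime64[ns]")
print(specific.char)  # 'M'

# ...and a dtype built from that character is the *generic* datetime64,
# with no time unit attached - hence "specific units to generic units".
generic = np.dtype(specific.char)
print(generic == np.dtype("datetime64"))  # True
```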
I assume your example is simplified, but would one of these options work for you?

- pass `vectorize=False`?
- not pass `output_dtypes`?
- pass `dask="allowed"`?
- convert to datetime after the computation (`np.vectorize(lambda x: x, signature="(i)->(j)", otypes=[int])(arr).astype("datetime64[ns]")`) (make sure to avoid overflows)?
- passing `meta` does not seem to work either (i.e. `dask_gufunc_kwargs={"meta": np.array([], dtype='datetime64[ns]')}`)
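For reference, a minimal sketch of the convert-afterwards workaround on the pure-numpy reproducer, routing the computation through `int64` (nanoseconds since the epoch) and casting back once `np.vectorize` is done:

```python
import numpy as np

arr = np.array(['2024-01-01', '2024-01-02', '2024-01-03'], dtype='datetime64[ns]')

# Work on the integer representation (nanoseconds since the epoch),
# so np.vectorize never has to allocate a datetime output array...
res = np.vectorize(lambda x: x, signature="(i)->(j)", otypes=[np.int64])(
    arr.astype(np.int64)
)

# ...then reinterpret the integers as datetime64[ns] afterwards.
# (Watch for overflows if you do arithmetic on the raw integers.)
out = res.astype('datetime64[ns]')
```

The round trip recovers the original dates, since `astype(np.int64)` and `astype('datetime64[ns]')` are exact inverses for nanosecond-resolution data.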
Thanks for digging into this! In my case the easiest solution would be using another dtype and then converting to datetime after, as you suggested. I've opened an issue in the numpy repository for this bug.
Seems like this is fixed upstream. At least @mathause's pure numpy reproducer works with the latest numpy. Please reopen if still relevant.