Dtype / checking standardization
The current specification defines no standardized data type objects, nor does it say what a data type object needs to implement: https://data-apis.org/array-api/latest/API_specification/data_types.html
Additionally, there are no APIs in the specification for checking dtypes, i.e. nothing like np.issubdtype, which is fairly commonly used in code that looks something like:
def dispatch_based_on_dtype(array):
    if np.issubdtype(array.dtype, np.integer):
        return ...
    elif np.issubdtype(array.dtype, np.floating):
        return ...
    else:
        return ...
cc @jakirkham as I believe this pattern is used quite a bit in Dask, which would presumably want to be able to target arbitrary array objects under the hood
With NumPy, one can do the following. However, AFAICT the dtype spec includes neither a type attribute nor any other way to make this kind of check
In [1]: from numbers import Integral
In [2]: import numpy as np
In [3]: a = np.arange(5)
In [4]: issubclass(a.dtype.type, Integral)
Out[4]: True
In [1]: from numbers import Real
In [2]: import numpy as np
In [3]: a = np.arange(5).astype(float)
In [4]: issubclass(a.dtype.type, Real)
Out[4]: True
https://data-apis.org/array-api/latest/API_specification/data_types.html says
Data types (“dtypes”) are objects that can be used as dtype specifiers in functions and methods (e.g., zeros((2, 3), dtype=float32) ). A conforming implementation may add methods or attributes to data type objects; however, these methods and attributes are not included in this specification.
(I had thought that somewhere it was required that dtype objects be comparable by ==, but I don't see that there presently)
I will point out that if you only care about the dtypes in the spec, there are a finite number of them, so you can use
def is_floating(dtype):
    return dtype in [float32, float64]

def is_integral(dtype):
    return dtype in [int8, ...]
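A slightly fuller sketch of the same idea, for illustration only. It assumes that dtype objects can be compared with == (which the spec does not currently guarantee, as noted above) and that the dtype names are attributes of the array namespace. The xp argument and the fake_xp stand-in namespace are hypothetical and exist only so the snippet runs without any particular array library installed:

```python
from types import SimpleNamespace

# Integer and floating-point dtype names listed in the data types spec.
INTEGER_NAMES = ("int8", "int16", "int32", "int64",
                 "uint8", "uint16", "uint32", "uint64")
FLOATING_NAMES = ("float32", "float64")


def is_integral(xp, dtype):
    """Return True if `dtype` equals one of the integer dtypes of `xp`."""
    # Assumption: dtype objects support `==`.
    return any(dtype == getattr(xp, name) for name in INTEGER_NAMES)


def is_floating(xp, dtype):
    """Return True if `dtype` equals one of the floating dtypes of `xp`."""
    return any(dtype == getattr(xp, name) for name in FLOATING_NAMES)


class FakeDtype:
    """Hypothetical dtype object; compares by identity via default `==`."""

    def __init__(self, name):
        self.name = name


# Hypothetical stand-in for an array namespace that exposes the dtype names.
fake_xp = SimpleNamespace(
    **{name: FakeDtype(name) for name in INTEGER_NAMES + FLOATING_NAMES}
)

print(is_integral(fake_xp, fake_xp.int16))
print(is_floating(fake_xp, fake_xp.int16))
```

Passing the namespace explicitly keeps the helpers independent of any one library, at the cost of covering only the dtypes named in the spec.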
(I had thought that somewhere it was required that dtype objects be comparable by ==, but I don't see that there presently)
Sounds like something we should add.
By the way, did we specify that dtype objects are singletons so that they can also be checked via isinstance(x, float32)? It's probably a sane design that has been adopted in several libraries (e.g. NumPy, CuPy), but still worth shouting out loud.
The docs weren't clear to me, but I didn't interpret those as singletons and instead thought that it was just guidance on what data types need to be supported by the standard.
The dtype names need to be part of the namespace. Otherwise passing dtype=float64 and so on won't be possible. If that isn't clear, we should fix it.
Sounds like something we should add. By the way, did we specify that dtype objects are singletons so that they can also be checked via isinstance(x, float32)? It's probably a sane design that has been adopted in several libraries (eg: NumPy, CuPy), but still worth shouting out loud.
isinstance(x, float32) implies that x is a scalar, which we do not have in the spec (isinstance(np.array([1.], dtype=np.float64), np.float64) gives False).
FWIW the classes in numbers are ABCs, so we can just call register on the relevant one to add float32 and float64 as virtual subclasses if subclassing is not an option
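A minimal sketch of that registration approach, assuming dtypes are exposed as classes (the spec does not say this). Float32, Int64 and kind are hypothetical names used only for illustration; they are not taken from any library:

```python
from numbers import Integral, Real


class Float32:
    """Hypothetical dtype class; does not inherit from any numbers ABC."""


class Int64:
    """Hypothetical dtype class; does not inherit from any numbers ABC."""


# Register the classes as virtual subclasses instead of subclassing.
Real.register(Float32)
Integral.register(Int64)


def kind(dtype_type):
    """Classify a dtype class using the numbers ABC hierarchy."""
    # Integral is itself a subclass of Real, so test it first.
    if issubclass(dtype_type, Integral):
        return "integer"
    if issubclass(dtype_type, Real):
        return "real"
    return "other"


print(kind(Int64))
print(kind(Float32))
```

Because Integral derives from Real, a class registered with Integral also passes issubclass(..., Real), which is why the integer check comes first.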
isinstance(x, float32) implies that x is a scalar, which we do not have in the spec (isinstance(np.array([1.], dtype=np.float64), np.float64) gives False).
@asmeurer I think you misread. If I wanna do isinstance(x, float32) on x, it must be a dtype object like float32, float64, etc. It doesn't make sense to check if an array is an instance of a dtype.
Right, this is also why the OP mentions that with NumPy, we need to use dtype.type today. There's no equivalent of this in the spec AFAICT
So IIUC the current standard does not specify what property or method a dtype object needs to implement:
https://github.com/data-apis/array-api/blob/09410675703ee275e6e5746db3aded0d37bc4b96/spec/API_specification/data_types.md#L74
so for now I think it is a no-go to check dtype.type. I think we need either the equality check (==) or something equivalent to be added, or the standard to be revised.
(And also it doesn't seem that subclassing is an option, depending on how one interprets the standard.)
Reopening this issue as relevant to the proposal in #425.
All the discussion is happening in gh-425 and this is basically a duplicate issue, so I'll close this again. And we'll try to finalize gh-425 very soon.