Dtype / checking standardization

Open kkraus14 opened this issue 4 years ago • 11 comments

In the current specification, there are no standardized data type objects, nor is there a specification of what a data type object needs to implement: https://data-apis.org/array-api/latest/API_specification/data_types.html

Additionally, there are no APIs in the specification for checking dtypes, i.e. something like np.issubdtype, which is fairly commonly used in code that looks something like:

import numpy as np

def dispatch_based_on_dtype(array):
    if np.issubdtype(array.dtype, np.integer):
        return ...
    elif np.issubdtype(array.dtype, np.floating):
        return ...
    else:
        return ...

cc @jakirkham, as I believe this pattern is used quite a bit in Dask, which would presumably want to be able to target arbitrary array objects under the hood.

kkraus14 • Mar 25 '21 18:03

With NumPy, one can do the following. However, AFAICT the dtype spec doesn't include a type attribute or any other way to make this kind of check.

In [1]: from numbers import Integral

In [2]: import numpy as np

In [3]: a = np.arange(5)

In [4]: issubclass(a.dtype.type, Integral)
Out[4]: True

In [1]: from numbers import Real

In [2]: import numpy as np

In [3]: a = np.arange(5).astype(float)

In [4]: issubclass(a.dtype.type, Real)
Out[4]: True

jakirkham • Mar 25 '21 18:03

https://data-apis.org/array-api/latest/API_specification/data_types.html says

Data types (“dtypes”) are objects that can be used as dtype specifiers in functions and methods (e.g., zeros((2, 3), dtype=float32) ). A conforming implementation may add methods or attributes to data type objects; however, these methods and attributes are not included in this specification.

(I had thought that somewhere it was required that dtype objects be comparable by ==, but I don't see that there presently)

I will point out that if you only care about the dtypes in the spec, there are a finite number of them, so you can use

def is_floating(dtype):
    return dtype in [float32, float64]

def is_integral(dtype):
    return dtype in [int8, ...]
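A minimal sketch of that idea, spelled out so it runs on its own. The DType class and the xp namespace here are hypothetical stand-ins, not any real library's objects, and the sketch assumes dtype objects compare with ==, which the spec does not currently guarantee. The dtype names are the integer and floating-point ones listed in the spec.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class DType:
    """Hypothetical stand-in for a library's dtype object; compares by name."""
    name: str


_INT_NAMES = ("int8", "int16", "int32", "int64",
              "uint8", "uint16", "uint32", "uint64")
_FLOAT_NAMES = ("float32", "float64")

# Hypothetical array namespace exposing the dtype names as attributes.
xp = SimpleNamespace(
    **{name: DType(name) for name in _INT_NAMES + _FLOAT_NAMES + ("bool",)}
)


def is_integral(dtype, xp):
    # Relies only on == between dtype objects (an assumption, see above).
    return any(dtype == getattr(xp, name) for name in _INT_NAMES)


def is_floating(dtype, xp):
    return any(dtype == getattr(xp, name) for name in _FLOAT_NAMES)


def dispatch_based_on_dtype(dtype, xp):
    """Same shape as the dispatch in the issue description, without np.issubdtype."""
    if is_integral(dtype, xp):
        return "integer"
    elif is_floating(dtype, xp):
        return "floating"
    else:
        return "other"
```

Passing the namespace explicitly keeps the helpers independent of any one array library; they only touch the dtype names that the spec requires the namespace to expose.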

asmeurer • Apr 01 '21 22:04

(I had thought that somewhere it was required that dtype objects be comparable by ==, but I don't see that there presently)

Sounds like something we should add.

By the way, did we specify that dtype objects are singletons so that they can also be checked via isinstance(x, float32)? It's probably a sane design that has been adopted in several libraries (eg: NumPy, CuPy), but still worth shouting out loud.

leofang • Apr 01 '21 23:04

The docs weren't clear to me, but I didn't interpret those as singletons and instead thought that it was just guidance on what data types need to be supported by the standard.

kkraus14 • Apr 02 '21 00:04

The dtype names need to be part of the namespace. Otherwise passing dtype=float64 and so on won't be possible. If that isn't clear, we should fix it.

Sounds like something we should add. By the way, did we specify that dtype objects are singletons so that they can also be checked via isinstance(x, float32)? It's probably a sane design that has been adopted in several libraries (eg: NumPy, CuPy), but still worth shouting out loud.

isinstance(x, float32) implies that x is a scalar, which we do not have in the spec (isinstance(np.array([1.], dtype=np.float64), np.float64) gives False).

asmeurer • Apr 02 '21 20:04

FWIW, the numbers objects are ABCs, so we can just call register on each of them to add float32 and float64 if subclassing is not an option.
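For illustration, registering with the numbers ABCs could look roughly like this. Float32Scalar and Int8Scalar are hypothetical placeholder classes, not part of the spec or of any library; the only real API used is the register method that the numbers ABCs inherit from abc.ABCMeta.

```python
from numbers import Integral, Real


class Float32Scalar:
    """Hypothetical type object for float32 that does not subclass float."""


class Int8Scalar:
    """Hypothetical type object for int8 that does not subclass int."""


# register() makes each class a "virtual subclass" of the ABC,
# with no inheritance involved.
Real.register(Float32Scalar)
Integral.register(Int8Scalar)

print(issubclass(Float32Scalar, Real))      # True
print(issubclass(Int8Scalar, Integral))     # True
print(issubclass(Int8Scalar, Real))         # True: Integral derives from Real
print(issubclass(Float32Scalar, Integral))  # False
```

Note that register only affects issubclass and isinstance checks. It does not verify or supply any of the abstract methods, so the registered classes are not guaranteed to behave like numbers.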

jakirkham • Apr 02 '21 21:04

isinstance(x, float32) implies that x is a scalar, which we do not have in the spec (isinstance(np.array([1.], dtype=np.float64), np.float64) gives False).

@asmeurer I think you misread. If I wanna do isinstance(x, float32) on x, it must be a dtype object like float32, float64, etc. It doesn't make sense to check if an array is an instance of a dtype.

leofang • Apr 02 '21 21:04

Right, this is also why the OP mentions that with NumPy we need to use dtype.type today. There's no equivalent of this in the spec AFAICT.

jakirkham • Apr 02 '21 21:04

So IIUC the current standard does not specify what properties or methods a dtype object needs to implement: https://github.com/data-apis/array-api/blob/09410675703ee275e6e5746db3aded0d37bc4b96/spec/API_specification/data_types.md#L74 so for now I think checking dtype.type is a no-go. I think we need either the equality check (==) or something equivalent added, or the standard revised.

leofang • Apr 03 '21 03:04

(And also it doesn't seem that subclassing is an option, depending on how one interprets the standard.)

leofang • Apr 03 '21 03:04

Reopening this issue as relevant to the proposal in #425.

kgryte • May 05 '22 18:05

All the discussion is happening in gh-425 and this is basically a duplicate issue, so I'll close this again. We'll try to finalize gh-425 very soon.

rgommers • Sep 20 '22 18:09