array-api

feat: add `isin` to the specification

Open kgryte opened this issue 8 months ago • 0 comments

This PR:

  • resolves https://github.com/data-apis/array-api/issues/854 by adding isin to the specification.

  • of the keyword arguments identified in the array comparison data, supports only the invert kwarg. The assume_unique kwarg was not included for the following reasons:

    1. not all array libraries support this kwarg (e.g., ndonnx and CuPy). CuPy lists the kwarg in its documentation but states that this kwarg is ignored.
    2. a quick search through sklearn turned up only one use of assume_unique with isin, and that was for searching lists of values already known to be unique.
    3. assume_unique is something of a performance optimization/implementation detail which we have generally attempted to avoid when standardizing APIs.
  • does not place restrictions on the shape of x2. While some libraries may choose to flatten a multi-dimensional x2, that is something of an implementation detail and not strictly necessary. For example, an implementation could defer to an "includes" kernel which performs nested loop iteration without needing to perform explicit reshapes/copies.

  • adds support for scalar arguments for either x1 or x2. This follows recent general practice in standardized APIs, with the restriction that at least one of x1 or x2 must be an array.

  • specifies that value equality should, rather than must, be used. This follows other set APIs (e.g., unique*). As a consequence of value equality, a NaN value can never test as True (NaN does not equal itself) and signed zeros are not distinguished (0.0 equals -0.0).

  • allows both x1 and x2 to be of any data type. However, if x1 and x2 have no promotable data type, behavior is left unspecified and thus implementation-defined.
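
To make the semantics listed above concrete, here is a minimal sketch in NumPy. It is not the specification text or any library's implementation. The name isin_sketch is hypothetical, and flattening x2 is just one possible strategy, chosen here for brevity. The sketch does not enforce the rule that at least one argument must be an array.

```python
import numpy as np


def isin_sketch(x1, x2, /, *, invert=False):
    """Hypothetical illustration of the proposed isin semantics."""
    a = np.asarray(x1)
    # Flattening x2 is an implementation choice made for this sketch only;
    # the proposal does not require any particular reshape behavior.
    b = np.asarray(x2).reshape(-1)
    # Value equality: compare every element of x1 against every element of x2.
    hits = (a[..., np.newaxis] == b).any(axis=-1)
    return ~hits if invert else hits


x1 = np.array([[1.0, np.nan], [0.0, 5.0]])
x2 = np.array([[-0.0, 1.0], [np.nan, 7.0]])  # multi-dimensional x2 is allowed

result = isin_sketch(x1, x2)
inverted = isin_sketch(x1, x2, invert=True)
scalar_case = isin_sketch(np.array([1, 2, 3]), 2)  # scalar x2
```

In this sketch the result has the shape of x1. The NaN in x1 does not match the NaN in x2, while 0.0 matches -0.0, which is the behavior value equality implies.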

Questions

Update: answers provided based on feedback below and discussions during workgroup meetings.

  • Would we be okay with requiring that value equality must be used? Is there a scenario where we want to allow libraries some wiggle room, such as with NaN and signed zero comparison?
    • answer: use must, not should, due to predominant usage patterns.
  • Are we okay with leaving out assume_unique?
    • answer: yes, this can be left out.
  • Are we okay with not mandating reshape behavior if x2 is multi-dimensional?
    • answer: yes, no reshape behavior is required.

kgryte, Jun 12 '25 10:06