array-api

Consider adding the one-hot array creation function

Open • NeilGirdhar opened this issue 1 year ago • 1 comment

One-hot encoding is a very common array creation function in machine learning, so it may be worth considering for addition to the standard.

Various implementations have different semantics:

  • PyTorch (optionally infers the number of classes; cannot select the axis)
  • TensorFlow (can choose the on and off values)
  • JAX
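To make the differing semantics concrete, here is a minimal NumPy sketch of a single function that combines them. The name one_hot and its parameters (num_classes, axis, on_value, off_value, dtype) are assumptions for illustration only, not an existing or proposed array API signature. When num_classes is omitted it is inferred the way PyTorch does (one more than the largest index).

```python
import numpy as np


def one_hot(x, num_classes=None, *, axis=-1, on_value=1, off_value=0,
            dtype=np.float64):
    """One-hot encode integer indices (hypothetical signature, NumPy-based).

    Combines the behaviours listed above: optional inference of
    num_classes (as in PyTorch), a selectable output axis, and
    configurable on/off values (as in TensorFlow).
    """
    x = np.asarray(x)
    if not np.issubdtype(x.dtype, np.integer):
        raise TypeError("one_hot expects an integer index array")
    if num_classes is None:
        # Infer one more than the largest index (0 for an empty input).
        num_classes = int(x.max()) + 1 if x.size else 0
    # Compare every index against 0..num_classes-1; broadcasting yields a
    # boolean mask of shape x.shape + (num_classes,).
    mask = x[..., np.newaxis] == np.arange(num_classes)
    # Out-of-range indices match nothing, so their rows are all off_value.
    out = np.where(mask, on_value, off_value).astype(dtype)
    # Move the new class axis from the end to the requested position.
    return np.moveaxis(out, -1, axis)
```

For example, one_hot([0, 2, 1], 3) returns rows 0, 2 and 1 of the 3×3 identity, and passing axis=0 puts the class axis first instead. In this sketch out-of-range indices silently produce rows of off_value; a standardized version would have to decide between that behaviour and raising an error.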

From an organizational standpoint, I think it would probably belong alongside other creation functions such as eye.

Alternatively, one-hot could be generalized to a broadcastable unit impulse, which already exists in SciPy as scipy.signal.unit_impulse. Whereas one-hot sets one element of a vector to one, a unit impulse sets one element of an array of arbitrary shape to one.
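That generalization can be sketched as follows, assuming idx carries one coordinate per axis of shape in its last dimension and any leading dimensions act as a batch. broadcast_unit_impulse is a hypothetical name, not a SciPy function; for a single index it should agree with scipy.signal.unit_impulse(shape, idx), and with shape=(num_classes,) it reduces to one-hot.

```python
import numpy as np


def broadcast_unit_impulse(shape, idx, dtype=np.float64):
    """Hypothetical broadcastable unit impulse (not an existing SciPy API).

    idx is an integer array whose last axis has length len(shape); each
    leading "batch" entry selects one element of an array of the given
    shape to be 1. The result has shape idx.shape[:-1] + shape.
    """
    shape = tuple(shape)
    idx = np.asarray(idx)
    if idx.shape[-1:] != (len(shape),):
        raise ValueError("last axis of idx must have length len(shape)")
    batch = idx.shape[:-1]
    mask = np.ones(batch + shape, dtype=bool)
    for k, n in enumerate(shape):
        # Coordinates along axis k, shaped (n, 1, ..., 1) so that
        # broadcasting aligns them with axis k of `shape`.
        coord = np.arange(n).reshape((n,) + (1,) * (len(shape) - k - 1))
        # The k-th requested coordinate of every batch entry, with singleton
        # axes appended so it broadcasts over `shape`.
        target = idx[..., k].reshape(batch + (1,) * len(shape))
        mask &= coord == target
    return mask.astype(dtype)
```

With a one-dimensional shape this is one-hot: broadcast_unit_impulse((3,), labels[:, None]) has the same rows as np.eye(3)[labels].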

One-hot also generalizes the standard (elementary) basis vector, which is sometimes requested on its own. It is a generalization because it broadcasts over an array of indices instead of producing a single vector.
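A short illustration of that relationship, assuming NumPy: basis_vector is a hypothetical helper that builds a single e_i, and indexing the identity matrix with an integer array selects one basis vector per index, which is a one-hot encoding.

```python
import numpy as np


def basis_vector(n, i, dtype=np.float64):
    """Standard basis vector e_i of length n (hypothetical helper name)."""
    e = np.zeros(n, dtype=dtype)
    e[i] = 1
    return e


# Indexing the n x n identity with an integer array of shape S selects one
# row (a basis vector) per index, giving shape S + (n,): a broadcast e_i.
labels = np.array([[0, 2], [1, 1]])
batch = np.eye(3)[labels]
```

The identity-indexing trick is convenient but materializes an n × n matrix, so for large n a comparison such as labels[..., None] == np.arange(n) uses less memory.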

NeilGirdhar • Apr 12 '24

Thanks, @NeilGirdhar, for opening this issue. This might be another candidate for a "deep learning" extension (ref: https://github.com/data-apis/array-api/issues/158). Both PyTorch and JAX, which have similar APIs, place it in their nn namespace. While one-hot creation is certainly common, we'd probably need to see more ecosystem uptake and alignment (e.g., CuPy, Dask, NumPy) before considering it.

kgryte • Apr 18 '24