array-api icon indicating copy to clipboard operation
array-api copied to clipboard

Adding support for `len`/`__len__`

Open jakirkham opened this issue 3 years ago • 8 comments

Several array implementation currently implement len/__len__ basically as shorthand for .shape[0]. It would be handy to have this to simplify adoption in existing codes as well as fit more with how current array implementations work

jakirkham avatar Sep 23 '22 07:09 jakirkham

It's hard to search for because "len" isn't a very unique string, but we discussed this multiple times for both arrays and dataframes and there was wide agreement that len(x) is a bad idea. I honestly still need to look up myself or test quickly in IPython along what axis it iterates.

@jakirkham I propose we close this.

rgommers avatar Sep 23 '22 08:09 rgommers

Interesting maybe I'm missing context here

Thought the main issue was dimensions of unknown length. This was noted when it was originally included and then removed ( https://github.com/data-apis/array-api/pull/289 )

Anyways am wondering if we should revisit given the number of lines of code using len needed to change in scikit-learn to accommodate the Array API. So more generally am concerned about the churn we may be causing by leaving things like this out

jakirkham avatar Sep 23 '22 08:09 jakirkham

len() is something that I am slightly hedgy about specifying, but if no projects involved ever played with the idea of behaving differently, I suppose it can be done.

The reason is that there is an alternative way to look at arrays; which is also the one that SymPy uses (for its matrix):

  • len(arr) should be the same as arr.size
  • For loops should iterate all elements (rather than the first dimension/axis).

By not specifying len() you remain compatible with that choice.

Starting from scratch, I would personally consider that choice. Because I like to think of arrays more as a "collection of values/elements with a homogenous structure". The "collection of values/elements" comes first and the "structure" only later in my philosophy :).

That does break the list-of-lists analogy which may be confusing to users. But to some degree that is a feature not a bug.

seberg avatar Sep 23 '22 08:09 seberg

Starting from scratch, I would personally consider that choice.

Yes, I think that is more reasonable indeed. len is only unambiguous for 1-D arrays. "list of lists" is a concept that I don't think many users have in mind, and it's pretty much 2-D specific as well (you can think about a 3-D array as "list of lists of lists" in principle, but who would actually do that?).

Anyways am wondering if we should revisit given the number of lines of code using len needed to change in scikit-learn to accommodate the Array API.

Honestly, I think this is (a) not a hard thing to change all at once for a code base, and (b) a good improvement for readability even in the absence of array API usage.

rgommers avatar Sep 23 '22 08:09 rgommers

One semi-related thing I would actually like to add to NumPy - but never worked on - is arr.iteraxis(axis=None, *, order="K"), with the semantics of axis=None iterating all elements (similar/identical to arr.flat, returning scalars in NumPy).

And otherwise axis selecting all axes to be iterate.

seberg avatar Sep 23 '22 09:09 seberg

I agree that arrays don't need to be directly iterable & with a length, but we do need a way to unpack arrays along an axis into a list.

If we don't have any way to do that currently in the standard, then we need an unstack function.

shoyer avatar Sep 23 '22 15:09 shoyer

I agree that arrays don't need to be directly iterable & with a length, but we do need a way to unpack arrays along an axis into a list.

Can you clarify why this is needed? Typically if you want to work along an axis of an array you just work along that axis.

This can be done with a list comprehension: [a[:, :, i, ...] for i in range(a.shape[k])] (with k colons).

asmeurer avatar Sep 23 '22 16:09 asmeurer

I agree that arrays don't need to be directly iterable & with a length, but we do need a way to unpack arrays along an axis into a list.

Can you clarify why this is needed? Typically if you want to work along an axis of an array you just work along that axis.

This can be done with a list comprehension: [a[:, :, i, ...] for i in range(a.shape[k])] (with k :s).

OK, fair enough, indexing in a loop would definitely do the trick here. Still maybe worth considering adding unstack() as a convenience function. (I'll open a separate issue to discuss this.)

shoyer avatar Sep 23 '22 16:09 shoyer

Can we close this issue with outcome that the proposal is rejected? No requirement to add __len__ (but of course implementers are free to add it anyway, outside of their standard support).

rgommers avatar Oct 07 '22 10:10 rgommers

No more comments, I'll close this. Thanks all!

rgommers avatar Nov 16 '22 22:11 rgommers