modular icon indicating copy to clipboard operation
modular copied to clipboard

[stdlib] Add List.__contains__ for distinct StringSlices

Open IbarraSamuel opened this issue 8 months ago • 12 comments

Fixes https://github.com/modular/modular/issues/4579

By adding an overload for List.contains where the types are not the same, cannot be implicitly converted, but can be compared one to the other (StringSlices with distinct origins)

IbarraSamuel avatar May 14 '25 22:05 IbarraSamuel

This problem is also present for Span and therefore any other "view" types we might have. This could easily turn into a game of Whac-a-Mole if we can't find a better solution to this.

bgreni avatar May 15 '25 01:05 bgreni

This problem is also present for Span and therefore any other "view" types we might have. This could easily turn into a game of Whac-a-Mole if we can't find a better solution to this.

We could probably build a core trait for all view like types using parametric traits:

trait ViewType[mut: Bool, //, origin: Origin[mut]]:
    ...

struct List:
    fn __contains__[
        V1: ViewType & Copyable & Movable & Writable,
        V2: V1[.mut=_, .origin=_] & Copyable & Movable & Writable & Comparable[V1[.mut=_, .origin=_]],
    ](self: List[V1, *_], value: V2): ...

martinvuyk avatar May 15 '25 12:05 martinvuyk

It has the same root cause as #4386, so indeed very bad. Or basically, native Eq doesn't cut it.

soraros avatar May 15 '25 13:05 soraros

This problem is also present for Span and therefore any other "view" types we might have. This could easily turn into a game of Whac-a-Mole if we can't find a better solution to this.

Right, this is true for any parametric type. We often see this with types carrying an origin like the ones you mentioned. I think this will be solvable with parametric traits which is on the compiler roadmap for Q2 still.

That said, the open question is: do we want to do this now or wait for parametric traits? We all agree in the long term, it doesn't make sense to have specialization in List[T] methods for specific parametric Ts such as Span, StringSlice, etc. Right now, in the StringSlice case, you'd have to convert to a String in order to do the comparison or __contains__ call - is that right?

JoeLoser avatar May 15 '25 13:05 JoeLoser

Right, this is true for any parametric type. We often see this with types carrying an origin like the ones you mentioned. I think this will be solvable with parametric traits which is on the compiler roadmap for Q2 still.

It's not even parametric trait, but trait with partially bound/generic associated type.

That said, the open question is: do we want to do this now or wait for parametric traits? We all agree in the long term, it doesn't make sense to have specialization in List[T] methods for specific parametric Ts such as Span, StringSlice, etc. Right now, in the StringSlice case, you'd have to convert to a String in order to do the comparison or __contains__ call - is that right?

We’ll need to manually write these to make the standard library actually useful, I think. For instance, due to the mutability/immutability bug, UnsafePointer doesn’t show this issue yet, but once that's fixed, its __eq__ will become immediately useless.

It’s actually a bigger problem: the Eq-conforming method __eq__(Self, Self) -> Bool isn’t the one we’ll end up using, which creates the illusion of trait conformance.

soraros avatar May 15 '25 13:05 soraros

UnsafePointer doesn’t show this issue yet, but once that's fixed, its eq will become immediately useless

@soraros Can confirm firsthand that it's the case, but it's not that bad. It's just a matter of building an Int for comparisons

It’s actually a bigger problem: the Eq-conforming method eq(Self, Self) -> Bool isn’t the one we’ll end up using, which creates the illusion of trait conformance.

yep, but again that is solvable using parametric traits + your proposal https://github.com/modular/modular/issues/4346. That is why I put in my code above:

    fn __contains__[
        V1: ViewType & Copyable & Movable & Writable,
        V2: V1[.mut=_, .origin=_] & Copyable & Movable & Writable & Comparable[V1[.mut=_, .origin=_]],
    ](self: List[V1, *_], value: V2): ...

basically binding V2 to being the same type as V1 with a different mut and origin and that it has to be comparable to any unbound parametrization of V1

I think this will be solvable with parametric traits which is on the compiler roadmap for Q2 still. [...] That said, the open question is: do we want to do this now or wait for parametric traits?

@JoeLoser I personally would rather wait, otherwise we will be adding a bunch of patch-work everywhere (like the PR where I try to add all the iterator helper methods). Especially if we will be getting parametric traits this year, I would rather wait and solve this the right way at once

martinvuyk avatar May 15 '25 13:05 martinvuyk

+1 to waiting it out for parametric traits

bgreni avatar May 15 '25 14:05 bgreni

UnsafePointer doesn’t show this issue yet, but once that's fixed, its eq will become immediately useless

@soraros Can confirm firsthand that it's the case, but it's not that bad. It's just a matter of building an Int for comparisons

Whether this is valid depends on whether we're defining Int as ssize_t or intptr_t. Given that pointer tagging has now been encoded into the ARM ISA, we probably want to consider platforms where SIZE_MAX != UINTPTR_MAX. Also, it's technically more correct to use UInt for pointers, which is why I'm using the unsigned versions. Pointers should, ideally, own their own comparison logic.

It’s actually a bigger problem: the Eq-conforming method eq(Self, Self) -> Bool isn’t the one we’ll end up using, which creates the illusion of trait conformance.

yep, but again that is solvable using parametric traits + your proposal #4346. That is why I put in my code above:

I agree, __eq__(Self, Self) needs to go ASAP. It probably makes sense to sit down and review all of the stdlib traits once we have parametric traits, since a lot of them should be parametric. I'd even go so far as to say that having parametric moveinit and copyinit might make sense for cases where you can init a T from a U (although that can be covered with normal ctors).

    fn __contains__[
        V1: ViewType & Copyable & Movable & Writable,
        V2: V1[.mut=_, .origin=_] & Copyable & Movable & Writable & Comparable[V1[.mut=_, .origin=_]],
    ](self: List[V1, *_], value: V2): ...

basically binding V2 to being the same type as V1 with a different mut and origin and that it has to be comparable to any unbound parametrization of V1

I think this will be solvable with parametric traits which is on the compiler roadmap for Q2 still. [...] That said, the open question is: do we want to do this now or wait for parametric traits?

@JoeLoser I personally would rather wait, otherwise we will be adding a bunch of patch-work everywhere (like the PR where I try to add all the iterator helper methods). Especially if we will be getting parametric traits this year, I would rather wait and solve this the right way at once

I agree that we should wait. requires + parametric traits are two of the big asks from external stdlib contributors to the compiler team, and they will enable a lot of API work.

Now, on a ViewType abstraction, I think that also needs to wait for parametric traits, but that having one would be valuable. However, I think that it might make sense to have that trait be something on top of RandomAccessIterator.

owenhilyard avatar May 15 '25 14:05 owenhilyard

@martinvuyk, @soraros We might not need to wait for parametric Traits. If you quick check this:

from sys.intrinsics import _type_is_eq

fn main():
    first_string = String("foo")
    _first_slice = first_string.as_string_slice()

    second_string = String("bar")
    _second_slice = second_string.as_string_slice()

    types_are_equal = _type_is_eq[
        __type_of(_first_slice), __type_of(_second_slice)
    ]()

    print(types_are_equal)

You will see that using _type_is_eq, we can compile time know if it's the value type is the same as the type of the elements in the list, and then do the comparison with a rebind. Should not be a problem since it's just a comparison and it's already how tuple contains work, so this should do the work:

    fn __contains__[
        U: EqualityComparable & Copyable & Movable,
        E: EqualityComparable, //,
    ](self: List[U, *_], value: E) -> Bool:
        @parameter
        if _type_is_eq[U, E]():
            value_rebind = rebind[U](value)
            for i in self:
                if i[] == value_rebind:
                    return True
        return False

The big problem I can see here is that you don't have "type checked" information, people may be trying to compare if there is an Int within a list of Strings of something like this. So I don't know if it's a good idea.

With the requires clause, it should do the work perfectly, I guess the LSP will run the requires conditions and tell people to not use incorrect types.

Don't know if we should add Copyable & Movable trait bounds to E, maybe yes, if at the end both types (U and E) should be the same type, but is not really needed for this method (and sometimes, types can have conditional conformance based on some parameters, and a type that could be comparable doesn't require to be Copyable or Movable).

Another option is to eliminate all inneccesary trait bounds, but it requires to do the _type_is_eq within the loop. I don't think is a good idea.

    fn __contains__[
        E: EqualityComparable, //,
    ](self, value: E) -> Bool:
        for i in self:

            @parameter
            if _type_is_eq[T, E]():
                if value == rebind[E](i[]):
                    return True
        return False

IbarraSamuel avatar May 15 '25 22:05 IbarraSamuel

The big problem I can see here is that you don't have "type checked" information, people may be trying to compare if there is an Int within a list of strings of something like this.

Yeah that's a non-starter unfortunately.

bgreni avatar May 15 '25 22:05 bgreni

Default to heterogeneous comparison can be very dangerous, and it's even worse when the language has implicit conversion.

a = StringSlice("test")
print(a in ("test", "a", "b"))  # prints False

soraros avatar May 15 '25 22:05 soraros

Totally agree, this one can wait. A for loop is an easy workaround for now

IbarraSamuel avatar May 16 '25 05:05 IbarraSamuel

Thanks all for the discussion. We're all definitely eager to play with parametric traits. In light of this, I'd suggest we close this PR and wait for parametric traits to solve this properly. Client code can be verbose in the meantime, but that is a fine short-term trade-off.

JoeLoser avatar May 16 '25 21:05 JoeLoser