
Spec for subtypes of type variable constraints too vague

Open hurryabit opened this issue 6 months ago • 4 comments

The spec is very vague when it comes to instantiating a type variable that has constraints with a subtype of one of those constraints:

https://github.com/python/typing/blob/b806c04133760efee732ce5081b177d84def22fe/docs/spec/generics.rst?plain=1#L71-L81

Let's look at a slightly more involved example:

class C[T: (int, str)]:
    def __init__(self, t: T) -> None:
        self.t: T = t

c = C(True)

The example in the spec implies that the constructor call is fine because True has type bool, which is a subtype of int, and T can be instantiated to int. Thus, c should have type C[int]. This makes perfect sense and is entirely natural if you are writing a bidirectional type checker. Nothing controversial here.

What about the following?

c: C[bool] = ...

Does the type "widening" to the base type mentioned in the spec apply here too, so that the annotation has to be interpreted as if the user had written C[int]? At least pyright interprets it this way. I'm wondering whether this was the intent of the spec. Off the top of my head, I cannot think of a situation where this would be a useful feature (but I'm happy to be convinced otherwise). Moreover, this widening can become rather counter-intuitive when C[bool] appears in contravariant positions.
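To make the contravariant concern concrete, here is a hypothetical sketch, reusing the class C from above (the on_bool and register functions are made up for illustration):

from collections.abc import Callable

def on_bool(c: C[bool]) -> None: ...

def register(cb: Callable[[C[int]], None]) -> None: ...

# Under the widening reading, C[bool] simply means C[int], so this call is
# accepted. Under a face-value reading it would be rejected, since callback
# parameters are contravariant and C[bool] is narrower than C[int].
register(on_bool)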

I would like to get a conversation going that crystallizes the intent of the spec in this regard. If we come to a conclusion, I'd be more than happy to update the docs to reflect this outcome.


Here's how I would roughly specify the feature:

  1. Explicit instantiations of constrained type variables via class/alias specializations must be taken at face value and not automatically be widened to base types. If the explicitly mentioned type argument is neither a type variable nor a type equivalent to one of the constraints, that's a type error. For the exact meaning of "equivalent", I would suggest picking an equivalence relation on the more syntactic side of the spectrum of possibilities, but I haven't thought about this aspect too deeply so far.

  2. Explicitly instantiating a constrained type variable T with another type variable S is only allowed if S also has constraints and each of them is also listed in T's constraints (up to the same equivalence as above). (I avoided the word "subset" on purpose since that's too easy to interpret as "subtype", which I do not mean!)

  3. All of this should also apply to generic functions, should there ever be a way to explicitly specialize them.

  4. When inferring type arguments for generic function calls, including constructor calls, the function is conceptually "exploded" into an overloaded function with one case for each constraint and overload resolution decides which constraint to pick as the instantiation for the type variable.

    This gets particularly gnarly in the presence of multiple matching constraints caused by multiple inheritance. Being consistent with overload resolution seems like a good property to have here.

    This covers the covariant case illustrated in the example above where bool gets "widened" to int. It also covers the contravariant case, where the function

    def higher_order[T: (int, str)](f: Callable[[T], None]): ...
    

    can be called with an argument of type Callable[[float], None] and T gets instantiated to int. (In some sense, float gets "narrowed" to int here.)
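    For illustration, here is a sketch of this "exploded" view of higher_order (conceptual only, not proposed spec text; the on_float function is made up):

    from collections.abc import Callable
    from typing import overload

    @overload
    def higher_order(f: Callable[[int], None]) -> None: ...
    @overload
    def higher_order(f: Callable[[str], None]) -> None: ...
    def higher_order(f): ...

    def on_float(x: float) -> None: ...

    # Callable[[float], None] is assignable to Callable[[int], None] because
    # parameter types are contravariant, so overload resolution picks the
    # first case and T is solved to int.
    higher_order(on_float)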

hurryabit avatar Jun 28 '25 10:06 hurryabit

The behavior of TypeVars with constraints isn't well described, and I'd welcome an attempt to clarify it.

I'm surprised to see pyright's behavior here, though. My expectation would be that C[T] is legal only if T is equivalent to one of the constraints, to Any, or to another TypeVar with the same constraints. Therefore, C[bool] should be disallowed.
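Spelled out as a sketch (the individual lines are hypothetical, reusing a class C like the one above):

from typing import Any

class C[T: (int, str)]: ...

C[int]   # OK: equivalent to one of the constraints
C[Any]   # OK: Any is allowed
C[bool]  # rejected under this expectation

def f[S: (int, str)](x: C[S]) -> None: ...  # OK: S has the same constraints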

Here's a complete program demonstrating what you wrote:

class C[T: (int, str)]:
    def __init__(self, t: T) -> None:
        self.t: T = t

c: C[bool] = C(True)
reveal_type(c)

Pyright allows this without errors and reveals C[int]. Mypy says 'Value of type variable "T" of "C" cannot be "bool"' and then emits some more cascading errors. I also tried ty and pyrefly but it doesn't seem like they properly support constraints yet.

JelleZijlstra avatar Jun 28 '25 14:06 JelleZijlstra

I agree with most of the proposal, except for

  1. I would suggest to pick an equivalence relation on the more syntactic side of the spectrum of possibilities

The spec already defines type equivalence in purely semantic-subtyping terms. It is of course challenging to implement this completely, and in practice it's fine if type checkers have an incomplete implementation that covers the common cases well, but I would be opposed to explicitly defining a more limited syntactic type equivalence relation in the spec if this would be read as preventing type checkers from understanding more complex semantic equivalences. I agree that when specializing C explicitly, the subscript type should be equivalent to one of the constrained types, but I don't think that "equivalent" should be defined any more strictly than what is already in the core concepts section of the spec.

carljm avatar Jun 28 '25 15:06 carljm

I also welcome more clarity in this part of the spec.

We might want to take this opportunity to address a terminology gap. I've heard others use the term "constrained type variable" or "restricted type variable", but the typing spec doesn't provide a name for this concept. The lack of terminology frequently gets in the way when discussing this concept. I've proposed that we adopt the term "value-constrained type variables". That's the term I use in the pyright documentation. This term recognizes the fact that other categories of constraints exist — including upper / lower bounds and variance. A constraint solver needs to honor all of these constraint categories.

I'm surprised to see pyright's behavior here, though...

The reason pyright does this is for consistency, or at least that was my thinking when I implemented this functionality. When a value-constrained type variable is implicitly specialized (e.g. when calling a constructor), the constraint solver must widen the type to match one of the value constraints.

class X[T: (int, str)]:
    def __init__(self, t: type[T]) -> None: ...

reveal_type(X(bool))  # X[int]

If this widening behavior is supported and well defined for implicit specialization, then I figured it should be supported for explicit specialization as well. I suppose one could make the argument that consistency between these two cases isn't important and there's value in limiting users in the explicit case, but this limitation does seem somewhat arbitrary and unnecessary. Is there a technical or theoretical reason why such a limitation is needed here? Or perhaps there's a usability argument that favors such a limitation?

I don't have a strong opinion here. If there is consensus that there's value in limiting explicit specialization in this manner, I'm fine with that.

If the spec were to enforce this equivalency requirement for explicit specialization, it would presumably need to carve out an exemption for Any?

class X[T: (int, str)]: ...

x: X[Any] # OK even though Any is not equivalent to int or str

Would the spec also carve out exemptions for other types that include Any, or is a "naked" Any special here? Is there a principled justification for exempting only a naked Any? Here again, this seems somewhat arbitrary to me. For this reason, pyright allows the following.

class Y[T: (list[int], list[str])]: ...

y: Y[list[Any]] # Allowed?

When inferring type arguments for generic function calls, including constructor calls, the function is conceptually "exploded" into an overloaded function with one case for each constraint

We have to be careful here. For value constraints, I presume that order does not matter — that is, the behavior should not depend on the order in which the constraints are listed in the type variable definition? That's not true for overloads, where order does matter and affects the behavior. Because of this disconnect, it's not clear to me how this behavior could be specified in a way that is completely deterministic and consistent across type checkers. Even if we were to say that order does matter for value constraints within a single type variable, the problem still exists because there may be multiple value-constrained type variables involved.

When there are multiple value-constrained type variables, "one case for each constraint" isn't sufficient. This would need to be generalized to "one case for every combination of constraints for all type variables". If there are three type variables that appear in the signature and each has three value constraints, the signature would "explode" to 27 overload signatures.
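A sketch of that combinatorial expansion with two type variables (the pair function and its expanded counterpart are hypothetical):

from typing import overload

def pair[T: (int, str), U: (bytes, float)](t: T, u: U) -> tuple[T, U]: ...

# Conceptually equivalent to one overload per combination of constraints:
@overload
def pair_expanded(t: int, u: bytes) -> tuple[int, bytes]: ...
@overload
def pair_expanded(t: int, u: float) -> tuple[int, float]: ...
@overload
def pair_expanded(t: str, u: bytes) -> tuple[str, bytes]: ...
@overload
def pair_expanded(t: str, u: float) -> tuple[str, float]: ...
def pair_expanded(t, u): ...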

This gets particularly gnarly in the presence of multiple matching constraints caused by multiple inheritance.

Yes, but the problem extends beyond multiple inheritance. Any type overlap between two value constraints is problematic. Consider these examples, which all contain overlapping types: (int, str, int | str), (int, float), (str, Any). I've seen all of these variants in real-world code. Any overlap of value constraints creates ambiguous situations for constraint solvers.

One option is to disallow a value constraint that fully overlaps with another (i.e. the set of values it describes is a proper subset of the values described by the other). If a user supplies two fully-overlapping types in a value constraint list (such as str and str | int), it is arguably a sign of a programming error. Unfortunately, disallowing full overlaps doesn't eliminate all ambiguity because partial overlaps are still possible. Multiple inheritance allows any two classes to overlap (unless one of them is final).
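A minimal sketch of the ambiguity using the (int, float) pair mentioned above (the pick function is hypothetical):

def pick[T: (int, float)](x: T) -> T: ...

# Because the numeric tower makes int assignable to float, the argument 3
# satisfies both constraints; a solver could defensibly choose T = int or
# T = float.
result = pick(3)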

I posed a question above about whether value constraints should be order independent. It turns out that they are not today, at least as they're implemented in mypy, pyright and pyrefly.

class A: ...
class B: ...
class C(A, B): ...

def func1[T: (A, B)](x: T) -> T: ...
def func2[T: (B, A)](x: T) -> T: ...

reveal_type(func1(C()))  # A
reveal_type(func2(C()))  # B

I'm not sure what the desired behavior is in this example. Perhaps the theoretically-correct thing to do here is for a constraint solver to allow for multiple solutions? Should the revealed type be A | B in both cases? It's not clear to me.

I'm on the record as saying that I'm not a fan of value-constrained type variables. To my knowledge, Python is the only mainstream language whose type system includes this concept. There are good reasons why it doesn't show up anywhere else: it does not compose well with other type system features, it creates significant challenges for type checker implementations (especially those that use fine-grained lazy evaluation), and there's almost always a different way to express the same concept using overloads and other facilities in the type system. A sketch of that last rewriting follows below. While it's probably not practical to do so, part of me wishes we could deprecate value-constrained type variables and eliminate them from the type system over time.
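For illustration, a sketch of the overload rewriting (the double functions are made up):

from typing import overload

# A value-constrained signature...
def double[T: (int, str)](x: T) -> T:
    return x + x

# ...can usually be rewritten as an overloaded function with the same
# external behavior:
@overload
def double_ov(x: int) -> int: ...
@overload
def double_ov(x: str) -> str: ...
def double_ov(x):
    return x + x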

erictraut avatar Jun 28 '25 19:06 erictraut

Regarding terminology, the runtime TypeVar object uses __bound__ for the upper bound and __constraints__ for what you call value constraints. I'd prefer to stick with that terminology, though I acknowledge that "constraint" can have other meanings.
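For reference, a small runtime sketch of those attributes (the names T and U are arbitrary):

from typing import TypeVar

T = TypeVar("T", int, str)   # constrained, i.e. "value-constrained" above
U = TypeVar("U", bound=int)  # upper-bounded

print(T.__constraints__)  # (<class 'int'>, <class 'str'>)
print(U.__bound__)        # <class 'int'>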

If this widening behavior is supported and well defined for implicit specialization, then I figured it should be supported for explicit specialization as well. I suppose one could make the argument that consistency between these two cases isn't important and there's value in limiting users in the explicit case, but this limitation does seem somewhat arbitrary and unnecessary. Is there a technical or theoretical reason why such a limitation is needed here? Or perhaps there's a usability argument that favors such a limitation?

I would expect explicit specialization to specialize to exactly the type the user specified, unlike implicit specialization where the type checker figures out the right type to use.

Also, if there are multiple constraints that the given type is assignable to, pyright implicitly picks the first one:

class C[T: (int | str, bytes | str)]:
    pass

reveal_type(C[str]())  # C[int | str]

I know there are other cases where the behavior of value constraints depends on order, but it seems better to minimize such behavior.

Regarding Any: we need to at least specify what happens if you use a class generic over a constrained TypeVar without specializing it (e.g. in my example above, if you annotate an argument as arg: C). Generally, unspecified TypeVars default to Any, so I think that's what should happen here too.
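A small sketch of that case (the function f is hypothetical; class C is as in the original example):

class C[T: (int, str)]: ...

def f(arg: C) -> None:
    # Under the usual rule that an unspecialized generic defaults to Any,
    # arg would be treated as C[Any] here.
    reveal_type(arg)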

I don't see a good reason to allow specialization with list[Any] in your example. It feels like extra complexity without a compelling use case.

I agree that constrained TypeVars are a rather problematic part of the type system. Unfortunately, they are somewhat widely used and I don't think we can just remove them. Hopefully we'll make the situation a little better if we identify discrepancies among type checkers and close them by tightening the spec.

JelleZijlstra avatar Jun 29 '25 01:06 JelleZijlstra