typing icon indicating copy to clipboard operation
typing copied to clipboard

initial draft of new concepts section

Open carljm opened this issue 1 year ago • 14 comments

This is an initial proposed draft of an expanded "type system concepts" document for the typing specification. It intends to improve clarity of the spec by giving definition to important terms that can be used throughout the remainder of the spec. See #1534

This PR largely owes its existence to, and includes a lot of text drawn directly from, https://bit.ly/python-subtyping, authored by @kmillikin. (Any errors introduced are mine.) It also draws on concepts described in PEP 483.

This PR doesn't yet audit the remainder of the spec to ensure consistent use of these terms throughout. I intend to do that audit and update, but I'd like to get some agreement on the terms themselves first.

carljm avatar May 22 '24 12:05 carljm

cc typing council members @JelleZijlstra, @hauntsaninja, @gvanrossum, @rchen152, and @erictraut -- I'd be grateful for your review and feedback here.

carljm avatar May 22 '24 12:05 carljm

Also cc @superbobry, since you've expressed interest in this area

carljm avatar May 22 '24 17:05 carljm

@carljm, thanks for taking on this task! This is an important addition to the typing spec. These are foundational concepts. It's important to standardize the terminology and provide clear definitions. Thanks also to @kmillikin for starting this work a few years ago and giving us something to build upon.

I've included a bunch of comments in my first-pass review. Many of them are minor suggestions for readability. As for the overall content, I think this is directionally on the right path. There are many parts that I really like including the definitions of subtyping, materialization, consistent subtyping, and the "assignable to" relationship.

The one part that I really dislike is the use of the terms "static type" and "dynamic type". I think these are really confusing in the context of Python. They make more sense in a gradual type system implemented with a compiler that emits dynamic checks when a type is not statically known, but that doesn't describe Python. With Python, dynamic type checks are performed at runtime regardless of whether static type annotations are present. All type annotations are "static" in the sense that they are present in the code, and static type checkers can (and do) reason about them. So, I find it very confusing and counterintuitive to call Any a "dynamic type" or to say that list[Any] is not a "static type". I feel strongly that we should explore other ways to talk about these concepts. My concerns can perhaps be addressed by simply choosing better terms than "static type" and "dynamic type". Or better yet, perhaps we can eliminate these concepts entirely from the spec and therefore avoid the need to name them.

One other part that I'm lukewarm on is the proposed definition of Any. I'm not a fan of defining it as "should not be subject to static checking". First, I find this definition to be imprecise. Second, our collective understanding of gradual type systems has progressed since PEP 484, and I think there's a good argument to be made for defining Any differently. Third, Any is not the only gradual typing affordance in the Python type system; I'd prefer to come up with a more general definition that covers all gradual typing affordances (including ... in callables and tuple[Any, ...]).

I think I saw some of these points already made in the comments, but I think we need more discussion before converging on a final draft. This PR is probably not the ideal forum for such discussions. Maybe we should move these open topics to Discourse to improve visibility and solicit input from other members of the typing community.

erictraut avatar May 23 '24 02:05 erictraut

Thanks @erictraut for the excellent review! Lots of great comments here; most are clear improvements, some I think need further discussion. I really appreciate the concern for precision and clarity; that's the purpose of this PR! I'll update the PR and reply further tomorrow. I'm happy to also take some of the core terminology questions to Discourse for broader discussion; I intended to post a note there anyway to direct interested people to this PR before we'd consider merging it.

carljm avatar May 23 '24 03:05 carljm

Applying the newly defined terms within the rest of the specification involves integrating these concepts into the existing documentation and examples to ensure clarity and consistency. Here are examples of how these terms might be applied, drawing inspiration from the Python typing documentation and the principles outlined in PEP 484 and PEP 483.

Applying Union Type

The Union type is used to indicate that a variable can be one of several types. For example, a function that accepts either a string or an integer as an argument would use the Union type to annotate its parameter:

from typing import Union

def process_input(input_data: Union[str, int]) -> None:
    # Function body
    pass

Using Subtyping

Subtyping allows for polymorphism, where a variable of a superclass can hold instances of any subclass. This is particularly useful in object-oriented programming. For instance, if you have a base class Animal and subclasses Dog and Cat, you can use subtyping to write a function that works with any animal:

from typing import TypeVar, Generic

T = TypeVar('T', bound='Animal')

class Animal(Generic[T]):
    pass

class Dog(Animal):
    pass

class Cat(Animal):
    pass

def pet(animal: Animal) -> None:
    # Function body
    pass

Defining Type Variables

Type variables are placeholders for types in generic classes or functions. They allow for the creation of flexible, reusable type definitions. For example, a generic list class might use a type variable to denote the type of elements it holds:

from typing import TypeVar, List

T = TypeVar('T')

class MyList(List[T]):
    pass

Specifying Constraints

Constraints can be used to further refine the types that a type variable can represent. For example, if you want to ensure that a type variable represents a numeric type, you can specify a constraint:

from typing import TypeVar, Union

T = TypeVar('T', bound=Union[int, float])

def process_number(num: T) -> None:
    # Function body
    pass

Special Types

Special types, such as None, bool, int, etc., have built-in methods or operators associated with them. These types are often used in type annotations to indicate the expected type of a variable or function return value:

def is_even(n: int) -> bool:
    return n % 2 == 0

Protocols

Protocols define a set of method signatures that a class must implement. They are used to achieve structural typing, where a class adheres to a contract defined by a protocol:

from typing import Protocol

class Printer(Protocol):
    def print(self, message: str) -> None:
        pass

def print_message(printer: Printer, message: str) -> None:
    printer.print(message)

By applying these concepts consistently throughout the specification, you can enhance the clarity and usability of the documentation, making it easier for developers to understand and utilize the typing system effectively.

ashishpatel26 avatar May 23 '24 12:05 ashishpatel26

I'm moving this back to Draft state, and removing the Typing Council decision label for now, since I think there are a couple issues to work out here before it is really ready for a typing council decision and merge.

I also think this PR should include more updates to the rest of the spec to clarify how these terms are useful in the spec, before it's merged. I plan to work on those additions to the PR.

carljm avatar May 23 '24 17:05 carljm

I'm happy to also take some of the core terminology questions to Discourse for broader discussion; I intended to post a note there anyway to direct interested people to this PR before we'd consider merging it.

This is actually part of the standard Typing Council procedure (https://github.com/python/typing-council?tab=readme-ov-file#decisions), so we will need a Discourse post before the Council can approve this change.

JelleZijlstra avatar May 23 '24 18:05 JelleZijlstra

First off, this is amazing to see specified, and thank you for the effort put into doing so.

I don't have much to add right now, there are a few points I need to give a bit more thought to first, but I'm going to echo @erictraut that I think we can do better than not checking values with a type of Any, especially with other surrounding context added in this update to the specification, I don't know how specific we should be at this point in time, but I would prefer if we didn't specifically say that no type errors should be raised for use of Any and phrase it in a way that promotes the point of gradual typing, but allows detecting situations that we could statically analyze as impossible to be consistent.


I think the word "dynamic" is useful for relating to people's expectations from other languages, especially others that have similar notions in their type systems (eg. Kotlin, Elixir), but I would appreciate that if we're going to use it, there's some sort of note about the choice of word for those readers specifically to avoid confusing anything where it doesn't translate. (something like a sphinx note admonishment where the term is introduced would be plenty, but doesn't need to be specifically that.)

mikeshardmind avatar May 23 '24 22:05 mikeshardmind

Thanks for taking a look, @mikeshardmind! Looking forward to your further thoughts. A couple comments:

...we can do better than not checking values with a type of Any... phrase it in a way that promotes the point of gradual typing, but allows detecting situations that we could statically analyze as impossible to be consistent.

I think the latest wording is more careful here. I do think it's important to say that Any itself always represents a totally unknown static type, i.e. it carries no static type information. So we should not see static type errors related to Any itself, unless it would be an error for every possible Python object.

There's a related issue that I think you might be referring to here, of "narrowing" the Any type. For instance, in code like this:

def g(i: int): ...

def f(x: Any):
    g(x)
    reveal_type(x)    

Is it valid for a static type checker to narrow the type of x because of the call to g?

I think it might be reasonable. But even if we allow this narrowing, that doesn't mean that Any itself can ever represent a more-bounded gradual type. Any is always a totally unknown static type. If a type checker narrows here, it should do so by narrowing the type of x to Any & int instead of simply Any. Any & int is the gradual type with an upper bound of int.

Just like narrowing object to int when we see if isinstance(x, int): narrows the type of x from object to int, but doesn't change our interpretation of object, narrowing Any to Any & int might be valid in some situations, but it doesn't change our interpretation of Any itself.

I think the word "dynamic" is useful for relating to people's expectations from other languages, especially others that have similar notions in their type systems (eg. Kotlin, Elixir), but I would appreciate that if we're going to use it, there's some sort of note about the choice of word

I agree that the term "dynamic" is widely used to describe the unknown type in gradual type systems, and it's useful to be consistent with this terminology. I'd be interested to know what specifically you think would be useful to clarify in the note that you propose.

carljm avatar May 24 '24 03:05 carljm

I'm moving this back to Draft state, and removing the Typing Council decision label for now, since I think there are a couple issues to work out here before it is really ready for a typing council decision and merge.

The label is just a marker that the PR will eventually need TC approval, not a request for approval. (The latter must be done in the TC repo.) As such I think the label should be re-applied to prevent accidental merges and for correct categorization.

srittau avatar May 24 '24 07:05 srittau

So we should not see static type errors related to Any itself, unless it would be an error for every possible Python object.

I think the wording in the section "the gradual guarantee" is significantly more accurate to the purpose of Any, than where Any is introduced in the Gradual types section, but I do see the revision to that section you added improves it from language I thought was inaccurate as part of efdbc5f2

Is it valid for a static type checker to narrow the type of x because of the call to g?

In other languages that use the word dynamic, this would be a valid place to narrow. (Note: this ties into another question's answer below)

I would say this should make the inferred type Any & int (specifically, this would have some still unknown gradual type that must be a subtype of int for the code to have remained consistent) Observational narrowing like this might detect cases where a total use is inconsistent, but I suspect that it would be significantly less likely to do so than in other languages which have dynamic, but which also have stronger rules about what can be implemented.

Similarly:

x: Any
# we can't narrow till after this succeeds
# and it failing doesn't tell use anything, it could be a callable that errors
x()  
# if this line is reachable, the above succeeding means that 
# x has an observed bound in this scope of: Any & Callable[[], Any]

But here's a trickier one:

x: Any
x.__name__ + 1  # error?

Should this error? __name__ has a defined purpose in the data model, The datamodel reserves all dunder methods and the places where __name__ is defined to exist, it must be a str, but it isn't defined to exist on all objects. I would say this is a place a static type checker should error even if it isn't strictly impossible because of the language specification.


I'd be interested to know what specifically you think would be useful to clarify in the note that you propose.

This will largely depend on how closely we keep to the behavior of other languages that use dynamic or not on how important the note is and what should be in it, so it might be best to revisit this note towards the end of discussing some of the rest of this.

If type checkers aren't specified to place bounds on gradual types from observation of use, this would need a pretty strong warning that this differs significantly.

mikeshardmind avatar May 24 '24 12:05 mikeshardmind

@carljm

Is it valid for a static type checker to narrow the type of x because of the call to g?

I think in Python that would have to be flow-sensitive which can get tricky to specify. I wouldn't try.

I don't know what we want to say about conformance to the spec, but we can't really prevent a static checker from implementing whatever checks they feel are useful. Maybe they want to make some of them warnings, not errors. Or make them able to be selectively disabled (so the checker could be run in "spec" mode) if the implementation complexity cost isn't too high.

@mikeshardmind

Should this error? __name__ has a defined purpose in the data model, The datamodel reserves all dunder methods and the places where __name__ is defined to exist, it must be a str, but it isn't defined to exist on all objects.

I would say yes but it should be a consequence of the semantics, not a special case for Any. (__name__ is the special case.) I think AGT would have (I could be wrong!): concretize Any to the set of all static types, collect the attribute types of all of them that have __name__ defined to get the singleton set {str}, and abstract that to the gradual type str.

kmillikin avatar May 24 '24 16:05 kmillikin

@kmillikin I agree that it's a consequence of the semantics if we say yes, but this does mean we need to be more careful with the wording for when Any can have errors associated with it.

The below may be off topic as a result for figuring out the wording around when Any may present an error or not, but including my thoughts there in case someone finds them useful to reasoning through this:

It might be useful to more strongly collect the things specified as part of the language. __name__ is one such thing, but it's not actually {str}, it's also allowed not to exist and should not be assumed to exist. This is functionally different from treating this as {str, Never} with current semantics, and slightly closer to how NotRequired works for TypedDicts as a structural type.

So in this case, the rationale would be that SomeAny.__name__ can't error under the gradual guarantee, as there exists at least one plausible type with that attribute, but if it exists it must be a str, so SomeAny.__name__ + 1 can be viewed as an error.

In some ways, the things specified in the data model might be possible to view as a virtual partially known structural type that all objects may or may not implement some of all of, but the parts they implement (either because they must, or because they choose to) should be considered an error if they don't conform to that virtual structural type.

mikeshardmind avatar May 24 '24 17:05 mikeshardmind

We definitely need to be careful of wording! But I don't see this as an error due to a use of an expression of type Any. It's an error due to an expression of type str. If there are places where we write something that suggests that use of an expression of type Any will "pollute" any expression it is a part of so that you will never get type errors, then we should not write that!

I think the AGT view of x.__name__ is that there is a partial function lookup(__name__) from static types to static types. The collecting interpretation is the set of attribute types for types where the partial function is defined.

We don't care where it's not defined because that will be an attribute error and we will never try to add 1.

kmillikin avatar May 24 '24 17:05 kmillikin

Thanks @erictraut for thorough and thoughtful review!

At this point I've resolved all inline comments, and I don't have further edits planned to the "concepts" doc (though I'll happily consider any further inline suggestions to improve or clarify any wording.)

My plan now is to (1) make the minimal necessary wording updates to the rest of the spec so that it isn't wrong with respect to these definitions (this seems important to do before merging this PR, though I want to avoid the temptation to make too many changes and metastasize this PR), and then (2) post this PR to Discourse to solicit broader feedback. (Open to doing those in the other order if someone prefers that, but my inclination is to solicit broader feedback once the full set of intended changes are reviewable.)

carljm avatar May 26 '24 20:05 carljm

I've audited the entire type spec for consistency with this terminology and made edits as appropriate.

There are some judgment calls in how to best approach this, and I welcome feedback. For instance, in parts of the spec that discuss subtyping heavily (e.g. the callable types spec) I chose to instead discuss assignability and consistency, since this naturally includes all gradual callable types. An alternative choice could be to keep these specs written in terms of subtyping for fully static types, and then note in a single sentence that assignability for these types can be determined via the usual combination of materialization and subtyping. I'm happy to switch to this approach if there is consensus that it would be clearer.

I also switched A and B types in some parts of the spec around so that the whole spec is consistent about B being assignable to A, not the other way around. This was probably unnecessary to do in this PR (oops) but it's done now, so unless anyone objects to it, I'll leave it.

carljm avatar Jun 02 '24 01:06 carljm

Posted for feedback at https://discuss.python.org/t/formalizing-core-definitions-in-the-typing-spec/54785

carljm avatar Jun 02 '24 05:06 carljm

Thanks once again for all the excellent reviews! I think I've addressed all comments. Some required non-trivial additions (e.g. regarding structural vs nominal subtyping/assignability, and equivalence of gradual types) which previous reviewers may want to take a look at.

I'll wait a few days for review of these changes, and then I'll submit this to the typing council for consideration.

carljm avatar Jun 14 '24 22:06 carljm

Congratulations and thanks @carljm for getting this change through! This puts the spec on a much firmer footing.

JelleZijlstra avatar Jun 24 '24 15:06 JelleZijlstra