Stubs that require a version of Python higher than the minimum supported by typeshed
In #10721, the seaborn stubs depend on matplotlib, which only gained a `py.typed` marker file in version 3.8. Matplotlib 3.8 requires Python >= 3.9, which leads to the mypy tests failing on Python 3.8.
Alex Waygood suggests adding a `requires-python` field to `METADATA.toml` and teaching stub-uploader to include it in the generated `setup.py`: https://github.com/python/typeshed/pull/10721#issuecomment-1722454399.
Please let us know what you think about this idea or if you have another solution.
I think this would be a good idea.
We actually already have a need for this: typeshed claims to support Python 3.7+, but the latest version of types-jsonschema will only work with mypy on Python 3.8+. The only reason why this isn't causing our CI to fail is that we no longer run mypy in CI on Python 3.7, because the latest version of mypy only supports Python 3.8+. But that isn't particularly principled; we should signal to our users that the stubs package we're shipping only works with Python 3.8+, by including that information in the generated setup.py file.
Oh, actually, the types-jsonschema PR I was thinking of hasn't been merged yet... the package will require Python 3.8+ if #10583 is merged :-)
Should the field be empty by default or should it be set to typeshed's minimum supported python?
The METADATA.toml field should be optional — the default assumption should be that most stubs packages can be used with all versions of Python that typeshed supports.
We should add some validation to tests/parse_metadata.py to ensure that we don't have any requires-python fields that specify a minimum Python version less than the oldest version of Python supported by typeshed.
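A sketch of how that validation might look, assuming plain `>=X.Y` specifiers (a real implementation might use `packaging.specifiers` instead, and the constant here is illustrative):

```python
# Sketch of the proposed check for tests/parse_metadata.py (illustrative
# only): reject a requires-python specifier whose lower bound is below
# typeshed's oldest supported Python. Assumes plain ">=X.Y" specifiers.
OLDEST_SUPPORTED_PYTHON = (3, 8)  # illustrative value

def validate_requires_python(specifier: str) -> None:
    if not specifier.startswith(">="):
        raise ValueError(f"Expected a '>=' specifier, got {specifier!r}")
    version = tuple(int(part) for part in specifier[2:].split("."))
    if version < OLDEST_SUPPORTED_PYTHON:
        raise ValueError(f"{specifier!r} is below typeshed's oldest supported Python")

validate_requires_python(">=3.9")  # fine; ">=3.7" would raise ValueError
```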
I'm not sure what the stub uploader should do if it encounters a stubs package that doesn't have a requires-python field in METADATA.toml. Should it add requires-python to the generated setup.py file anyway, and set it to the lowest version of Python supported by typeshed? Or should it just not add anything to the setup.py file?
I'm not sure what the stub uploader should do if it encounters a stubs package that doesn't have a `requires-python` field in `METADATA.toml`. Should it add `requires-python` to the generated `setup.py` file anyway, and set it to the lowest version of Python supported by typeshed? Or should it just not add anything to the `setup.py` file?
Answering my own question after thinking about this a little more: I think it should add requires-python to the generated setup.py file anyway, and set it to the lowest version of Python supported by typeshed. Here's why: in January 2024, we'll be dropping support for Python 3.7 (#10113), and when we do so, we'll likely start using PEP-570 syntax across most of typeshed. That will make most of our stubs packages incompatible with Python 3.7, even the ones that don't have any external dependencies. It will be important to signal this to our users.
That will make most of our stubs packages incompatible with Python 3.7, even the ones that don't have any external dependencies. It will be important to signal this to our users.
Makes sense.
Required tasks:
- [x] Allow `requires_python` field in `METADATA.toml` #10724
- [x] Add `oldest_supported_python` field, the minimal version of Python supported by typeshed, to `pyproject.toml` #10724
- [x] Update `mypy_test` to skip unsupported versions #10724
- [x] Update `regr_test` to skip unsupported versions #10724
- [x] Update `stubtest_third_party` to skip unsupported versions #10724
- [x] Document the new `requires_python` field in https://github.com/python/typeshed/blob/main/CONTRIBUTING.md#the-metadatatoml-file
- [ ] Update `pyright_test` to skip unsupported versions
- [ ] Update `pytype_test` to skip unsupported versions
- [x] Update the `runtests` script (#14051)
- [x] Update `stub_uploader` to include the field in the PyPI distributions typeshed-internal/stub_uploader#103
Since this issue touches that area, here's an example scenario that we'd like to avoid in runtests, and that using per-stubs Python versions would help with (instead of always running on 3.8):
https://github.com/python/typeshed/actions/runs/6347797998/job/17243343457?pr=10801
I imported `Final` from the wrong location and ran the scripts locally, which told me it was all good on Python 3.8+. But it failed in CI on Python 3.7.
I just took a quick look at the pyright and pytype scripts. The problem here is that these tests run on the whole project. They would have to become modular somehow before the same skipping logic used for mypy and stubtest could be applied. This is a bigger change than what we did with mypy.
As for the specific issue you got, maybe there is a ruff rule that replaces or bans certain imports?
I just took a quick look at the pyright and pytype scripts. The problem here is that these tests run on the whole project. They would have to become modular somehow before the same skipping logic used for mypy and stubtest could be applied. This is a bigger change than what we did with mypy.
Yes, to explain the context here: as you noted, we run pytype and pyright pretty differently in CI from how we run mypy (whether that's mypy_test.py, stubtest, or regr_test.py).
With our mypy tests, we do very elaborate things to try to ensure that each stubs package is only tested in isolation. That means that if stubs package A states that it requires numpy but stubs package B doesn't, they'll be tested with mypy using different virtual environments so that it's detected in CI if (for example) we forget to declare that stubs package A depends on numpy. This is the "principled" way of doing things, but leads to some fairly complex logic in the test scripts, and also makes the scripts somewhat slow. (That can be mitigated using threading/multiprocessing, but that only makes the scripts more complicated 😆.)
When we first added the infrastructure to typeshed allowing non-types dependencies, it was decided that we'd only do this kind of thing for mypy's test scripts. For the other type checkers we run in CI, we'd take a less principled, but simpler, approach: we'd just install all our non-types dependencies together into a big soup, and then test all our stubs packages together. Since we already do the principled thing with mypy, it was thought that it probably wouldn't catch any more bugs if we also did the principled/complex thing with the other two type checkers we run in CI, so it probably wasn't worth the maintenance cost to attempt to do so.
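As a toy illustration of the difference between the two approaches (package names and dependencies invented for the example):

```python
# Toy illustration (not typeshed's actual test code) of the two strategies,
# given each stubs package's declared non-types requirements. Package names
# and dependencies are invented for the example.
packages = {
    "A": ["numpy"],  # package A declares a numpy dependency
    "B": [],         # package B declares none
}

# "Principled" approach (mypy scripts): one environment per package, so a
# forgotten dependency declaration shows up as a failure in CI.
isolated_envs = {name: sorted(reqs) for name, reqs in packages.items()}

# "Soup" approach (pyright/pytype): every declared requirement installed into
# one shared environment, so package B could silently rely on numpy.
soup_env = sorted({req for reqs in packages.values() for req in reqs})

print(isolated_envs)  # {'A': ['numpy'], 'B': []}
print(soup_env)       # ['numpy']
```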
I imported `Final` from the wrong location, ran the scripts locally which told me it was all good on Python 3.8+. But it failed on the CI on Python 3.7.
Yeah, that's basically just because we've got a slightly unusual situation going on right now, where we've partially-but-not-completely dropped support for Python 3.7. Some, but not all, of our tests run with `--python-version 3.7` in CI now, and none of them will come January.
If you want a fix @Avasam, I'd merge a PR that did this, since we can still run our pyright tests with --pythonversion 3.7 in CI -- it's just mypy that we can no longer run with Python 3.7:
```diff
diff --git a/scripts/runtests.py b/scripts/runtests.py
index d7aebb9e5..b8ff184b0 100644
--- a/scripts/runtests.py
+++ b/scripts/runtests.py
@@ -96,7 +96,7 @@ def main() -> None:
     strict_params = _get_strict_params(path)
     print(f"\nRunning Pyright ({'stricter' if strict_params else 'base' } configs) for Python {_PYTHON_VERSION}...")
     pyright_result = subprocess.run(
-        [sys.executable, "tests/pyright_test.py", path, "--pythonversion", _PYTHON_VERSION] + strict_params,
+        [sys.executable, "tests/pyright_test.py", path, "--pythonversion", "3.7"] + strict_params,
         stderr=subprocess.PIPE,
         text=True,
     )
```
But also, January isn't too far away, so we could probably just wait until then, when the problem will fix itself as a result of us dropping support for Python 3.7 :)
With our mypy tests, we do very elaborate things to try to ensure that each stubs package is only tested in isolation. That means that if stubs package A states that it requires numpy but stubs package B doesn't, they'll be tested with mypy using different virtual environments so that it's detected in CI if (for example) we forget to declare that stubs package A depends on numpy. This is the "principled" way of doing things, but leads to some fairly complex logic in the test scripts, and also makes the scripts somewhat slow. (That can be mitigated using threading/multiprocessing, but that only makes the scripts more complicated 😆.)
Thanks for the context.
I think moving forward the whole testing process needs to be simplified, otherwise every addition will require a lot of tricky stuff and duplication of code. One way of dealing with this could be to factor out the logic that parses command line arguments to get which files need to be checked, the version checks + skipping, virtual environment creation and caching, and some other common tasks like results reporting, so that they can be used by all checks. We could maybe even go with factoring out the common bits to a state where adding a new check (new type checker!) could be made with a simple change to a subprocess call.
The virtual environment creation specifically could be made its own job in CI (with proper caching even between runs) that other jobs depend on. For local tests, this can also be done perhaps with an in-project build of virtual envs that are persistent on disk until a dependency changes or some time passes since creation. This means any execution of mypy et al, pyright, pytype would create the venv once and reuse it.
I don’t think I’ll have time to go through all of this in the near future but hopefully someone else does. I’ll help with this whenever I get a chance.
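One way to picture the "version checks + skipping" part of that factoring-out: a single shared helper that every test script could call. This is a sketch under assumed names; nothing here is existing typeshed code:

```python
# Hypothetical sketch of a shared helper, usable by every test script, that
# decides whether a stubs package should be skipped on the current
# interpreter. Names and the default minimum version are illustrative.
from __future__ import annotations

import sys

def parse_min_version(requires_python: str | None) -> tuple[int, int]:
    """Turn an optional '>=X.Y' specifier into a (major, minor) tuple."""
    if requires_python is None:
        return (3, 8)  # assumed typeshed-wide minimum in this sketch
    major, minor = requires_python.removeprefix(">=").split(".")[:2]
    return (int(major), int(minor))

def should_skip(requires_python: str | None, current: tuple[int, int] = sys.version_info[:2]) -> bool:
    return current < parse_min_version(requires_python)

print(should_skip(">=3.9", current=(3, 8)))  # True: interpreter too old
print(should_skip(None, current=(3, 8)))     # False: default minimum applies
```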
I think moving forward the whole testing process needs to be simplified, otherwise every addition will require a lot of tricky stuff and duplication of code. One way of dealing with this could be to factor out the logic that parses command line arguments to get which files need to be checked, the version checks + skipping, virtual environment creation and caching, and some other common tasks like results reporting, so that they can be used by all checks. We could maybe even go with factoring out the common bits to a state where adding a new check (new type checker!) could be made with a simple change to a subprocess call.
Yeah. I've previously tried to somewhat consolidate our logic between our test scripts in PRs such as #8700, #9382 and #9534. It's possible there's more still that we could do. However, it's tricky to centralise logic much more IMO, because the scripts all work pretty differently (for a number of reasons, among them: the different type checkers work differently; the different scripts have different aims; and the different scripts were created by different people at different points in time rather than all together at the same time). There's also the risk that we try to abstract away the differences between the scripts too much, which can make it harder to read the code and harder to refactor the scripts. Some of the functions that I've previously added to tests/utils.py already fall foul of this a little bit, I think: in hindsight, I'm not sure the print_success_msg() function actually improves anything. We could probably just inline that logic in the two places in our tests where the function is called, and the code would probably be more readable tbh.
Having said all that, though, you're very welcome to try to improve things further! :D
The virtual environment creation specifically could be made its own job in CI (with proper caching even between runs) that other jobs depend on. For local tests, this can also be done perhaps with an in-project build of virtual envs that are persistent on disk until a dependency changes or some time passes since creation. This means any execution of mypy et al, pyright, pytype would create the venv once and reuse it.
This is something I've been dreaming of for a while. You could maybe have a special directory for storing virtual envs for each stubs project (.mypy-test-venvs for mypy_test.py, .stubtest-venvs for stubtest_third_party.py, etc.). Every time mypy_test.py is invoked (for example), it could check whether each venv in .mypy-test-venvs is up to date, check whether any need to be added or deleted, and update the existing ones if necessary (maybe using pip-tools?). If they're all up to date, it won't need to do anything, it can just reuse them from the last invocation of the script. It could potentially be much faster than the current approach.
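The stamp-file part of that idea could look something like this (a sketch; the directory layout, file names, and helpers are all hypothetical): key each cached venv by a hash of its requirements, and rebuild only when the requirements change.

```python
# Sketch of the venv-reuse idea (all paths and helper names hypothetical).
# Real logic would also create/upgrade the venv itself, not just the stamp.
import hashlib
from pathlib import Path

def _digest(requirements: list[str]) -> str:
    return hashlib.sha256("\n".join(sorted(requirements)).encode()).hexdigest()

def venv_is_up_to_date(venv_dir: Path, requirements: list[str]) -> bool:
    """Compare a stamp file inside the venv against the current requirements."""
    stamp = venv_dir / "requirements.sha256"
    return stamp.is_file() and stamp.read_text() == _digest(requirements)

def write_stamp(venv_dir: Path, requirements: list[str]) -> None:
    venv_dir.mkdir(parents=True, exist_ok=True)
    (venv_dir / "requirements.sha256").write_text(_digest(requirements))
```

On each invocation the script would call `venv_is_up_to_date` per package, rebuild only the stale venvs, and write a fresh stamp afterwards.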
But, I haven't got around to actually doing anything like that yet :) And, of course... it will only make the scripts even more complicated. Sigh.
Note that when it comes to pyright, we don't even use a script in CI. We have tests/pyright_test.py, and we used to use that in CI, but nowadays it's purely for local testing. We currently use the pyright-action GitHub Action to invoke pyright in CI:
https://github.com/python/typeshed/blob/3eb9ff7f65acede795dc8be615e6f443d1f81578/.github/workflows/tests.yml#L132-L192
As a heads up: @Avasam implemented the necessary changes in runtests in #14051. We "only" need to update pyright_test and pytype_test. At least pyright_test is complicated since we need to generate the config on the fly. See #14052.