stub packages with overlapping namespaces overwrite their METADATA.toml

Open mr-c opened this issue 2 years ago • 9 comments

Hello,

While working on the python3-typeshed package for Debian, I noticed that if one installs two types-* packages that share a namespace, the METADATA.toml file of the first package is overwritten by the second package's:

$ python3 -m venv typeshed-venv
$ . typeshed-venv/bin/activate
$ pip install types_google_cloud_ndb
Collecting types_google_cloud_ndb
  Obtaining dependency information for types_google_cloud_ndb from https://files.pythonhosted.org/packages/09/d8/70b6b36b0e82095a43b2ff7cfe0a55f12fa53bdc70e4d2768538aae646fa/types_google_cloud_ndb-2.2.0.1-py3-none-any.whl.metadata
  Downloading types_google_cloud_ndb-2.2.0.1-py3-none-any.whl.metadata (1.6 kB)
Downloading types_google_cloud_ndb-2.2.0.1-py3-none-any.whl (16 kB)
Installing collected packages: types_google_cloud_ndb
Successfully installed types_google_cloud_ndb-2.2.0.1
$ cat typeshed-venv/lib/python3.11/site-packages/google-stubs/METADATA.toml 
version = "2.2.*"
upstream_repository = "https://github.com/googleapis/python-ndb"
partial_stub = true

[tool.stubtest]
stubtest_requirements = ["protobuf==3.20.2", "six"]
ignore_missing_stub = true
$ pip install types_protobuf
Collecting types_protobuf
  Obtaining dependency information for types_protobuf from https://files.pythonhosted.org/packages/72/03/f7dd2f1ec9712c4242f04b7cb0f7e88605a98ee2695f0e98d72a277580aa/types_protobuf-4.24.0.4-py3-none-any.whl.metadata
  Downloading types_protobuf-4.24.0.4-py3-none-any.whl.metadata (1.9 kB)
Downloading types_protobuf-4.24.0.4-py3-none-any.whl (62 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.1/62.1 kB 1.2 MB/s eta 0:00:00
Installing collected packages: types_protobuf
Successfully installed types_protobuf-4.24.0.4
$ cat typeshed-venv/lib/python3.11/site-packages/google-stubs/METADATA.toml 
version = "4.24.*"
upstream_repository = "https://github.com/protocolbuffers/protobuf"
extra_description = "Generated using [mypy-protobuf==3.5.0](https://github.com/nipunn1313/mypy-protobuf/tree/v3.5.0) on protobuf==4.21.8"
partial_stub = true

[tool.stubtest]
ignore_missing_stub = true

From a packaging system perspective, this makes me uncomfortable.

  1. Does METADATA.toml need to be installed in the top-level namespace of stub packages?
  2. Would it be a problem to exclude it from the Debian packages, instead of randomly choosing which one is kept?
  3. If keeping these files is important, perhaps they could be prefixed so they can exist alongside each other, e.g. METADATA.google_cloud_ndb.toml and METADATA.protobuf.toml
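
For anyone wanting to check an existing environment for this kind of collision, here is a minimal sketch that uses only importlib.metadata. The helper name find_shared_files is made up for illustration; it relies on each distribution's RECORD listing, so distributions installed without a RECORD are skipped.

```python
from collections import defaultdict
from importlib.metadata import distributions


def find_shared_files(dists=None):
    """Return {path: [distribution names]} for paths claimed by 2+ distributions.

    `dists` defaults to every distribution visible on sys.path.
    (Hypothetical helper, not part of typeshed or pip.)
    """
    owners = defaultdict(set)
    for dist in (dists if dists is not None else distributions()):
        name = dist.metadata["Name"]
        # dist.files is None when the distribution has no RECORD file
        for path in dist.files or []:
            owners[str(path)].add(name)
    return {
        path: sorted(names)
        for path, names in owners.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    for path, names in sorted(find_shared_files().items()):
        print(f"{path}: {', '.join(names)}")
```

Run inside the venv from the transcript above, this should report google-stubs/METADATA.toml as claimed by both stub distributions, assuming both wheels list that file in their RECORD.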

mr-c avatar Nov 20 '23 10:11 mr-c

If I remember correctly, the METADATA files are mainly included for reference or potential use by tooling and have no runtime impact apart from that.

srittau avatar Nov 20 '23 10:11 srittau

I see a few options, but I'd be interested in what others think.

  1. As suggested by @mr-c, rename the metadata files to include the package name. For consistency's sake, we should do so for all type packages, not only namespace packages.
  2. For namespace packages, copy the metadata file into all non-namespace packages below the namespace package. (E.g. into google-stubs/cloud/ndb/METADATA.toml instead of google-stubs/METADATA.toml.)
  3. Skip copying the metadata file for namespace packages.
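
To make options 1 and 2 concrete, here is a rough sketch of where the file would end up in each case. Both function names are hypothetical and nothing here reflects the actual typeshed build scripts; it also assumes that a directory counts as a non-namespace package when it contains an __init__.pyi.

```python
from pathlib import Path


def prefixed_metadata_name(distribution: str) -> str:
    """Option 1: put the distribution name into the file name.

    E.g. "google-cloud-ndb" -> "METADATA.google_cloud_ndb.toml".
    """
    return f"METADATA.{distribution.replace('-', '_')}.toml"


def metadata_targets(stub_root: Path) -> list[Path]:
    """Option 2: one METADATA.toml per topmost non-namespace package.

    Assumption: a directory with an __init__.pyi is a regular package;
    a directory without one is treated as a namespace package and searched.
    """
    if (stub_root / "__init__.pyi").exists():
        return [stub_root / "METADATA.toml"]
    targets: list[Path] = []
    for child in sorted(stub_root.iterdir()):
        if child.is_dir():
            targets.extend(metadata_targets(child))
    return targets
```

With option 1 every distribution gets a unique file name, so the files can sit side by side in google-stubs/. With option 2 the file name stays the same but moves down into a directory that only one distribution owns.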

srittau avatar Nov 21 '23 14:11 srittau

> If I remember correctly, the METADATA files are mainly included for reference or potential use by tooling and have no runtime impact apart from that.

I believe this is correct: no external tool that I know of uses the METADATA.toml file, so it shouldn't be the worst thing in the world if you excluded them from the Debian packages @mr-c. The idea is that users should be able to inspect them if they want to, but I'm not sure if anybody actually does. (typeshed-stats looks at the METADATA.toml files, but it grabs them directly from GitHub rather than downloading the built packages from PyPI.)

But, I agree that we should also fix this in typeshed so that it isn't an issue in the first place. I like @srittau's option (1); it seems simplest to me.

AlexWaygood avatar Nov 21 '23 14:11 AlexWaygood

I feel like a more principled solution might be to include the data from the METADATA.toml in the dist-info directory somehow, instead of in the stubs directory. After all, the METADATA.toml conceptually applies to the whole distribution, not to an individual directory.

JelleZijlstra avatar Nov 21 '23 15:11 JelleZijlstra

I looked into that, but according to the packaging docs, this may not be allowed:

> This .dist-info directory may contain the following files, described in detail below:

I understand this to mean that the list is exhaustive, although we could ask the PyPA.

srittau avatar Nov 21 '23 16:11 srittau

~In practice, pip doesn't like if you add extra files to dist info~ actually what I'm remembering may only be true of things not in RECORD

hauntsaninja avatar Nov 21 '23 22:11 hauntsaninja

> I understand this to mean that the list is exhaustive, although we could ask the PyPA.

https://discuss.python.org/t/extra-files-in-dist-info/39418

srittau avatar Nov 23 '23 13:11 srittau

It seems that adding extra files to .dist-info, while not officially supported, is tolerated. I don't think we should name the file METADATA.toml, though, considering possible confusion with the existing METADATA file. Maybe name it TYPESHED.toml or _TYPESHED.toml?
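
If the file did live in .dist-info, tooling could look it up per distribution, which sidesteps the shared-namespace problem. A minimal sketch, assuming the file is called TYPESHED.toml (only one of the names floated above, nothing is decided) and using a made-up helper name:

```python
from importlib.metadata import Distribution, distribution
from typing import Optional


def read_typeshed_metadata(
    dist: Distribution, filename: str = "TYPESHED.toml"
) -> Optional[str]:
    """Return the raw text of an extra file in the distribution's metadata
    directory, or None if the file is not there.

    The file name is an assumption; no such file is shipped today.
    """
    return dist.read_text(filename)


if __name__ == "__main__":
    # Raises PackageNotFoundError unless types-protobuf is installed.
    print(read_typeshed_metadata(distribution("types-protobuf")))
```

The returned text could then be handed to any TOML parser. Because the lookup goes through the distribution rather than the stubs directory, types-protobuf and types-google-cloud-ndb would each keep their own copy.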

This leaves the technical aspect of adding the file to the .tar.gz and .whl files. This could prove tricky, as it seems that .whl files include a checksum. But I haven't looked into this.

srittau avatar Nov 23 '23 16:11 srittau

The wheel library makes it easy to manage checksums; see e.g. https://github.com/hauntsaninja/change_wheel_version
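
For a sense of what the checksum bookkeeping involves, here is a minimal sketch that uses only the standard library instead of the wheel library. It copies a wheel, adds one file to its .dist-info directory, and appends a matching row to RECORD (a CSV of path, sha256=<unpadded urlsafe base64 digest>, size). The function names are made up for illustration, and the sketch does not cover the .tar.gz side or signed wheels.

```python
import base64
import csv
import hashlib
import io
import zipfile


def record_hash(data: bytes) -> str:
    """Hash in the form RECORD uses: sha256, urlsafe base64, no '=' padding."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def add_dist_info_file(src_whl, dst_whl, filename: str, data: bytes) -> None:
    """Write a copy of src_whl to dst_whl with <dist-info>/<filename> added.

    src_whl and dst_whl may be paths or binary file objects.
    """
    with zipfile.ZipFile(src_whl) as src, zipfile.ZipFile(
        dst_whl, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        record_name = next(
            n for n in src.namelist() if n.endswith(".dist-info/RECORD")
        )
        dist_info = record_name.rsplit("/", 1)[0]
        new_name = f"{dist_info}/{filename}"

        # Copy every member except RECORD, which is rewritten below.
        for info in src.infolist():
            if info.filename != record_name:
                dst.writestr(info, src.read(info.filename))
        dst.writestr(new_name, data)

        # Keep the existing rows and append one for the new file.
        old_record = src.read(record_name).decode("utf-8")
        rows = [row for row in csv.reader(io.StringIO(old_record)) if row]
        rows.append([new_name, record_hash(data), str(len(data))])
        out = io.StringIO()
        csv.writer(out, lineterminator="\n").writerows(rows)
        dst.writestr(record_name, out.getvalue())
```

Only the new file needs a freshly computed hash; RECORD's own row stays without a hash and size, as it was in the source wheel.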

hauntsaninja avatar Nov 24 '23 00:11 hauntsaninja