importlib.metadata.Distribution equality check not working
Bug report
While migrating our package from pkg_resources to importlib.metadata, I ran into some odd behavior with importlib.metadata.Distribution objects. Specifically, the class seems to lack a custom __eq__ method and simply inherits object.__eq__. As a consequence, checking two distributions for equality seems to always fail:
from importlib.metadata import distribution
distribution('pip') == distribution('pip') # False
This in turn makes it impossible to use the dist kwarg in entry point selection, so users have to fall back to manually comparing the distribution's plain-text name attribute (or something similar or more sophisticated):
from importlib.metadata import entry_points, EntryPoints
entry_points(group='console_scripts', dist='pip') # empty EntryPoints list
entry_points(group='console_scripts', dist=distribution('pip')) # empty EntryPoints list
EntryPoints(ep for ep in entry_points(group='console_scripts') if ep.dist == distribution('pip')) # empty EntryPoints list
EntryPoints(ep for ep in entry_points(group='console_scripts') if ep.dist.name == 'pip') # expected list of entry points
I think comparing distributions for equality should be fixed. It might even make sense to allow comparison with a plain string containing the distribution's name (e.g. "pip"), although I'm not aware of the potential drawbacks or conflicts that could arise if multiple distributions share the same name, which intuitively seems odd to me.
I believe this is the right place for this report; I'm very sorry if it isn't.
Your environment
- CPython versions tested on: 3.11.4
- Operating system and architecture: Debian Linux 11, linux-64
conda env:
# packages in environment:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2023.7.22 hbcca054_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
libexpat 2.5.0 hcb278e6_1 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.1.0 he5830b7_0 conda-forge
libgomp 13.1.0 he5830b7_0 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libsqlite 3.42.0 h2797004_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
ncurses 6.4 hcb278e6_0 conda-forge
openssl 3.1.1 hd590300_1 conda-forge
pip 23.2.1 pyhd8ed1ab_0 conda-forge
python 3.11.4 hab00c5b_0_cpython conda-forge
readline 8.2 h8228510_1 conda-forge
setuptools 68.0.0 pyhd8ed1ab_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
tzdata 2023c h71feb2d_0 conda-forge
wheel 0.41.0 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
miniconda base env:
shell level : 4
conda version : 23.3.1
python version : 3.9.15.final.0
virtual packages : __archspec=1=x86_64
__glibc=2.31=0
__linux=5.10.0=0
__unix=0=0
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
platform : linux-64
user-agent : conda/23.3.1 requests/2.29.0 CPython/3.9.15 Linux/5.10.0-21-amd64 debian/11 glibc/2.31 solver/libmamba conda-libmamba-solver/22.8.1 libmambapy/1.1.0
Linked PRs
- gh-109646
Thanks for the report. Happy to explore the issue here.
I'm not at all confident that equality by name is correct, because it's possible for multiple distributions of the same name to exist in the environment. For example:
~ @ pip-run 'importlib_metadata<5' -- -m pip-run 'importlib_metadata>5' -- -q
>>> import importlib.metadata as md
>>> ds = [dist for dist in md.distributions() if dist.name == 'importlib-metadata']
>>> ds
[<importlib.metadata.PathDistribution object at 0x10305aed0>, <importlib.metadata.PathDistribution object at 0x10306f5d0>]
>>> ds[0].version
'6.8.0'
>>> ds[1].version
'4.13.0'
This example uses pip-run to install two different versions of importlib_metadata (I could have used any PyPI package) onto two different places in sys.path and then filters on those two distribution objects.
In this case, I wouldn't expect these two distributions to compare equal, even though they have the same name. Moreover, if two distributions had the same name and version but were located in different places on sys.path, those probably should not compare equal either.
Moreover, metadata providers are expected to implement their own Distribution subclasses, so it's conceivable that two Distribution objects might be distinguished by some factor not yet known.
Another thing to consider - should a distribution compare equal to a string of the name? In one example, you imply that's expected:
entry_points(group='console_scripts', dist='pip') # empty EntryPoints list
That suggests that distribution('pip') == 'pip' should evaluate to True. That's possible, but I'm not sure it's desirable. What if someone wants instead to compare for a specific version (e.g. distribution('pip') == 'pip==23.1')? That gets messy and isn't even implementable without behavior only available outside the stdlib (packaging).
I agree that entry point selection is a compelling use case for such a comparison.
I wouldn't characterize the described behavior as a bug, but rather an unaddressed feature.
Thanks for looking at this @jaraco
because it's possible for multiple distributions of the same name to exist in the environment
Seems weird to me, at least I have never seen or heard of anything like that.
But then again, if the consensus is that filtering by distribution name is too ambiguous and might actually be harmful, I'd argue that the current implementation should still change, or at least that this behavior of the dist kwarg as a filter should be mentioned in the docs, since filtering/matching by dist currently always produces an empty result for the reasons laid out above.
If there is too much concern about ambiguous behavior when doing proper "equal" comparisons, then I would propose to either:
- raise an exception if the user tries to use dist for filtering entry points, or
- at least show a warning that using dist in filtering will always result in empty results
Currently dist is accepted for filtering, which leads users to assume it can be used and then spend time trying to figure out why it isn't working.
To be constructive, I set up a quick PR with what I would have wished to see in the docs when I was transitioning our package from pkg_resources to this new recommended stdlib API.
Sorry for the delay in reviewing this. Can you share more about how pkg_resources previously supported your use-case?
What we did (still do actually) with pkg_resources (using pip just as an example here):
from pkg_resources import load_entry_point
load_entry_point('pip', 'console_scripts', 'pip')
What I thought would do the trick with importlib.metadata doesn't work, though, and it took quite some time to figure out why:
import importlib.metadata
# doesn't work, returns an empty list
importlib.metadata.entry_points(dist='pip', group='console_scripts', name='pip')[0].load()
The only way I could replicate this with importlib.metadata:
import importlib.metadata
list(ep for ep in importlib.metadata.entry_points(group='console_scripts', name='pip') if ep.dist.name == 'pip')[0].load()
As mentioned above, my main issue is that dist is allowed as a kwarg to filter on (whereas some arbitrary other kwarg raises an exception), implying that it can be used to filter by distribution name or at least with a Distribution object. It is fairly easy to work around this once one has figured out what's going on, but the debugging takes time. I can imagine that more people will spend time trying to figure out what is happening, and the dist kwarg is simply unusable as a filter right now.
It's been quite a year, and I'm just now digging deep enough into my emails to follow up on this. Sorry for the delay.
I can't recall, but I may have failed to connect the dots on the PR until just now. Thanks for that proposal.
Thinking about the use-case, I believe the recommended approach for getting entry points for a specific distribution would be done like so:
importlib.metadata.distribution('pip').entry_points.select(name='pip', group='console_scripts')
e.g.:
🐚 py -c "import importlib.metadata; ep, = importlib.metadata.distribution('pip').entry_points.select(name='pip', group='console_scripts'); print(ep)"
EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts')
my main issue with it is that dist is allowed as a kwarg to filter
I think I agree we should update the guidance or maybe issue warnings to reduce the risk of this unintended use, though I'd also consider adding support for it if we can figure out how richly to do so.
Another option, now that I think about it, is to simply query for the "pip" "console_scripts":
importlib.metadata.entry_points().select(name='pip', group='console_scripts')
And not bother filtering by distribution at all, since it's unlikely another package would implement the pip console script (and if they did, you have bigger problems).