`Version` and `Specifier` accept (erroneously) some non-ASCII letters in the *local version* segment
Reproducing the behavior concerning packaging.version.Version:
Python 3.9.7 (default, Oct 4 2021, 18:09:29)
[...]
>>> import packaging.version
>>> packaging.version.Version('1.2+\u0130\u0131\u017f\u212a')
<Version('1.2+i̇ıſk')>
The cause is that packaging.version.VERSION_PATTERN makes use of a-z character ranges in conjunction with re.IGNORECASE and (implicit in Python 3.x) re.UNICODE (see the 2nd paragraph of this fragment: https://docs.python.org/3/library/re.html#re.IGNORECASE).
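A minimal sketch of that cause, independent of `packaging` itself: a bare `[a-z]+` pattern (not the real `VERSION_PATTERN`) is compiled once with `re.IGNORECASE` only and once with `re.IGNORECASE | re.ASCII`. The names `ODD_LETTERS`, `unicode_ci` and `ascii_ci` are made up for this illustration.

```python
import re

# The four non-ASCII letters that [a-z] also matches under re.IGNORECASE
# when the pattern is a str (re.UNICODE is implicit in Python 3).
ODD_LETTERS = "\u0130\u0131\u017f\u212a"

unicode_ci = re.compile(r"[a-z]+", re.IGNORECASE)
ascii_ci = re.compile(r"[a-z]+", re.IGNORECASE | re.ASCII)

# Unicode-aware case-insensitive matching accepts the odd letters...
print(bool(unicode_ci.fullmatch(ODD_LETTERS)))
# ...while ASCII-only matching rejects them but still ignores ASCII case.
print(bool(ascii_ci.fullmatch(ODD_LETTERS)))
print(bool(ascii_ci.fullmatch("RC")))
```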
It can be fixed in one of the following two ways:
- either by adding `re.ASCII` to flags (but then both occurrences of `\s*` in the actual regex will be restricted to match ASCII-only whitespace characters!);
- or by removing `re.IGNORECASE` from flags and replacing (in `VERSION_PATTERN`) both occurrences of `a-z` with `A-Za-z` plus adding suitable upper-case alternatives in the `pre_l`, `post_l` and `dev_l` regex groups, e.g., `[aA][lL][pP][hH][aA]` in place of `alpha` (quite cumbersome...).
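To make the trade-off between the two options concrete, here is a rough sketch on a heavily simplified, hypothetical stand-in for the local-version part of the pattern (`LOCAL_CI` and `LOCAL_EXPLICIT` are invented here; the real `VERSION_PATTERN` is much longer). Both variants should reject the odd letters, but only the second one keeps Unicode-aware `\s*`.

```python
import re

# Hypothetical, simplified stand-ins for the local-version segment,
# padded with \s* the way the real regex is anchored.
LOCAL_CI = r"\s*[a-z0-9]+(?:[-_.][a-z0-9]+)*\s*"
LOCAL_EXPLICIT = r"\s*[A-Za-z0-9]+(?:[-_.][A-Za-z0-9]+)*\s*"

fix_1 = re.compile(LOCAL_CI, re.IGNORECASE | re.ASCII)  # first option
fix_2 = re.compile(LOCAL_EXPLICIT)                      # second option

ODD = "\u0130\u0131\u017f\u212a"
NBSP = "\u00a0"  # whitespace for Unicode-aware \s, not for ASCII-only \s

for name, rx in (("fix_1", fix_1), ("fix_2", fix_2)):
    print(
        name,
        bool(rx.fullmatch("Ubuntu.1")),               # ordinary mixed case
        bool(rx.fullmatch(ODD)),                      # the problematic letters
        bool(rx.fullmatch(NBSP + "ubuntu.1" + NBSP)), # non-ASCII padding
    )
```

In this toy setup the only behavioral difference between the two is the last column: padding with a no-break space is accepted by `fix_2` but not by `fix_1`.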
The whitespace issue can probably be worked around by handling the surrounding whitespace separately from the actual parsing, e.g. by stripping it before matching:
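class Version:
    # re.ASCII stops a-z (combined with re.IGNORECASE) from matching non-ASCII letters.
    _regex = re.compile(VERSION_PATTERN, re.VERBOSE | re.IGNORECASE | re.ASCII)

    def __init__(self, version: str) -> None:
        # str.strip() also removes non-ASCII whitespace, which \s no longer matches under re.ASCII.
        match = self._regex.match(version.strip())
        # ... The rest is the same ...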
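A self-contained sketch of the same idea, for trying it out without patching `packaging`. `TOY_PATTERN` and `is_valid` are hypothetical and cover only a release segment plus an optional local segment; the assumption is that `str.strip()` is close enough to what the Unicode-aware `\s*` used to consume.

```python
import re

# Hypothetical toy pattern, NOT the real VERSION_PATTERN:
# a dotted release followed by an optional "+local" segment.
TOY_PATTERN = r"\d+(?:\.\d+)*(?:\+[a-z0-9]+(?:[-_.][a-z0-9]+)*)?"
_toy_regex = re.compile(TOY_PATTERN, re.IGNORECASE | re.ASCII)


def is_valid(version: str) -> bool:
    """Return True if the stripped string matches the toy pattern."""
    # str.strip() follows Unicode whitespace rules regardless of regex flags,
    # so surrounding non-ASCII whitespace is still tolerated.
    return _toy_regex.fullmatch(version.strip()) is not None


print(is_valid("\u00a01.2+Local.1\u2003"))        # padded with non-ASCII spaces
print(is_valid("1.2+\u0130\u0131\u017f\u212a"))   # the letters from the report
```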
Reproducing the behavior concerning packaging.specifiers.Specifier:
Python 3.9.7 (default, Oct 4 2021, 18:09:29)
[...]
>>> import packaging.specifiers
>>> packaging.specifiers.Specifier('==1.2+\u0130\u0131\u017f\u212a')
<Specifier('==1.2+İıſK')>
The cause is the same as the aforementioned Version-related one, except that here it relates to (non-public) Specifier._regex_str and Specifier._regex regular expression definitions.
In these regexes, \s occurs in various places of the effective pattern, not only at its start and end. The solution proposed above by @uranusjr for Version (re.ASCII combined with .strip()-based value preparation) therefore cannot be applied here without restricting whitespace matching to ASCII-only characters, and that restriction would also affect the negated character class used for the === operator. I suppose that would be too disruptive.
Instead of that, an additional check can be performed when the operator is not '===' -- something along the lines of:
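# Drop all whitespace (Unicode-aware), then require the remainder to be ASCII.
# Note: str.isascii() needs Python 3.7+.
without_whitespace = ''.join(spec.split())
if not without_whitespace.isascii():
    raise InvalidSpecifier(...)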
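Spelled out as a runnable sketch: `reject_non_ascii` is a hypothetical helper, the `InvalidSpecifier` class below is a local stand-in for the one in `packaging.specifiers`, and the `===` detection is deliberately crude (a real implementation would use the operator already parsed by `Specifier._regex`).

```python
class InvalidSpecifier(ValueError):
    """Local stand-in for packaging.specifiers.InvalidSpecifier."""


def reject_non_ascii(spec: str) -> None:
    """Raise InvalidSpecifier for non-ASCII text, except with '==='."""
    # Crude operator check, for the sketch only: arbitrary equality
    # ('===') is left alone because it may carry any text.
    if spec.strip().startswith("==="):
        return
    # str.split() with no arguments splits on Unicode whitespace, so
    # non-ASCII spaces are removed before the ASCII test.
    without_whitespace = "".join(spec.split())
    if not without_whitespace.isascii():
        raise InvalidSpecifier(f"Invalid specifier: {spec!r}")


reject_non_ascii("\u00a0== 1.2+abc\u00a0")  # passes: only the padding is non-ASCII
try:
    reject_non_ascii("==1.2+\u0130\u0131\u017f\u212a")
except InvalidSpecifier as exc:
    print("rejected:", exc)
```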