Build breaks for non-ascii emails

Open EthanRosenthal opened this issue 2 years ago • 5 comments

hatch build breaks if the authors section of pyproject.toml contains email addresses with non-ascii characters.

To reproduce, create a new project with hatch new, replace the author's email with one containing non-ASCII characters (e.g. Σ@Σ.com), and then run hatch build. The following error results:

$ hatch build
[sdist]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/er/.local/share/hatch/env/virtual/tmp/R8904DxC/tmp-build/lib/python3.11/site-packages/hatchling/__main__.py", line 6, in <module>
    sys.exit(hatchling())
             ^^^^^^^^^^^
  File "/home/er/.local/share/hatch/env/virtual/tmp/R8904DxC/tmp-build/lib/python3.11/site-packages/hatchling/cli/__init__.py", line 26, in hatchling
    command(**kwargs)
  File "/home/er/.local/share/hatch/env/virtual/tmp/R8904DxC/tmp-build/lib/python3.11/site-packages/hatchling/cli/build/__init__.py", line 75, in build_impl
    for artifact in builder.build(
  File "/home/er/.local/share/hatch/env/virtual/tmp/R8904DxC/tmp-build/lib/python3.11/site-packages/hatchling/builders/plugin/interface.py", line 93, in build
    self.metadata.validate_fields()
  File "/home/er/.local/share/hatch/env/virtual/tmp/R8904DxC/tmp-build/lib/python3.11/site-packages/hatchling/metadata/core.py", line 244, in validate_fields
    self.core.validate_fields()
  File "/home/er/.local/share/hatch/env/virtual/tmp/R8904DxC/tmp-build/lib/python3.11/site-packages/hatchling/metadata/core.py", line 1325, in validate_fields
    getattr(self, attribute)
  File "/home/er/.local/share/hatch/env/virtual/tmp/R8904DxC/tmp-build/lib/python3.11/site-packages/hatchling/metadata/core.py", line 818, in authors
    authors_data['email'].append(str(Address(display_name=name, addr_spec=email)))
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/er/.pyenv/versions/3.11.2/lib/python3.11/email/headerregistry.py", line 49, in __init__
    raise a_s.all_defects[0]
email.errors.NonASCIILocalPartDefect: local-part contains non-ASCII characters)

I think that this could be fixed by passing in username and domain separately to email.headerregistry.Address here rather than addr_spec, since addr_spec requires a properly encoded string.
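A minimal sketch of that suggestion, using only the standard library. The helper name format_author is hypothetical and this is not hatchling's code; it only contrasts the two ways of constructing an Address. It assumes the email can be split on its last "@".

```python
from email.errors import MessageDefect
from email.headerregistry import Address


def format_author(name: str, email: str) -> str:
    """Format an author as 'Name <local@domain>' without parsing addr_spec.

    Hypothetical helper: splits on the last '@' and passes both halves
    to Address as-is, so no validation of the address takes place.
    """
    username, _, domain = email.rpartition("@")
    return str(Address(display_name=name, username=username, domain=domain))


# The addr_spec path parses the string and raises on a non-ASCII local part.
try:
    Address(display_name="Ethan", addr_spec="Σ@Σ.com")
except MessageDefect as exc:
    print(f"addr_spec rejected: {exc}")

# The username/domain path skips that parsing step.
print(format_author("Ethan", "Σ@Σ.com"))
```

Note the trade-off: because the username/domain path does no parsing, it would also let a malformed value through unchecked, so any validation would have to happen somewhere else.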

EthanRosenthal avatar Sep 12 '23 02:09 EthanRosenthal

Thanks! Unfortunately, the fix cannot be the one you describe, because this is the correct behavior per the spec: https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#authors-maintainers

Are email addresses allowed to contain non-ASCII characters?

ofek avatar Sep 12 '23 02:09 ofek

I honestly don't know much about any of this, but a Google search tells me that email addresses can contain non-ASCII characters as per RFC 6532.

Why do you say that this is the accurate behavior per the spec? I'm not sure I quite follow.

EthanRosenthal avatar Sep 12 '23 03:09 EthanRosenthal

(FWIW, I ran into this issue teaching a class to students who have non-ascii characters in their email addresses. We're using rye for package management, and rye uses hatch by default. The students' packages end up failing to build due to this non-ascii issue. I've had the students switch to using pdm as their build system for the time being because it seems to handle non-ascii characters.)

EthanRosenthal avatar Sep 12 '23 03:09 EthanRosenthal

@EthanRosenthal, relating to rye (and sorry if this is spam to the hatch maintainers), I think the title of this issue could be broadened. I've noticed a similar failure when the email address is literally (none), which is all ASCII.

As far as I can tell, the authors entry in pyproject.toml is generated by reading the user's ~/.gitconfig.

For instance, the following doesn't work:

my-computer:thingy b-long$ cat ~/.gitconfig |grep none
	email = (none)
my-computer:thingy b-long$ cat pyproject.toml |grep none
    { name = "b-long", email = "(none)" }

It'll fail when I run rye sync, like this:

my-computer:thingy b-long$ rye sync
Initializing new virtualenv in /Users/b-long/Desktop/github/b-long/thingy/.venv
Python version: [email protected]
Generating production lockfile: /Users/b-long/Desktop/github/b-long/thingy/requirements.lock
Generating dev lockfile: /Users/b-long/Desktop/github/b-long/thingy/requirements-dev.lock
Installing dependencies
Resolved 1 package in 5ms
error: Failed to prepare distributions
  Caused by: Failed to fetch wheel: thingy @ file:///Users/b-long/Desktop/github/b-long/thingy
  Caused by: Failed to build: `thingy @ file:///Users/b-long/Desktop/github/b-long/thingy`
  Caused by: Build backend failed to build wheel through `build_editable()` with exit status: 1
--- stdout:

--- stderr:
Traceback (most recent call last):
  File "<string>", line 11, in <module>
  File "/Users/b-long/Library/Caches/uv/builds-v0/.tmpJeUc14/lib/python3.12/site-packages/hatchling/build.py", line 83, in build_editable
    return os.path.basename(next(builder.build(directory=wheel_directory, versions=['editable'])))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b-long/Library/Caches/uv/builds-v0/.tmpJeUc14/lib/python3.12/site-packages/hatchling/builders/plugin/interface.py", line 90, in build
    self.metadata.validate_fields()
  File "/Users/b-long/Library/Caches/uv/builds-v0/.tmpJeUc14/lib/python3.12/site-packages/hatchling/metadata/core.py", line 266, in validate_fields
    self.core.validate_fields()
  File "/Users/b-long/Library/Caches/uv/builds-v0/.tmpJeUc14/lib/python3.12/site-packages/hatchling/metadata/core.py", line 1376, in validate_fields
    getattr(self, attribute)
  File "/Users/b-long/Library/Caches/uv/builds-v0/.tmpJeUc14/lib/python3.12/site-packages/hatchling/metadata/core.py", line 846, in authors
    authors_data['email'].append(str(Address(display_name=name, addr_spec=email)))
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b-long/.rye/py/[email protected]/lib/python3.12/email/headerregistry.py", line 43, in __init__
    a_s, rest = parser.get_addr_spec(addr_spec)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b-long/.rye/py/[email protected]/lib/python3.12/email/_header_value_parser.py", line 1653, in get_addr_spec
    token, value = get_local_part(value)
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/b-long/.rye/py/[email protected]/lib/python3.12/email/_header_value_parser.py", line 1461, in get_local_part
    raise errors.HeaderParseError(
email.errors.HeaderParseError: expected local-part but found ''
---
error: Installation of dependencies failed in venv at /Users/b-long/Desktop/github/b-long/thingy/.venv. uv exited with status: exit status: 2

If I manually edit the pyproject.toml, I can successfully run rye sync:

my-computer:thingy b-long$ cat pyproject.toml |grep email
    { name = "b-long", email = "[email protected]" }
my-computer:thingy b-long$ rye sync
Reusing already existing virtualenv
Generating production lockfile: /Users/b-long/Desktop/github/b-long/thingy/requirements.lock
Generating dev lockfile: /Users/b-long/Desktop/github/b-long/thingy/requirements-dev.lock
Installing dependencies
Resolved 1 package in 4ms
   Built thingy @ file:///Users/b-long/Desktop/github/b-long/thingy
Prepared 1 package in 397ms
Installed 1 package in 3ms
 + thingy==0.1.0 (from file:///Users/b-long/Desktop/github/b-long/thingy)
Done!
my-computer:thingy b-long$ 

My rye version is below:

 # rye --version
rye 0.37.0
commit: 0.37.0 (09b67c469 2024-07-20)
platform: macos (x86_64)
self-python: [email protected]
symlink support: true
uv enabled: true

I appreciate that this issue is still open, and I'm wondering about improving the UX of rye/hatchling 🤔 Right now, I guess hatchling fails fast if there's an invalid email. Would it be appropriate for rye or hatchling to omit an invalid email when it creates the pyproject.toml?

If that were the behavior, then instead of failing the first time a user runs rye sync, the failure could be deferred to a later point in using these tools. Or perhaps the error could be a message about providing a valid email address rather than a stack trace?
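As a rough illustration of the "message instead of a stack trace" idea, a tool could run the same standard-library check before writing or building, and report the result in one line. The function name email_problem is hypothetical; neither rye nor hatchling is known to expose anything like it.

```python
from email.errors import MessageError
from email.headerregistry import Address
from typing import Optional


def email_problem(email: str) -> Optional[str]:
    """Return a one-line explanation if `email` is rejected, else None.

    Hypothetical helper: it relies on the same Address(addr_spec=...)
    call that appears in the tracebacks above.
    """
    try:
        Address(addr_spec=email)
    except (MessageError, ValueError) as exc:
        # HeaderParseError derives from MessageError; header defects such as
        # NonASCIILocalPartDefect derive from ValueError.
        return f"author email {email!r} is not usable: {exc}"
    return None


for candidate in ("(none)", "Σ@Σ.com", "user@example.com"):
    problem = email_problem(candidate)
    print(problem or f"{candidate!r} looks fine")
```

A project generator could use the returned message either to skip the email field or to prompt the user for a different address.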

b-long avatar Jul 31 '24 01:07 b-long

On one hand, it seems like simply not writing non-ascii or invalid email addresses would be the simplest quick fix. On the other hand, it seems in poor taste to omit the email address for anybody with a non-ascii email address, as opposed to fixing the implementation to support non-ascii emails (which I think is doable in the way I reference in my original message). Probably both ought to happen -- invalid email addresses don't get written, and non-ascii email addresses get supported.

EthanRosenthal avatar Aug 13 '24 02:08 EthanRosenthal