
UnicodeEncodeError on Windows when there are Unicode chars in the help message

Open leouieda opened this issue 4 years ago • 20 comments

I get an error when I try to print the help message for my app (--help) on Windows (using bash, cmd, and PowerShell). The help message contains Unicode characters (the project name), which seems to be what causes the problem:

  • https://github.com/leouieda/nene/blob/main/nene/cli.py#L72

This PR tests running the app with --help and it fails on Windows and Python 3.6 and 3.10: https://github.com/leouieda/nene/pull/12

Here is a minimum example that fails:

# example.py
import click

@click.command(context_settings={"help_option_names": ["-h", "--help"]})
def main():
    """
    App description with Unicode ‣
    """
    pass

if __name__ == '__main__':
    main()

$ python example.py -h
Traceback (most recent call last):
  File "example.py", line 11, in <module>
    main()
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1052, in main
    with self.make_context(prog_name, args, **extra) as ctx:
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 914, in make_context
    self.parse_args(ctx, args)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1370, in parse_args
    value, args = param.handle_parse_result(ctx, opts, args)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 2347, in handle_parse_result
    value = self.process_value(ctx, value)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 2309, in process_value
    value = self.callback(ctx, self, value)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1270, in show_help
    echo(ctx.get_help(), color=ctx.color)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\utils.py", line 298, in echo
    file.write(out)  # type: ignore
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2023' in position 62: character maps to <undefined>

I can confirm that it's the Unicode characters in the docstring of the function wrapped with the main @click.command that causes the issue. Removing them fixes the problem (the second CI run on https://github.com/leouieda/nene/pull/12). This issue does not happen on Linux and Mac.

For now, I'll remove the Unicode characters so I'm not pushing a broken package, but it would be great to be able to include the proper spelling of the package name in the future.

Environment:

  • Python version: 3.6 and 3.10
  • Click version: 8.0.3

leouieda avatar Nov 01 '21 16:11 leouieda

Please include a minimal reproducible example in the issue itself. Links to projects can be helpful, but it's much easier for contributors and maintainers to address a bug here instead of there.

davidism avatar Nov 01 '21 16:11 davidism

Sorry about that. I'll edit the description with an example. Just trying to run it on CI to see if it really breaks.

leouieda avatar Nov 01 '21 17:11 leouieda

Done.

leouieda avatar Nov 01 '21 17:11 leouieda

I believe this can be closed, as it is not an issue caused by click. I wrote up an explanation on another issue and a gist but the tldr is that this is caused by the Windows agent redirecting command output to a file and the default locale code page not being Unicode compatible. While click may be able to solve for this, it is definitely not caused by click.
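The mechanism described here can be reproduced on any platform, without a Windows agent. The following is a minimal sketch rather than click's own code: `write_help` is a hypothetical helper that pushes text through an `io.TextIOWrapper` with an explicit encoding, standing in for a redirected stdout whose encoding comes from the locale code page.

```python
import io


def write_help(text: str, encoding: str) -> bytes:
    """Write text through a text stream that uses the given encoding."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


HELP = "App description with Unicode \u2023"

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes = write_help(HELP, "utf-8")

# cp1252 has no mapping for U+2023, so the same write fails.
try:
    write_help(HELP, "cp1252")
    cp1252_error = None
except UnicodeEncodeError as exc:
    cp1252_error = exc
```

The failure depends only on the encoding of the stream being written to, which is why the same script can work in one console and fail when its output is redirected.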

NodeJSmith avatar Nov 13 '22 16:11 NodeJSmith

The file path in the traceback, C:\hostedtoolcache\windows\Python\3.6.8\x64\, suggests that this issue is being reported about runs in an Azure Windows agent. It sounds like this is an issue with the behavior of the agent, not Click.

davidism avatar Nov 14 '22 14:11 davidism

This was reported to me by a user on Windows and I tested on GitHub Actions since I don't have access to a Windows machine for testing. I'm not sure if they were encountering this on Azure or on their own machine, though.

leouieda avatar Nov 14 '22 14:11 leouieda

Here is my repro of the same error. For me this happens when running a click program inside "git bash". https://github.com/rudolfbyker/click-git-bash-unicode-repro

I can see how this is not caused by click, but maybe we could treat it as a feature request that click should work around this somehow?

rudolfbyker avatar Feb 03 '23 16:02 rudolfbyker

I'm happy to review a PR that fixes the issue.

davidism avatar Feb 03 '23 16:02 davidism

Also note my original comment:

Please include a minimal reproducible example in the issue itself. Links to projects can be helpful, but it's much easier for contributors and maintainers to address a bug here instead of there.

davidism avatar Feb 03 '23 16:02 davidism

A few possible workarounds for those searching:

  • Run your script with python -X utf8 …
  • Set the PYTHONIOENCODING environment variable to utf8.
  • Run sys.stdout.reconfigure(encoding="utf-8") and sys.stderr.reconfigure(encoding="utf-8") at the start of your script.

Depending on the situation, one or more of these could convince Python to use UTF-8 rather than CP1252.
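As a rough sketch of the third workaround: `force_utf8` is a hypothetical helper name, and the `getattr` guard is an assumption added here because a replaced or wrapped stream may not offer `reconfigure()` (it exists on `io.TextIOWrapper` since Python 3.7).

```python
import sys


def force_utf8(*streams) -> None:
    """Switch the given text streams to UTF-8 where they allow it."""
    for stream in streams:
        # Streams that were replaced (or are None) may lack reconfigure().
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8")


if __name__ == "__main__":
    # Call this before anything is written to stdout or stderr.
    force_utf8(sys.stdout, sys.stderr)
    print("App description with Unicode \u2023")
```

This has to run before the first write, so in a click app it would go at the top of the entry-point module, ahead of the call to the command.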

rudolfbyker avatar Aug 24 '23 13:08 rudolfbyker

fwiw, encountered this error for click.echo('├─') in the CI of https://github.com/ddelange/pipgrip/pull/128.

It's on Github Actions windows-latest runners, which will return sys.getfilesystemencoding() == 'utf-8', meaning it's running python in utf8 mode.

Somehow, click still goes into a cp1252 routine in that GHA environment...

logs.txt

ddelange avatar Nov 13 '23 17:11 ddelange

Happy to review a PR.

davidism avatar Nov 13 '23 17:11 davidism

could you point me to the point in code where we could set the output encoding based on sys.getfilesystemencoding(), such that these characters at least get printed on windows with python 3.7+ running in utf8 mode (PYTHONUTF8=1)?

ddelange avatar Nov 13 '23 18:11 ddelange

or maybe https://docs.python.org/3/library/sys.html#sys.getdefaultencoding?

or some other way to get click to respect Python UTF-8 Mode?

ddelange avatar Nov 13 '23 18:11 ddelange

hmm looks like utf-16? https://github.com/pallets/click/blob/ca5e1c3d75e95cbc70fa6ed51ef263592e9ac0d0/src/click/_winconsole.py#L229

why does the OP and my traceback go into cp1252.py in the first place? :thinking:

ddelange avatar Nov 13 '23 18:11 ddelange

Here is my repro of the same error. For me this happens when running a click program inside "git bash". https://github.com/rudolfbyker/click-git-bash-unicode-repro

I can see how this is not caused by click, but maybe we could treat it as a feature request that click should work around this somehow?

as shown in that screenshot, it doesn't happen in every console. would be cool to support GitHub Actions windows-latest, but no idea how to find a possible detection/mitigation technique here

ddelange avatar Nov 13 '23 18:11 ddelange

fwiw, encountered this error for click.echo('├─') in the CI of ddelange/pipgrip#128.

It's on Github Actions windows-latest runners, which will return sys.getfilesystemencoding() == 'utf-8', meaning it's running python in utf8 mode.

Somehow, click still goes into a cp1252 routine in that GHA environment...

logs.txt

Feel free to review my gist covering this, but just a quick heads up that checking sys.getfilesystemencoding() won't necessarily be accurate. You're better off checking sys.stdout.encoding. If you haven't set PYTHONUTF8 or PYTHONIOENCODING in your pipeline yet I would try that before doing anything else.
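To see the difference in a given environment, a small diagnostic along these lines may help. It is only a sketch: `can_encode` is a hypothetical helper, and falling back to "utf-8" when a stream reports no encoding is an assumption made here.

```python
import sys


def can_encode(text: str, stream=None) -> bool:
    """Return True if the stream's encoding can represent text."""
    stream = sys.stdout if stream is None else stream
    # Assumption: treat a missing encoding attribute as UTF-8.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print("filesystem encoding:", sys.getfilesystemencoding())
    print("stdout encoding:", sys.stdout.encoding)
    print("can print U+2023:", can_encode("\u2023"))
```

Running it once in an interactive console and once with output redirected to a file should show whether the two encodings diverge in the failing setup.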

NodeJSmith avatar Nov 14 '23 02:11 NodeJSmith

would it make sense to simply catch this error in click.echo and re-raise it with a more verbose message?

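# Inside click.echo; assumes `import sys` at module level.
try:
    file.write(out)
except UnicodeEncodeError as exc:
    if sys.flags.utf8_mode:
        # Already in UTF-8 mode, so the hint below would not help.
        raise
    msg = "Failed to echo some Unicode character. Try enabling [UTF-8 mode](https://docs.python.org/3/library/os.html#utf8-mode)."
    # UnicodeEncodeError takes (encoding, object, start, end, reason).
    raise UnicodeEncodeError(exc.encoding, exc.object, exc.start, exc.end, msg) from exc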
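The idea can be tried outside click with a standalone sketch. `write_with_hint` and `HINT` are hypothetical names, not part of click. One detail to keep in mind: `UnicodeEncodeError` requires five arguments (encoding, object, start, end, reason), so the hint is passed as the reason and the other fields are copied from the original exception.

```python
import sys

HINT = "Try enabling UTF-8 mode (python -X utf8 or PYTHONUTF8=1)."


def write_with_hint(file, out: str) -> None:
    """Write out to file, adding a hint if the encoding rejects it."""
    try:
        file.write(out)
    except UnicodeEncodeError as exc:
        if sys.flags.utf8_mode:
            # UTF-8 mode is already on, so the hint would be misleading.
            raise
        raise UnicodeEncodeError(
            exc.encoding, exc.object, exc.start, exc.end,
            f"{exc.reason}. {HINT}",
        ) from exc
```

Keeping the original exception type means existing `except UnicodeEncodeError` handlers in calling code continue to work.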

ddelange avatar Nov 14 '23 07:11 ddelange

@ddelange +1 I think that's a fantastic idea

NodeJSmith avatar Nov 14 '23 13:11 NodeJSmith

If you think the error needs to be clearer, report that to python. They have been updating many errors in the last few releases.

davidism avatar Nov 14 '23 13:11 davidism