UnicodeEncodeError on Windows when there are Unicode chars in the help message
I have come across an error when I try to print the help message for my app (--help) on Windows (using Bash, cmd, and PowerShell). My help message contains Unicode characters (in the project name), which seem to be what causes the problem:
- https://github.com/leouieda/nene/blob/main/nene/cli.py#L72
This PR tests running the app with --help, and it fails on Windows with Python 3.6 and 3.10: https://github.com/leouieda/nene/pull/12
Here is a minimum example that fails:
# example.py
import click


@click.command(context_settings={"help_option_names": ["-h", "--help"]})
def main():
    """
    App description with Unicode ‣
    """
    pass


if __name__ == '__main__':
    main()
$ python example.py -h
Traceback (most recent call last):
  File "example.py", line 11, in <module>
    main()
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1052, in main
    with self.make_context(prog_name, args, **extra) as ctx:
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 914, in make_context
    self.parse_args(ctx, args)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1370, in parse_args
    value, args = param.handle_parse_result(ctx, opts, args)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 2347, in handle_parse_result
    value = self.process_value(ctx, value)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 2309, in process_value
    value = self.callback(ctx, self, value)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\core.py", line 1270, in show_help
    echo(ctx.get_help(), color=ctx.color)
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\site-packages\click\utils.py", line 298, in echo
    file.write(out)  # type: ignore
  File "C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2023' in position 62: character maps to <undefined>
I can confirm that it's the Unicode characters in the docstring of the function wrapped with the main @click.command that cause the issue. Removing them fixes the problem (see the second CI run on https://github.com/leouieda/nene/pull/12). This issue does not happen on Linux or macOS.
For now, I'll remove the Unicode characters so I'm not pushing a broken package, but it would be great to be able to include the proper spelling of the package name in the future.
Environment:
- Python version: 3.6 and 3.10
- Click version: 8.0.3
Please include a minimal reproducible example in the issue itself. Links to projects can be helpful, but it's much easier for contributors and maintainers to address a bug here instead of there.
Sorry about that. I'll edit the description with an example. Just trying to run it on CI to see if it really breaks.
Done.
I believe this can be closed, as the issue is not caused by Click. I wrote up an explanation on another issue and in a gist, but the TL;DR is that it is caused by the Windows agent redirecting command output to a file while the default locale code page is not Unicode-compatible. While Click may be able to work around this, it is definitely not caused by Click.
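The explanation above can be checked on any OS, without Click or a Windows agent. When stdout is redirected to a file or pipe, Python on Windows (without UTF-8 mode or PYTHONIOENCODING) encodes it with the locale's ANSI code page, typically cp1252, which has no mapping for ‣ (U+2023). The sketch below simulates that redirected stream with an in-memory io.TextIOWrapper; HELP_TEXT and write_help are hypothetical names used only for illustration.

```python
import io

# Hypothetical stand-in for the docstring in example.py.
HELP_TEXT = "App description with Unicode \u2023"


def write_help(encoding: str) -> bytes:
    """Write HELP_TEXT to an in-memory text stream and return the encoded bytes."""
    raw = io.BytesIO()
    # Simulates a redirected stdout: a TextIOWrapper over a byte buffer,
    # using whatever encoding Python picked for it.
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(HELP_TEXT)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    print(write_help("utf-8"))
    try:
        write_help("cp1252")
    except UnicodeEncodeError as exc:
        # Prints a 'charmap' codec error like the one in the traceback.
        print(exc)
```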
The file path in the traceback, C:\hostedtoolcache\windows\Python\3.6.8\x64\, suggests that this issue is being reported about runs in an Azure Windows agent. It sounds like this is an issue with the behavior of the agent, not Click.
This was reported to me by a user on Windows, and I tested on GitHub Actions since I don't have access to a Windows machine for testing. I'm not sure if they were encountering this on Azure or on their own machine, though.
Here is my repro of the same error. For me this happens when running a click program inside "git bash". https://github.com/rudolfbyker/click-git-bash-unicode-repro
I can see how this is not caused by click, but maybe we could treat it as a feature request that click should work around this somehow?
I'm happy to review a PR that fixes the issue.
Also note my original comment:
Please include a minimal reproducible example in the issue itself. Links to projects can be helpful, but it's much easier for contributors and maintainers to address a bug here instead of there.
A few possible workarounds for those searching:
- Run your script with python -X utf8 …
- Set the PYTHONIOENCODING environment variable to utf8.
- Run sys.stdout.reconfigure(encoding="utf-8") and sys.stderr.reconfigure(encoding="utf-8") at the start of your script.
Depending on the situation, one or more of these could convince Python to use UTF-8 rather than CP1252.
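A minimal sketch of the third workaround, packaged as a helper to call before the Click command runs. It assumes Python 3.7+, where io.TextIOWrapper.reconfigure() was added; force_utf8 is a hypothetical name, and anything that is not a TextIOWrapper (for example, a stream replaced by a test harness) is left untouched.

```python
import io
import sys


def force_utf8(*streams) -> None:
    """Switch each text stream to UTF-8 (hypothetical helper, Python 3.7+)."""
    for stream in streams:
        # reconfigure() is defined on io.TextIOWrapper; other stream
        # replacements are skipped rather than guessed at.
        if isinstance(stream, io.TextIOWrapper):
            stream.reconfigure(encoding="utf-8")


if __name__ == "__main__":
    # Call this before main() so the help text is encoded as UTF-8
    # even when stdout is redirected to a file or pipe.
    force_utf8(sys.stdout, sys.stderr)
    print("App description with Unicode \u2023")
```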
fwiw, encountered this error for click.echo('├─') in the CI of https://github.com/ddelange/pipgrip/pull/128.
It's on GitHub Actions windows-latest runners, which will return sys.getfilesystemencoding() == 'utf-8', meaning it's running Python in UTF-8 mode.
Somehow, click still goes into a cp1252 routine in that GHA environment...
Happy to review a PR.
Could you point me to the place in the code where we could set the output encoding based on sys.getfilesystemencoding(), so that these characters at least get printed on Windows with Python 3.7+ running in UTF-8 mode (PYTHONUTF8=1)?
Or maybe https://docs.python.org/3/library/sys.html#sys.getdefaultencoding?
Or some other way to get click to respect Python's UTF-8 mode?
Hmm, looks like UTF-16? https://github.com/pallets/click/blob/ca5e1c3d75e95cbc70fa6ed51ef263592e9ac0d0/src/click/_winconsole.py#L229
Why do both the OP's traceback and mine go into cp1252.py in the first place? :thinking:
Here is my repro of the same error. For me this happens when running a click program inside "git bash". https://github.com/rudolfbyker/click-git-bash-unicode-repro
I can see how this is not caused by click, but maybe we could treat it as a feature request that click should work around this somehow?
As shown in that screenshot, it doesn't happen in every console. It would be cool to support GitHub Actions windows-latest, but I have no idea how to find a possible detection or mitigation technique here.
fwiw, encountered this error for click.echo('├─') in the CI of ddelange/pipgrip#128.
It's on GitHub Actions windows-latest runners, which will return sys.getfilesystemencoding() == 'utf-8', meaning it's running Python in UTF-8 mode.
Somehow, click still goes into a cp1252 routine in that GHA environment...
Feel free to review my gist covering this, but just a quick heads up that checking sys.getfilesystemencoding() won't necessarily be accurate. You're better off checking sys.stdout.encoding. If you haven't set PYTHONUTF8 or PYTHONIOENCODING in your pipeline yet I would try that before doing anything else.
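To follow the advice above and check sys.stdout.encoding rather than sys.getfilesystemencoding(), a small diagnostic can print the relevant settings side by side in the failing CI job. This is a sketch; encoding_report is a hypothetical helper. Note that since Python 3.6 (PEP 529) the filesystem encoding on Windows defaults to UTF-8 whether or not UTF-8 mode is enabled, so sys.getfilesystemencoding() == 'utf-8' does not by itself indicate UTF-8 mode.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the settings that decide how text reaches stdout/stderr."""
    return {
        # True when UTF-8 mode is on (-X utf8 or PYTHONUTF8=1); Python 3.7+.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # UTF-8 on Windows by default since Python 3.6, regardless of UTF-8 mode.
        "filesystem": sys.getfilesystemencoding(),
        # Without UTF-8 mode, on Windows this is the ANSI code page (e.g. cp1252).
        "locale": locale.getpreferredencoding(False),
        # The encodings print() actually uses for these streams.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```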
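Would it make sense to simply catch this error in click.echo and re-raise it with a more verbose message?
import sys

# Inside click.utils.echo(), where file and out are already defined.
try:
    file.write(out)
except UnicodeEncodeError as exc:
    if sys.flags.utf8_mode:
        raise
    msg = "Failed to echo some Unicode character. Try enabling UTF-8 mode (https://docs.python.org/3/library/os.html#utf8-mode)."
    # UnicodeEncodeError requires (encoding, object, start, end, reason),
    # so reuse the original details and put the hint in the reason.
    raise UnicodeEncodeError(exc.encoding, exc.object, exc.start, exc.end, msg) from exc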
@ddelange +1 I think that's a fantastic idea
If you think the error needs to be clearer, report that to Python. They have been improving many error messages in the last few releases.