gpg fails to find key due to overridden locale
Description
I am using GitPython to sign tags as follows:
repo.git.tag(
    "-s",
    "v{}".format(str(new_version)),
    "-m Version {}".format(str(new_version)),
)
This fails with the following exception:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/jgalar/.cache/pypoetry/virtualenvs/reml-06EzQDmo-py3.9/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/jgalar/.cache/pypoetry/virtualenvs/reml-06EzQDmo-py3.9/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/jgalar/.cache/pypoetry/virtualenvs/reml-06EzQDmo-py3.9/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jgalar/.cache/pypoetry/virtualenvs/reml-06EzQDmo-py3.9/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/jgalar/EfficiOS/src/reml/reml/cli.py", line 93, in main
    release = project.release(
  File "/home/jgalar/EfficiOS/src/reml/reml/project.py", line 443, in release
    self._commit_and_tag(new_version)
  File "/home/jgalar/EfficiOS/src/reml/reml/lttngtools.py", line 41, in _commit_and_tag
    self._repo.git.tag(
  File "/home/jgalar/.cache/pypoetry/virtualenvs/reml-06EzQDmo-py3.9/lib/python3.9/site-packages/git/cmd.py", line 545, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "/home/jgalar/.cache/pypoetry/virtualenvs/reml-06EzQDmo-py3.9/lib/python3.9/site-packages/git/cmd.py", line 1011, in _call_process
    return self.execute(call, **exec_kwargs)
  File "/home/jgalar/.cache/pypoetry/virtualenvs/reml-06EzQDmo-py3.9/lib/python3.9/site-packages/git/cmd.py", line 828, in execute
    raise GitCommandError(command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git tag -s v2.12.4 -m Version 2.12.4
  stderr: 'error: gpg failed to sign the data
error: unable to sign the tag'
Debugging the problem
I was initially confused, since signing tags works correctly when I run git tag -s ... directly from the command line.
Digging a bit, I saw that git invokes gpg with the same arguments whether I tag using the git client directly or through GitPython:
argv[0] = /usr/bin/gpg2
argv[1] = --status-fd=2
argv[2] = -bsau
argv[3] = Jérémie Galarneau <[email protected]>
This seemed to point to something more subtle, possibly related to the process's environment.
I couldn't find a way to make git forward gpg's stderr, which would have given a more detailed error report. Thus, I modified GnuPG 2.2.27 and rebuilt it to write its errors to a log file. This yielded the following error:
[GNUPG:] INV_SGNR 9 JÃ©rÃ©mie Galarneau <[email protected]>
[GNUPG:] FAILURE sign 17
gpg: signing failed: No secret key
If you look at the first line, you will see that each é in my name was changed to Ã©.
This typically happens when the UTF-8 encoding of my name is decoded as ISO/IEC 8859-1.
This clued me in that something funny related to locales was happening.
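The garbling of é into Ã© can be reproduced in a few lines of Python. This is only an illustration of the UTF-8 versus ISO/IEC 8859-1 mix-up, not a reproduction of gpg's internal code path: the UTF-8 bytes of é are 0xC3 0xA9, which ISO/IEC 8859-1 (Latin-1) reads as the two characters Ã and ©.

```python
# Illustrate the mojibake seen in the gpg status line.
name = "Jérémie Galarneau"

utf8_bytes = name.encode("utf-8")      # each "é" becomes the two bytes 0xC3 0xA9
mangled = utf8_bytes.decode("latin-1")  # Latin-1 maps 0xC3 -> "Ã" and 0xA9 -> "©"
# mangled == "JÃ©rÃ©mie Galarneau"

# The mix-up is reversible, so the original name can be recovered.
assert mangled.encode("latin-1").decode("utf-8") == name
```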
I dumped and compared the environment of the gpg process in the two scenarios (CLI use and GitPython) and saw that the only meaningful difference was that the LANGUAGE and LC_ALL environment variables were set to C when GitPython was involved.
Indeed, running LC_ALL="C" LANGUAGE="C" git tag -s [...] reproduced the problem.
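For comparison, here is a sketch of the same override driven from Python. It is not GitPython's actual code: run_with_c_locale() is a hypothetical helper, and a Python child process stands in for git so the effect can be observed without a repository or a signing key.

```python
import os
import subprocess
import sys


def run_with_c_locale(argv):
    """Run argv with LC_ALL and LANGUAGE forced to "C" (hypothetical helper)."""
    env = dict(os.environ)
    # LC_ALL takes precedence over LANG and the LC_* variables, so a user's
    # LANG=en_CA.utf8 no longer has any effect in the child process.
    env["LC_ALL"] = "C"
    env["LANGUAGE"] = "C"
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Stand-in for git/gpg: a child that reports the locale variables it received.
probe = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('LC_ALL'), os.environ.get('LANGUAGE'))",
]
result = run_with_c_locale(probe)
print(result.stdout.strip())  # C C
```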
Cause
Looking at the GitPython code, I found that git is invoked with those environment variables set:
https://github.com/gitpython-developers/GitPython/blame/b3778ec/git/cmd.py#L694
I am unsure which "parsing code" the comments refer to, so I can't speak to why this is done. However, forcing a C locale will cause these kinds of erroneous encoding conversions for people who, like me, have non-ASCII names.
For what it's worth, my locale is LANG=en_CA.utf8.
I would guess that forcing the locale to any UTF-8 English locale would work around most issues and still provide GitPython with an English output.
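A minimal sketch of that suggestion, assuming a hypothetical helper named english_utf8_env() and en_US.UTF-8 as the default locale (neither comes from GitPython): the environment passed to git keeps everything from the caller but sets LC_ALL to a UTF-8 English locale instead of C.

```python
import os


def english_utf8_env(base_env=None, locale_name="en_US.UTF-8"):
    """Copy base_env (default: os.environ) with LC_ALL set to a UTF-8 English locale.

    Hypothetical helper; if locale_name is not installed, libc falls back to C.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = locale_name  # still overrides LANG, but keeps a UTF-8 charset
    env["LANGUAGE"] = "C"        # left as in the existing override described above
    return env


env = english_utf8_env({"LANG": "en_CA.utf8", "PATH": "/usr/bin"})
print(env["LC_ALL"], env["LANG"])  # en_US.UTF-8 en_CA.utf8
```

The resulting dict would be passed as the env argument of whatever spawns git, for example subprocess.run(). The caller's mapping is copied rather than mutated.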
My workaround
I found that setting the signingKey property to my KEYID in my .gitconfig causes git to invoke gpg with the KEYID instead of my name and email.
[user]
	name = Jérémie Galarneau
	email = [email protected]
	signingKey = MY_KEY_ID_HERE
This works around the problem since no accented letters are used in the gpg invocation.
I am absolutely amazed by this issue and write-up, which is nothing short of an exciting detective story - thanks for that!
GitPython should definitely not enforce ASCII anywhere, even though it requires an English locale to parse git's output, and I would hope you will find the time to submit a PR implementing the suggestion provided here:
> I would guess that forcing the locale to any UTF-8 English locale would work around most issues and still provide GitPython with an English output.
It would be a good opportunity to embed Canada (加拿大) literally into GitPython's codebase :).
> I would hope you will find the time to submit a PR implementing the suggestion provided here
Sure, I'm glad to see that this isn't a non-starter :) I'm just not sure what fix you would find suitable/clean enough.
> It would be a good opportunity to embed Canada (加拿大) literally into GitPython's codebase :).
I'm afraid few users will have en_CA.utf8 available. The good thing is that if a locale is not available, libc falls back to the default C locale, which at least doesn't break anything further.
I would propose we choose a locale that we think will be available to most users and rely on falling back to C in the rare cases where it isn't. In theory, C.UTF-8 would be a good fit, but it's not available everywhere (Manjaro and Arch Linux come to mind).
In my experience, en_US.utf8 is pretty widely available, but I'm not sure what is typically available on Windows, macOS, and the various BSDs. I can look into it a bit more.
Otherwise, we can also go all-out and list the locales on the system and look for one that matches en_..\.utf8, but I'm not sure it's worth it.
What do you think?
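The "list the locales" option might look roughly like the sketch below. The helper names are hypothetical, installed_locales() assumes a POSIX system where `locale -a` is on the PATH (so not Windows), and the pattern is a slightly looser version of en_..\.utf8 that also accepts the en_US.UTF-8 spelling.

```python
import re
import subprocess

# Looser form of "en_..\.utf8": also accepts the "UTF-8" spelling, case-insensitively.
ENGLISH_UTF8 = re.compile(r"en_[a-z]{2}\.utf-?8", re.IGNORECASE)


def english_utf8_locales(names):
    """Keep only English UTF-8 locale names, preserving their order."""
    return [name for name in names if ENGLISH_UTF8.fullmatch(name)]


def installed_locales():
    """List installed locales via `locale -a` (assumes a POSIX system with the tool on PATH)."""
    result = subprocess.run(["locale", "-a"], capture_output=True, text=True, check=True)
    return result.stdout.split()


sample = ["C", "C.UTF-8", "POSIX", "en_CA.utf8", "en_US.UTF-8", "fr_CA.utf8", "en_US"]
print(english_utf8_locales(sample))  # ['en_CA.utf8', 'en_US.UTF-8']
```

On a real system, the two helpers would be combined as english_utf8_locales(installed_locales()), taking the first match if there is one.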
Thanks for sharing your insight - if I remember correctly what happened…erm…a decade ago, I was just trying out locales to find one that works everywhere, with UTF-8 not being anything I knew about or was concerned with 😅.
It sounds like C.UTF-8 would be preferable, but a quick check revealed that it's not available on macOS, at least. en_US.UTF-8 is, though, along with locales for many other English-speaking countries.
If libc indeed falls back to C, it should be safe to go with en_US.UTF-8; otherwise, it might be worth checking the available locales using Python's locale module, which seems to be available.
Maybe you could try FOO.UTF-8 and see whether it indeed works as expected, just to be sure.
Thanks again!
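Probing with Python's locale module could look like this sketch. first_available_locale() is a hypothetical helper: it tries each candidate with locale.setlocale(), which raises locale.Error for names the C library rejects (the FOO.UTF-8 case), and returns "C" when none is accepted, mirroring the fallback discussed in the thread.

```python
import locale


def first_available_locale(candidates=("C.UTF-8", "en_US.UTF-8"), fallback="C"):
    """Return the first candidate the C library accepts, else fallback (hypothetical helper)."""
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current value without changing it
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not installed or not recognized on this system
            return name
        return fallback
    finally:
        # setlocale() is process-wide and not thread-safe: restore the original setting.
        locale.setlocale(locale.LC_CTYPE, previous)


print(first_available_locale())
# Expected to print C on systems whose C library rejects unknown names (e.g. glibc).
print(first_available_locale(("FOO.UTF-8",)))
```

Probing only LC_CTYPE is a simplification; because setlocale() changes process-wide state, such a check is best done once at startup rather than before every git call.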