Exceptions when using cz from ISO8859-1 Terminal
Description
When issuing 'cz info' from an ISO8859-1 encoded terminal, I get the following exception:
[xx@XXXXXX:~/tmp/test-git-repo (master +)] $ cz info
Traceback (most recent call last):
File "/home/user/pb/venv_commitizen/bin/cz", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/cli.py", line 570, in main
args.func(conf, arguments)()
File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/commands/info.py", line 13, in __call__
out.write(self.cz.info())
File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 13, in write
print(value, *args)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 1068: ordinal not in range(256)
I ran into this problem on macOS (Ventura 13.6.3) and AlmaLinux 8.9.
The same exception also occurs with 'cz init', near the end of the process:
[xx@xxxxxxxx:~/tmp/test-git-repo (master +)] $ cz init
Welcome to commitizen!
Answer the questions to configure your project.
For further configuration visit:
https://commitizen-tools.github.io/commitizen/config/
? Please choose a supported config file: .cz.toml
? Please choose a cz (commit rule): (default: cz_conventional_commits) cz_conventional_commits
? Choose the source of the version: commitizen: Fetch and set version in commitizen config (default)
No Existing Tag. Set tag to v0.0.1
? Choose version scheme: semver
? Please enter the correct version format: (default: "$version")
? Create changelog automatically on bump Yes
? Keep major version zero (0.x) during breaking changes Yes
? What types of pre-commit hook you want to install? (Leave blank if you don't want to install) done
You can bump the version running:
cz bump
Traceback (most recent call last):
File "/home/user/pb/venv_commitizen/bin/cz", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/cli.py", line 570, in main
args.func(conf, arguments)()
File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/commands/init.py", line 150, in __call__
out.success("Configuration complete \U0001f680")
File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 28, in success
line(message)
File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 18, in line
print(value, *args, **kwargs)
UnicodeEncodeError: 'latin-1' codec can't encode character '\U0001f680' in position 28: ordinal not in range(256)
This probably also happens with other non-UTF-8 terminals.
We have quite a few machines that still run in ISO8859-1 encoding (because of the application running on them).
Steps to reproduce
- Start a new terminal
- Set LANG to 'de_CH.ISO8859-1' with 'export LANG=de_CH.ISO8859-1'
- Issue 'cz info'
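The failure can also be shown without touching the terminal settings. The following is a minimal sketch, not part of commitizen: it uses an in-memory stream with a chosen encoding as a stand-in for the terminal's stdout, and the helper name `can_print` is hypothetical. The two sample strings contain the characters named in the tracebacks above (U+2019 and U+1F680).

```python
import io


def can_print(text: str, encoding: str) -> bool:
    """Return True if print() can write `text` to a stream using `encoding`."""
    # In-memory text stream standing in for a terminal's stdout.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError:
        # Same exception type as in the tracebacks above.
        return False
    return True


if __name__ == "__main__":
    for sample in ("commit\u2019s", "Configuration complete \U0001f680"):
        for enc in ("utf-8", "latin-1"):
            # !a prints an ASCII-only repr, so this loop is safe in any terminal.
            print(f"{sample!a} with {enc}: {can_print(sample, enc)}")
```

Both samples should pass with utf-8 and fail with latin-1, which matches the behavior reported here.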
Current behavior
Exception / Crash
Desired behavior
Optimum: Proper output, re-encoded as needed, in terminal encodings other than UTF-8. (We have to use ISO8859-1 terminals for development and therefore also use git and cz inside these terminals.)
Acceptable but not desirable: Prevent cz from being used in a non-UTF-8 terminal: warn and exit at startup instead of throwing an exception in the middle of the work.
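The "warn and exit at startup" option could look roughly like the sketch below. This is an assumption about one possible approach, not commitizen's code; the names `is_utf8_stream` and `require_utf8` are hypothetical. It normalizes the stream's encoding name with `codecs.lookup` so that spellings such as `UTF8` and `utf-8` are treated alike.

```python
import codecs
import sys


def is_utf8_stream(stream) -> bool:
    """Return True if the stream's declared encoding is UTF-8."""
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        return False
    try:
        # codecs.lookup() normalizes aliases ("UTF8", "utf_8", ...) to "utf-8".
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        return False


def require_utf8(stream=None) -> None:
    """Exit with a readable message if the output stream is not UTF-8."""
    stream = sys.stdout if stream is None else stream
    if not is_utf8_stream(stream):
        enc = getattr(stream, "encoding", None)
        # sys.exit() with a string prints it to stderr and exits with status 1.
        sys.exit(f"error: output encoding is {enc!r}; a UTF-8 terminal is required")
```

Calling `require_utf8()` once at the start of a command would replace the mid-run traceback with a single error line.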
Screenshots
No response
Environment
cz version --report
Commitizen Version: 3.13.0
Python Version: 3.11.6 (main, Oct 3 2023, 17:06:54) [GCC 8.5.0 20210514 (Red Hat 8.5.0-18)]
Operating System: Linux
Hi @keenonkites, thanks for filing this issue. I just tested with the following commands and did not encounter any issues.
export LANG=de_CH.ISO8859-1
cz info
I also tried installing 3.13.0 and changing the terminal encoding. Could you please check whether it still happens? If so, could you please provide another way to reproduce it? Thanks!
I've just tested it on my Mac (where cz is installed via brew) as well as on an AlmaLinux VM (where cz is installed via pip) with the newest version, 3.26.0, and it still happens on both systems.
Facts worth mentioning:
- both systems have ISO8859-1 as the system setting
- it does throw an exception when I start a terminal with LANG=de_CH.ISO8859-1
- it works if I start a terminal with LANG=de_CH.UTF-8
- it does throw an exception when I start a UTF-8 terminal and run 'export LANG=de_CH.ISO8859-1' before issuing cz info
Below is the result for the last case above (UTF-8 terminal, cz info, changing LANG, cz info):
[xx@yyyyy:~] $ echo $LANG
de_CH.UTF-8
[xx@yyyyy:~] $ cz info
The commit contains the following structural elements, to communicate
intent to the consumers of your library:
fix: a commit of the type fix patches a bug in your codebase
(this correlates with PATCH in semantic versioning).
feat: a commit of the type feat introduces a new feature to the codebase
(this correlates with MINOR in semantic versioning).
BREAKING CHANGE: a commit that has the text BREAKING CHANGE: at the beginning of
its optional body or footer section introduces a breaking API change
(correlating with MAJOR in semantic versioning).
A BREAKING CHANGE can be part of commits of any type.
Others: commit types other than fix: and feat: are allowed,
like chore:, docs:, style:, refactor:, perf:, test:, and others.
We also recommend improvement for commits that improve a current
implementation without adding a new feature or fixing a bug.
Notice these types are not mandated by the conventional commits specification,
and have no implicit effect in semantic versioning (unless they include a BREAKING CHANGE).
A scope may be provided to a commit’s type, to provide additional contextual
information and is contained within parenthesis, e.g., feat(parser): add ability to parse arrays.
<type>[optional scope]: <description>
[optional body]
[optional footer]
[xx@yyyyy:~] $ export LANG=de_CH.ISO8859-1
[xx@yyyyy:~] $ cz info
Traceback (most recent call last):
File "/home/user/pb/venv_commitizen/bin/cz", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/cli.py", line 607, in main
args.func(conf, arguments)()
File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/commands/info.py", line 13, in __call__
out.write(self.cz.info())
File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 13, in write
print(value, *args)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 1068: ordinal not in range(256)
[xx@yyyyy:~] $
I can't think of another way to reproduce it, since for me it's more a matter of avoiding the problem. A question from my end: do you have de_CH.ISO8859-1 installed on your test system? If the locale is not installed, you probably fall back automatically to UTF-8 anyway.
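One way to check whether a locale is installed is to ask the C library to switch to it. The sketch below is only an illustration; the helper name `is_locale_available` is hypothetical. It relies on `locale.setlocale` raising `locale.Error` for a locale the system does not know, and it restores the previous setting afterwards.

```python
import locale


def is_locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` as an LC_CTYPE locale."""
    # Calling setlocale() without a value only queries the current setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    finally:
        # Always restore the setting that was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)
    return True


if __name__ == "__main__":
    for candidate in ("de_CH.ISO8859-1", "de_CH.UTF-8"):
        print(candidate, is_locale_available(candidate))
```

If `de_CH.ISO8859-1` is reported as unavailable on the test system, that may explain why the exception could not be reproduced there.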
In the case of 'cz info', the crash is caused by the typographic apostrophe (U+2019) in "commit’s" on line 24 of the file commitizen/cz/conventional_commits/conventional_commits_info.txt.
Removing it from the source prevents 'cz info' from crashing in non-UTF-8 terminals. But there are other non-ASCII characters elsewhere that crash cz with other commands (e.g., cz init, as mentioned in the original post).
To make the application safe for non-UTF-8 terminals, I think the output functions in commitizen/out.py have to be written in a way that allows masking or rewriting characters that cannot be encoded in the current encoding, or at least prevents crashes and prints proper error messages.
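As an illustration of the masking idea, here is a minimal sketch of an encoding-tolerant write function. It is not the code in commitizen/out.py; the name `safe_write` and its signature are assumptions made for this example. Characters that the stream's encoding cannot represent are replaced with "?" instead of raising `UnicodeEncodeError`.

```python
import sys


def safe_write(value, stream=None) -> None:
    """Print `value`, replacing characters the stream's encoding cannot encode."""
    stream = sys.stdout if stream is None else stream
    # Fall back to UTF-8 if the stream does not declare an encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    text = str(value)
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # Round-trip with errors="replace": unencodable characters become "?".
        text = text.encode(encoding, errors="replace").decode(encoding)
    print(text, file=stream)
```

With a latin-1 stream, `safe_write("commit\u2019s")` would print `commit?s` rather than crash. A possible alternative with a similar effect is to call `sys.stdout.reconfigure(errors="replace")` once at startup, which is available on text streams since Python 3.7.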
Thanks for updating! Will take a deeper look after I come back.