commitizen

Exceptions when using cz from ISO8859-1 Terminal

Status: Open. keenonkites opened this issue 2 years ago (4 comments).

Description

When I run 'cz info' from an ISO8859-1 encoded terminal, I get the following exception:

[xx@XXXXXX:~/tmp/test-git-repo (master +)] $ cz info
Traceback (most recent call last):
  File "/home/user/pb/venv_commitizen/bin/cz", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/cli.py", line 570, in main
    args.func(conf, arguments)()
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/commands/info.py", line 13, in __call__
    out.write(self.cz.info())
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 13, in write
    print(value, *args)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 1068: ordinal not in range(256)

I ran into this problem on macOS (Ventura 13.6.3) and AlmaLinux 8.9.

The same exception also occurs near the end of 'cz init':

[xx@xxxxxxxx:~/tmp/test-git-repo (master +)] $ cz init
Welcome to commitizen!

Answer the questions to configure your project.
For further configuration visit:

https://commitizen-tools.github.io/commitizen/config/

? Please choose a supported config file:  .cz.toml
? Please choose a cz (commit rule): (default: cz_conventional_commits) cz_conventional_commits
? Choose the source of the version: commitizen: Fetch and set version in commitizen config (default)
No Existing Tag. Set tag to v0.0.1
? Choose version scheme:  semver
? Please enter the correct version format: (default: "$version")
? Create changelog automatically on bump Yes
? Keep major version zero (0.x) during breaking changes Yes
? What types of pre-commit hook you want to install? (Leave blank if you don't want to install) done

You can bump the version running:

	cz bump

Traceback (most recent call last):
  File "/home/user/pb/venv_commitizen/bin/cz", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/cli.py", line 570, in main
    args.func(conf, arguments)()
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/commands/init.py", line 150, in __call__
    out.success("Configuration complete \U0001f680")
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 28, in success
    line(message)
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 18, in line
    print(value, *args, **kwargs)
UnicodeEncodeError: 'latin-1' codec can't encode character '\U0001f680' in position 28: ordinal not in range(256)
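Both tracebacks come down to the same thing: print() encodes text with the output stream's encoding, and the Latin-1 codec can only represent code points 0 to 255. A minimal sketch that reproduces this without cz, assuming an in-memory Latin-1 stream is a fair stand-in for the ISO8859-1 terminal (the helper name reproduce is hypothetical):

```python
import io
from typing import Optional


def reproduce(text: str, encoding: str = "latin-1") -> Optional[str]:
    """Print `text` to an in-memory stream with the given encoding.

    Returns the UnicodeEncodeError message if printing fails, else None.
    """
    # Stands in for a terminal whose encoding is ISO8859-1 (Latin-1).
    fake_terminal = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=fake_terminal)
        fake_terminal.flush()
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# U+2019 (the apostrophe printed by cz info) and U+1F680 (the rocket printed
# by cz init) both lie above 255, the highest code point Latin-1 can represent.
for ch in ("\u2019", "\U0001f680"):
    print(f"U+{ord(ch):04X} ({ord(ch)}):", reproduce(ch))
```

The same text printed through a UTF-8 stream succeeds, which matches the reporter's observation that a de_CH.UTF-8 terminal works.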

This probably also happens with other terminals that are not UTF-8 encoded.

We still have quite a few machines that run with ISO8859-1 encoding because of the application running on them.

Steps to reproduce

  1. Start a new terminal.
  2. Set LANG to 'de_CH.ISO8859-1' with 'export LANG=de_CH.ISO8859-1'.
  3. Run 'cz info'.
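Whether step 2 takes effect depends on the de_CH.ISO8859-1 locale being installed. A sketch of an alternative reproduction that does not depend on installed locales, assuming that Python's standard PYTHONIOENCODING environment variable is an acceptable stand-in for the terminal locale (this is not one of the original steps, and it runs a plain Python child process rather than cz):

```python
import os
import subprocess
import sys

# The child prints the character that breaks `cz info` (U+2019).
CHILD_CODE = "print('commit\\u2019s type')"

# Force the child's stdio encoding to Latin-1 regardless of installed locales.
env = dict(os.environ, PYTHONIOENCODING="latin-1")
result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    env=env,
    capture_output=True,
    text=True,
)

print("exit code:", result.returncode)
# The last stderr line should be the same kind of UnicodeEncodeError.
print(result.stderr.strip().splitlines()[-1])
```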

Current behavior

Exception / Crash

Desired behavior

Optimum: proper output, re-encoded as needed, in terminal encodings other than UTF-8. (We have to use ISO8859-1 terminals for development and therefore also use git and cz inside these terminals.)

Acceptable but not desirable: prevent cz from being used in a non-UTF-8 terminal, i.e. warn and exit at startup instead of throwing an exception in the middle of the work.
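The "warn and exit at start" option could look roughly like the sketch below. It is not commitizen code: the function names, the message text, and the SAMPLE string (built from the two characters seen in the tracebacks) are all hypothetical. The check simply tries to encode the sample with the stream's encoding before any real work starts.

```python
import sys
from typing import Optional, TextIO

# Hypothetical sample: characters cz is known to print (from the tracebacks).
SAMPLE = "\u2019\U0001f680"


def encoding_supports(stream: TextIO, sample: str = SAMPLE) -> bool:
    """Return True if the stream's encoding can represent every char in `sample`."""
    encoding = getattr(stream, "encoding", None) or "utf-8"
    try:
        sample.encode(encoding)
    except (UnicodeEncodeError, LookupError):  # unencodable or unknown codec
        return False
    return True


def warn_and_exit_if_unsupported(stream: Optional[TextIO] = None) -> None:
    """Fail fast at startup instead of crashing partway through a command."""
    stream = stream if stream is not None else sys.stdout
    if not encoding_supports(stream):
        encoding = getattr(stream, "encoding", None)
        sys.stderr.write(
            f"Terminal encoding {encoding!r} cannot display cz output; "
            "please switch to a UTF-8 locale.\n"
        )
        sys.exit(1)
```

A fixed sample only catches the characters it lists, so this is a coarse guard; the re-encoding option avoids having to maintain such a list.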

Screenshots

No response

Environment

Output of cz version --report:

Commitizen Version: 3.13.0
Python Version: 3.11.6 (main, Oct 3 2023, 17:06:54) [GCC 8.5.0 20210514 (Red Hat 8.5.0-18)]
Operating System: Linux

(keenonkites, Jan 09 '24 07:01)

Hi @keenonkites, thanks for filing this issue. I just tested with the following commands and did not encounter any issues.

export LANG=de_CH.ISO8859-1
cz info

I also tried installing 3.13.0 and changing the terminal encoding. Could you please check whether it still happens? If so, could you please provide another way to reproduce it? Thanks!

(Lee-W, May 20 '24 20:05)

I've just tested it on my Mac (where cz is installed via brew) and on an AlmaLinux VM (where cz is installed via pip) with the newest version, 3.26.0, and it still happens on both systems.

Facts worth mentioning:

  • both systems use ISO8859-1 as the system setting
  • it does throw an exception when I start a terminal with LANG=de_CH.ISO8859-1
  • it works if I start a terminal with LANG=de_CH.UTF-8
  • it does throw an exception when I start a UTF-8 terminal and run export LANG=de_CH.ISO8859-1 before issuing cz info
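Since the crash depends only on the encoding Python picks for stdout, one possible stopgap (not suggested in this thread, and an assumption that it suits this setup) is Python's standard PYTHONIOENCODING variable with an error handler, e.g. PYTHONIOENCODING=latin-1:replace cz info, which should print '?' for unencodable characters instead of crashing. A sketch that checks this behavior with a plain Python child process rather than cz itself (run_with_io_encoding is a hypothetical helper):

```python
import os
import subprocess
import sys


def run_with_io_encoding(code: str, io_encoding: str) -> subprocess.CompletedProcess:
    """Run `code` in a child Python with PYTHONIOENCODING set; capture raw bytes."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run([sys.executable, "-c", code], env=env, capture_output=True)


# The child prints the same character that breaks `cz info`.
lenient = run_with_io_encoding("print('commit\\u2019s')", "latin-1:replace")
print("exit code:", lenient.returncode)  # no UnicodeEncodeError expected
print("bytes written:", lenient.stdout)  # U+2019 replaced by b'?'
```

This only hides the symptom: the apostrophe and the rocket are lost, which is why a proper fix belongs in cz's output layer.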

Below is the output of the last scenario (UTF-8 terminal: cz info, then change LANG, then cz info again):

[xx@yyyyy:~] $ echo $LANG
de_CH.UTF-8
[xx@yyyyy:~] $ cz info
The commit contains the following structural elements, to communicate
intent to the consumers of your library:

fix: a commit of the type fix patches a bug in your codebase
(this correlates with PATCH in semantic versioning).

feat: a commit of the type feat introduces a new feature to the codebase
(this correlates with MINOR in semantic versioning).

BREAKING CHANGE: a commit that has the text BREAKING CHANGE: at the beginning of
its optional body or footer section introduces a breaking API change
(correlating with MAJOR in semantic versioning).
A BREAKING CHANGE can be part of commits of any type.

Others: commit types other than fix: and feat: are allowed,
like chore:, docs:, style:, refactor:, perf:, test:, and others.

We also recommend improvement for commits that improve a current
implementation without adding a new feature or fixing a bug.

Notice these types are not mandated by the conventional commits specification,
and have no implicit effect in semantic versioning (unless they include a BREAKING CHANGE).

A scope may be provided to a commit’s type, to provide additional contextual
information and is contained within parenthesis, e.g., feat(parser): add ability to parse arrays.

<type>[optional scope]: <description>

[optional body]

[optional footer]

[xx@yyyyy:~] $ export LANG=de_CH.ISO8859-1
[xx@yyyyy:~] $ cz info
Traceback (most recent call last):
  File "/home/user/pb/venv_commitizen/bin/cz", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/cli.py", line 607, in main
    args.func(conf, arguments)()
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/commands/info.py", line 13, in __call__
    out.write(self.cz.info())
  File "/home/user/pb/venv_commitizen/lib/python3.11/site-packages/commitizen/out.py", line 13, in write
    print(value, *args)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 1068: ordinal not in range(256)
[xx@yyyyy:~] $

I can't think of another way to reproduce it; for me it's more a matter of avoiding the problem. But a question from my end: do you have the de_CH.ISO8859-1 locale installed on your test system? If the locale is not installed, you probably fall back to UTF-8 automatically anyway.
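One way to answer that question from Python is to ask the C library whether it accepts the locale name: locale.setlocale raises locale.Error when the locale is not installed. A sketch (the helper name locale_available is hypothetical; running locale -a in a shell lists the installed locales as well):

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library accepts `name` as an LC_CTYPE locale."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore the original
    return True


print("de_CH.ISO8859-1 available:", locale_available("de_CH.ISO8859-1"))
```

If this prints False on the test machine, exporting LANG=de_CH.ISO8859-1 there will not switch Python to Latin-1 output, which would explain why the crash did not reproduce.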

(keenonkites, May 21 '24 10:05)

For cz info, the crash is caused by the right single quotation mark (U+2019) in "commit’s" on line 24 of the file commitizen/cz/conventional_commits/conventional_commits_info.txt.
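To find other characters like this one, a small scanner can list every character that a given encoding cannot represent. A sketch, assuming the text has already been read as UTF-8 (for example with open(path, encoding="utf-8")); the function name unencodable_chars is hypothetical:

```python
from typing import Iterator, Tuple


def unencodable_chars(text: str, encoding: str = "latin-1") -> Iterator[Tuple[int, int, str]]:
    """Yield (line_number, column, char) for each char `encoding` cannot represent."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                yield lineno, col, ch


sample = "plain line\nA scope may be provided to a commit\u2019s type\n"
for lineno, col, ch in unencodable_chars(sample):
    print(f"line {lineno}, col {col}: U+{ord(ch):04X}")
```

Running the same scan over the strings passed to the output functions would show which other commands (such as cz init with its rocket emoji) are affected.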

Removing this character from the source prevents 'cz info' from crashing in non-UTF-8 terminals. However, there are other non-ASCII characters elsewhere that crash cz with other commands (e.g. cz init, as mentioned in the original post).

To make the application safe for non-UTF-8 terminals, I think the output functions in commitizen/out.py have to be written so that they mask or replace characters that the current encoding cannot represent, or at least prevent crashes and print proper error messages.
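A minimal sketch of the masking idea, using a hypothetical safe_write helper rather than commitizen's actual out.write: encode with errors="replace" against the stream's own encoding, so unencodable characters become '?' instead of raising.

```python
import sys
from typing import Optional, TextIO


def safe_write(value: str, stream: Optional[TextIO] = None) -> None:
    """Print `value`, replacing chars the stream's encoding cannot represent."""
    stream = stream if stream is not None else sys.stdout
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Round-trip through the target encoding: unencodable chars become '?'.
    printable = value.encode(encoding, errors="replace").decode(encoding)
    print(printable, file=stream)
```

On a UTF-8 stream the text passes through unchanged. An alternative design is a single sys.stdout.reconfigure(errors="replace") call at startup (available since Python 3.7), which covers every print without touching each helper; errors="backslashreplace" would keep the code point visible instead of a bare '?'.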

(keenonkites, May 21 '24 12:05)

Thanks for updating! Will take a deeper look after I come back.

(Lee-W, May 21 '24 14:05)