commitizen_emoji cannot write to CHANGELOG
Description
This is because commitizen_emoji uses unicode emoji characters, but commitizen does not specify the file encoding in write_changelog(), so it defaults back to cp1252:
https://github.com/commitizen-tools/commitizen/blob/79165119817cbd0aef25434a85dcbffa3a13acbb/commitizen/commands/changelog.py#L92
This could be fixed by specifying the utf-8 encoding:
with open(self.file_name, "w", encoding='utf-8') as changelog_file:
...
Steps to reproduce
> cz bump --changelog
bump: version 0.4.0 → 0.5.0
tag to create: 0.5.0
increment detected: MINOR
Traceback (most recent call last):
File "C:\Program Files\Python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\hendra11\Code\external\cz-emoji\.venv\Scripts\cz.exe\__main__.py", line 7, in <module>
File "C:\Users\hendra11\Code\external\cz-emoji\.venv\lib\site-packages\commitizen\cli.py", line 382, in main
args.func(conf, vars(args))()
File "C:\Users\hendra11\Code\external\cz-emoji\.venv\lib\site-packages\commitizen\commands\bump.py", line 215, in __call__
changelog_cmd()
File "C:\Users\hendra11\Code\external\cz-emoji\.venv\lib\site-packages\commitizen\commands\changelog.py", line 172, in __call__
self.write_changelog(changelog_out, lines, changelog_meta)
File "C:\Users\hendra11\Code\external\cz-emoji\.venv\lib\site-packages\commitizen\commands\changelog.py", line 103, in write_changelog
changelog_file.write(changelog_out)
File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f389' in position 29: character maps to <undefined>
Current behavior
See Steps to reproduce
Desired behavior
Emojis can be logged to changelog
Screenshots
No response
Environment
- cz version: 2.27.0
- python version: 3.8.13
- platform.system(): 'Windows'
Maybe we could make the encoding configurable?
'utf-8' should cover all encodings, but I'm fine with that. You'd have to change BaseCommitizen to grab that config variable and everywhere you have an open used in your codebase, change it to add the encoding.
I seen the same issue on my end, and adding the encoding='utf-8' fixed it.
Hi @adam-grant-hendry we've release a new version of commitizen. Could you please try whether 2.30.0 solve this issue?
After I test it, the problem still exists.
@Lee-W The problem still persists. I tried on 2.35.0.
You need to specify the utf-8 encoding where open() is used. That is how this can be fixed and it was additionally confirmed by @fabiopedrosa
@Lee-W I remember now why I never submitted a PR: I am on Windows and your scripts/test and scripts/format expects /bin/sh, so I can't pass the pre-commit step linter and test.
This can be solved by using Git Bash: Git for Windows ships with git bash (i.e. a sh.exe and bash.exe).
This is another issue and requires another PR to fix.
This can be solved by using Git Bash: Git for Windows ships with git bash (i.e. a sh.exe and bash.exe).
Actually, even easier than that: Just switch to using a portable shebang:
#!/usr/bin/env sh
Hi @adam-grant-hendry , I notice you create another issue regarding to it. If pre-commit does not work, could you please use git commit ... --no-verify and git push ... --no-verify as a temporary solution before we can sort the /bin/sh issue out?
It would be even more helpful if you could document this on the doc for windows users. Thanks!
@woile @Lee-W I have a working implementation with passing unit tests for Unicode characters (that is, emojis). Once PR #605 is merged, I'll push it up.
It would be even more helpful if you could document this on the doc for windows users. Thanks!
Yes, great idea. I'll make another PR for docs. I can probably also resolve Issue #515 there as well. I spewed a bunch of comments there and have had some time to learn and practice, so I might clean that up a little bit and clarify things.