Bug: codevalidator -f crashes and leaves my file empty when the file contains non-ASCII characters
Summary
When a YAML file contains a non-ASCII character as well as trailing whitespace,
codevalidator -f filename crashes with a (not very useful) error message and leaves the file empty, without even creating a backup copy.
Fortunately, I had done a git add beforehand.
How to reproduce
I'm using codevalidator 0.8.2, judging from pip show codevalidator (codevalidator itself doesn't have a --version option).
Here is an example file (encoded as UTF-8):
definitions:
  purchase_order:
    type: object
    description: |
      An either sparse or complete representation of a purchase order. 
      (TODO: The definition here is not yet complete – e.g. positions
      are missing.)
This file has a trailing space on line 5 and an en-dash (–) on line 6. I guess the latter causes codevalidator to crash, while the former is what makes it try to fix the file at all.
Here is the output for this file:
$ codevalidator -v -f backend/src/main/resources/api/swagger-purchase-order.yaml
backend/src/main/resources/api/swagger-purchase-order.yaml: contains lines with trailing whitespace
backend/src/main/resources/api/swagger-purchase-order.yaml: Trying to fix notrailingws..
Traceback (most recent call last):
  File "/usr/local/bin/codevalidator", line 9, in <module>
    load_entry_point('codevalidator==0.8.2', 'console_scripts', 'codevalidator')()
  File "/usr/local/lib/python2.7/dist-packages/codevalidator.py", line 953, in main
    fix_files()
  File "/usr/local/lib/python2.7/dist-packages/codevalidator.py", line 878, in fix_files
    fix_file(fname, rules)
  File "/usr/local/lib/python2.7/dist-packages/codevalidator.py", line 866, in fix_file
    fd.write(fixed.encode())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 191: ordinal not in range(128)
No .pre-cvfix file is created in this case (but one is created if I replace the – with a plain hyphen -).
It looks like it is using 'ascii' as the codec instead of a Unicode one such as UTF-8.
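The byte offset in the traceback can be checked against the example file. The sketch below is plain Python 3, not codevalidator's code; it assumes the YAML uses two-space indentation and models the notrailingws fix as stripping trailing whitespace from every line (both are assumptions). Decoding the fixed bytes as ASCII then fails at offset 191, as reported, and that byte is 0xe2, the first byte of the en-dash in UTF-8. This fits a well-known Python 2 pitfall: calling .encode() on a byte str first decodes it implicitly with the default 'ascii' codec, which would explain why the line fd.write(fixed.encode()) raises a *Decode* error.

```python
# Python 3 sketch: check the traceback's byte offset against the example file.
# Assumption: the YAML uses two-space indentation (indentation was reconstructed).
LINES = [
    "definitions:",
    "  purchase_order:",
    "    type: object",
    "    description: |",
    # Line 5 ends with a trailing space.
    "      An either sparse or complete representation of a purchase order. ",
    # Line 6 contains an en-dash (U+2013).
    "      (TODO: The definition here is not yet complete \u2013 e.g. positions",
    "      are missing.)",
]
original = ("\n".join(LINES) + "\n").encode("utf-8")

# Assumed model of the notrailingws fix: strip trailing whitespace per line.
fixed = b"".join(line.rstrip() + b"\n" for line in original.splitlines())


def ascii_error_position(data: bytes):
    """Return the offset where ASCII decoding fails, or None if data is pure ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


print(ascii_error_position(original))  # 192 (trailing space still present)
print(ascii_error_position(fixed))     # 191, as in the traceback
print(hex(fixed[191]))                 # 0xe2, first byte of the en-dash in UTF-8
```

The offset of 191 (rather than 192) also suggests the crash happens after the trailing space has already been removed, i.e. while writing the fixed content back.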
Expected behaviour
Codevalidator should be able to handle UTF-8 encoded files.
Even if it is not able to fix my file, it should certainly not destroy it (overwriting it with an empty file).
Even if it does that, it should create a backup copy (unless told not to via --no-backup).
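The empty file suggests the file had presumably already been opened for writing (and thus truncated) when the encode step failed. A common way to make such a fixer safe is to decode with an explicit codec, create the backup first, and only replace the original once the new content has been written completely. The Python 3 sketch below shows that pattern; safe_fix_file and strip_trailing_ws are hypothetical names, not codevalidator functions, and only the .pre-cvfix suffix is taken from this report.

```python
import os
import shutil
import tempfile
from typing import Callable


def safe_fix_file(path: str, fix: Callable[[str], str], backup: bool = True) -> bool:
    """Apply `fix` to a UTF-8 text file without risking data loss.

    Returns True if the file was rewritten, False if nothing changed.
    """
    with open(path, "rb") as fd:
        original = fd.read()
    # Decode explicitly as UTF-8 (not ASCII). If this raises, the file is untouched.
    fixed = fix(original.decode("utf-8")).encode("utf-8")
    if fixed == original:
        return False
    if backup:
        # Back up before touching the original (".pre-cvfix" suffix as in codevalidator).
        shutil.copy2(path, path + ".pre-cvfix")
    # Write to a temp file in the same directory, then atomically swap it in,
    # so a failure mid-write never leaves a truncated file behind.
    fd_num, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd_num, "wb") as tmp:
            tmp.write(fixed)
        shutil.copymode(path, tmp_path)  # keep the original file permissions
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True


def strip_trailing_ws(text: str) -> str:
    """Example fix: remove trailing whitespace from every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))
```

With this ordering, a file that cannot be decoded raises before anything is written, so neither the file nor its backup is touched.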
Workaround
Make sure only ASCII characters are used in files passed to codevalidator. If you are not sure, make a copy of the file beforehand.
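To check a file before running codevalidator -f on it, a small helper can list every non-ASCII character with its position. This is a hypothetical Python 3 script (find_non_ascii is not part of codevalidator) and it assumes the input is UTF-8.

```python
import sys


def find_non_ascii(data: bytes):
    """Yield (line, column, character) for each non-ASCII character in UTF-8 data."""
    text = data.decode("utf-8")
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col, ch


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_non_ascii.py FILE...
    for path in sys.argv[1:]:
        with open(path, "rb") as fd:
            for line_no, col, ch in find_non_ascii(fd.read()):
                print(f"{path}:{line_no}:{col}: U+{ord(ch):04X} {ch!r}")
```

For the example file above, this should report only the en-dash on line 6.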
Here is my ~/.codevalidatorrc. I don't think there is anything special in it.
{
  "exclude_dirs": [".svn", ".git", "xplan", "live-image", "calendar_connector", "archetype"],
  "rules": {
    "*.c": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.coffee": ["utf8", "nobom", "notabs", "nocr", "notrailingws", "coffeelint"],
    "*.conf": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.cpp": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.css": ["utf8", "nobom", "notabs", "nocr"],
    "*.groovy": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.h": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.html": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.htm": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.java": ["utf8", "nobom", "notabs", "nocr", "notrailingws", "jalopy"],
    "*.json": ["utf8", "nobom", "notabs", "nocr", "notrailingws", "json"],
    "*.jsp": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.js": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.less": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.md": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.php": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.phtml": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*pom.xml": ["xml", "pomdesc"],
    "*.pp": ["utf8", "nobom", "notabs", "nocr", "notrailingws", "puppet"],
    "*.properties": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.py": ["utf8", "nobom", "notabs", "nocr", "notrailingws", "pyflakes", "pythontidy"],
    "*.rst": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.scala": ["invalidpath"],
    "* *": ["invalidpath"],
    "*.sh": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.sql_diff": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.sql": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.styl": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.txt": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.vm": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.wsdl": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.xml": ["utf8", "nobom", "notabs", "nocr", "notrailingws"],
    "*.yml": ["utf8", "nobom", "notabs", "nocr", "notrailingws", "yaml"],
    "*.yaml": ["utf8", "nobom", "notabs", "nocr", "notrailingws", "yaml"]
  },
  "options": {"phpcs": {"standard": "PSR", "encoding": "UTF-8"},
              "database_dir": {"pgsql-parser-bin": "/bin/true"},
              "jalopy": {"classpath": "/home/paulo/tools/jalopy/lib/jalopy-1.9.4.jar://home/paulo/tools/jalopy/lib/jh.jar"}},
  "dir_rules": {"db_diffs": ["sql_diff_dir", "sql_diff_sql"], "database": ["database_dir"]}
}
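To see which of these rules apply to the YAML file, the sketch below matches the file name against the "rules" patterns. It assumes fnmatch-style glob matching; I have not verified how codevalidator itself matches patterns, and rules_for is a hypothetical helper. Under that assumption only "*.yaml" matches, so the file is checked with the utf8 rule as well as notrailingws, meaning the configuration already declares it as UTF-8.

```python
import fnmatch
import json
import os


def rules_for(path: str, config: dict) -> list:
    """Collect the rules of every "rules" pattern matching `path` (fnmatch globbing assumed)."""
    matched = []
    for pattern, rules in config.get("rules", {}).items():
        if fnmatch.fnmatchcase(path, pattern):
            for rule in rules:
                if rule not in matched:  # keep order, drop duplicates
                    matched.append(rule)
    return matched


if __name__ == "__main__":
    rc_path = os.path.expanduser("~/.codevalidatorrc")
    if os.path.exists(rc_path):
        with open(rc_path) as fd:
            config = json.load(fd)
        print(rules_for("backend/src/main/resources/api/swagger-purchase-order.yaml", config))
```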
I just tried this again on my home computer with a fresh clone of the repository, and there it does create a .pre-cvfix backup copy. I'll have to check why this didn't happen with my work installation.