Unicode issue on PowerShell
When Unicode characters are passed to gitlint via stdin in PowerShell, gitlint does not print them correctly: non-ASCII characters come out as question marks.
echo "WIP: foöbar" | gitlint
1: T5 Title contains the word 'WIP' (case-insensitive): "WIP: fo?bar"
3: B6 Body message is missing
This works as expected in the regular Windows Command Prompt, so the problem seems specific to PowerShell.
I was testing gitlint and thought I'd check this bug, given that it's easy to reproduce. Here's what I found out (based on this Stack Overflow question):
Minimal code to reproduce on Python 3.8, run with echo "WIP: foöbar" | python read_from_stdin.py
if __name__ == "__main__":
    import sys
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")
>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='cp1252'>
>>> input_data='WIP: fo?bar\n'
Forcing the stdin encoding to utf8. The issue is still there.
if __name__ == "__main__":
    import sys
    sys.stdin.reconfigure(encoding="utf8")
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")
>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf8'>
>>> input_data='WIP: fo?bar\n'
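One possible explanation for why reconfiguring stdin makes no difference here: if the shell has already replaced ö with ? before the bytes reach Python, no decoder can bring the character back. The sketch below simulates that in pure Python, with no PowerShell involved. It assumes the pipe encodes text as ASCII with replacement (my understanding is that this is the default $OutputEncoding in Windows PowerShell, but treat that as an assumption), and the variable names are made up for illustration.

```python
# Simulation only: stands in for what the shell might do to piped text.
message = "WIP: foöbar\n"

# Assumption: the pipe encodes as ASCII and replaces unmappable characters.
piped_bytes = message.encode("ascii", errors="replace")

# Decode the same bytes the two ways tried above.
as_cp1252 = piped_bytes.decode("cp1252")
as_utf8 = piped_bytes.decode("utf-8")

print(piped_bytes)
print(as_cp1252 == as_utf8)
```

Under that assumption both decodings give the same string, because the ö was lost at encode time rather than at decode time.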
Forcing the stdin encoding to utf8 and also running $OutputEncoding = [Console]::OutputEncoding = (new-object System.Text.UTF8Encoding $false) in PowerShell before echo "WIP: foöbar" | python read_from_stdin.py. Everything looks good.
if __name__ == "__main__":
    import sys
    sys.stdin.reconfigure(encoding="utf8")
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")
>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf8'>
>>> input_data='WIP: foöbar\n'
FYI, running $OutputEncoding = [Console]::OutputEncoding = (new-object System.Text.UTF8Encoding $false) before echo "WIP: foöbar" | python read_from_stdin.py, but without sys.stdin.reconfigure(encoding="utf8"), also prints the character correctly:
if __name__ == "__main__":
    import sys
    input_data = sys.stdin.read()
    print(f"{sys.stdin=}")
    print(f"{input_data=}")
>>> sys.stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='cp1252'>
>>> input_data='WIP: foöbar\n'
It seems there is some value in forcing the stdin encoding to utf8, but this is ultimately a PowerShell problem, as you expected. So I think you can close the issue.
Thanks for doing this extra legwork! I'll keep this open for the next time I get around to digging into Unicode issues on Windows :-)
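As a rough sketch of what forcing the stdin encoding does, the snippet below reconfigures a stand-in stream instead of the real sys.stdin. The BytesIO object and the names raw and stdin_like are hypothetical; they only mimic a pipe that delivers UTF-8 bytes to a text stream that defaults to cp1252. This is not how gitlint reads its input, just an illustration of the reconfigure call used above.

```python
import io

# Stand-in for sys.stdin.buffer: a pipe that delivers UTF-8 encoded bytes.
raw = io.BytesIO("WIP: foöbar\n".encode("utf-8"))

# Stand-in for sys.stdin with the cp1252 default seen in the output above.
stdin_like = io.TextIOWrapper(raw, encoding="cp1252")

# Switch the decoder before anything has been read (Python 3.7+).
stdin_like.reconfigure(encoding="utf-8")

input_data = stdin_like.read()
print(repr(input_data))
```

Note that reconfigure has to be called before any data is read from the stream; the encoding cannot be changed once reading has started.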