UnicodeDecodeError in Python3
This is an issue with thrift (a dependency of this library); an issue has already been filed with that project.
Environment:
- Operating System: Windows 10 Pro (Simplified Chinese)
- Python Interpreter: Python 3.6.6
- osquery Version: 3.3.0
- osquery-python Version: 3.0.5
When querying, a UnicodeDecodeError is raised from thrift.compat.binary_to_str with the error message: "'utf-8' codec can't decode byte 0xc3 in position 0: invalid continuation byte". This happens because the bin_val parameter is actually encoded as "gbk", not UTF-8.
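For anyone without a Chinese-locale Windows machine, the failure mode can be sketched with the standard library alone. This does not call thrift or osquery. The two bytes below are an assumed stand-in for what a legacy-encoded column might contain, and try_decode is a hypothetical helper written only for this illustration.

```python
# Two bytes that form one valid GBK character but are not valid UTF-8:
# 0xc3 announces a two-byte UTF-8 sequence, and 0xfb cannot continue it.
raw = b"\xc3\xfb"


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


utf8_text, utf8_error = try_decode(raw, "utf-8")
gbk_text, gbk_error = try_decode(raw, "gbk")

print(utf8_error)  # the UnicodeDecodeError raised by the UTF-8 attempt
print(gbk_text)    # the same bytes decoded as GBK
```

The UTF-8 attempt fails on the first byte, while the GBK attempt succeeds on the same input.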
Maybe patch the thrift source code and ship it as a vendored package when distributing? (As pipenv and other projects do.)
@jarryshaw, did you have a chance to follow up on the comments on the Thrift bug report?
It's been quite a long time, and I've only recently started trying to reproduce the issue. Btw, I just found two other issues 🤦‍♂️ I'll make a pull request for one of them.
- issue #57
- pull #70
Also, FYI, you can find the failed query at THRIFT-4677.
It is probably linked to an internal Windows issue. Some of the Chinese text is encoded as UTF-8, for example in os_version, whilst some of it is encoded with the system legacy encoding (cp936/gbk/gb2312 in my case), for example in scheduled_tasks.
Also, according to James, a contributor to Thrift, "Thrift only handles strings as UTF8 internally." Maybe this is an issue with osquery's internal data schema, or a design flaw in Thrift.
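Given the mixed encodings described above, one possible workaround on the Python side is a decoder that tries UTF-8 first and then falls back to the legacy encoding. The following is only a rough sketch under that assumption. decode_with_fallback is a hypothetical helper, not part of thrift or osquery-python, and where it would be hooked in is left open.

```python
import locale


def decode_with_fallback(bin_val, fallbacks=None):
    """Decode bytes as UTF-8, then try each fallback encoding in order.

    Hypothetical helper for illustration; not a thrift API.
    """
    if fallbacks is None:
        # Assumption: the system's preferred encoding is the legacy one
        # (cp936 on a Simplified Chinese Windows install).
        fallbacks = (locale.getpreferredencoding(False),)
    try:
        return bin_val.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return bin_val.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: keep the query alive and mark undecodable bytes.
    return bin_val.decode("utf-8", errors="replace")


print(decode_with_fallback("中文".encode("utf-8"), fallbacks=("gbk",)))
print(decode_with_fallback("中文".encode("gbk"), fallbacks=("gbk",)))
```

This is a heuristic: some legacy-encoded byte sequences also happen to be valid UTF-8 and would be decoded to the wrong text without raising, so fixing the encoding at the source would still be preferable.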