osquery-python

UnicodeDecodeError in Python3

Open JarryShaw opened this issue 7 years ago • 3 comments

This is an issue with thrift (a dependency of this library); an issue has already been filed with that project.

Environment:

  • Operating System: Windows 10 Pro (Simplified Chinese)
  • Python Interpreter: Python 3.6.6
  • osquery Version: 3.3.0
  • osquery-python Version: 3.0.5

When querying, a UnicodeDecodeError is raised from thrift.compat.binary_to_str with the error message: "'utf-8' codec can't decode byte 0xc3 in position 0: invalid continuation byte". This happens because the bin_val parameter is actually encoded as "gbk", not UTF-8.
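For anyone who wants to see the failure without osquery or thrift installed, here is a minimal sketch in plain Python. The byte string is an illustrative assumption (a two-byte sequence that is valid GBK but not valid UTF-8), not the actual payload from the failing query.

```python
# Illustrative bytes only: valid as GBK, invalid as UTF-8.
# This is NOT the real payload returned by osquery.
raw = b"\xc3\xfb"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xc3 announces a two-byte UTF-8 sequence, but 0xfb is not a
    # continuation byte, so the decoder gives up at position 0.
    utf8_error = exc
else:
    utf8_error = None

# The same bytes decode cleanly with the legacy code page.
text = raw.decode("gbk")

print(utf8_error)
print(len(text))
```

The exception text printed here should match the wording of the error reported above, which suggests the data reaching binary_to_str is in the system code page.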

Maybe try patching the thrift source code and shipping it as a vendored package with the distribution? (just as pipenv and other projects do)

JarryShaw avatar Nov 30 '18 13:11 JarryShaw

@jarryshaw, did you have a chance to follow up on the comments on the Thrift bug report?

theopolis avatar Aug 06 '19 12:08 theopolis

It's been quite a long time, and I've only recently started trying to reproduce the issue. Btw, I just found two other issues 🤦‍♂ I'll make a pull request on one of them.

  • issue #57
  • pull #70

JarryShaw avatar Aug 10 '19 08:08 JarryShaw

Also, FYI, you can find the failed query at THRIFT-4677.

It appears to be linked to a Windows-internal issue. Some of the Chinese content is encoded as UTF-8, such as os_version, whilst some of it is encoded with the system legacy encoding (cp936/gbk/gb2312 in my case), for example scheduled_tasks.
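Given the mixed encodings, one possible workaround is to try UTF-8 first and fall back to the legacy code page. The sketch below is only an assumption about how that could look: decode_with_fallback is a hypothetical helper, not part of thrift or osquery-python, and the "gbk" fallback is specific to my system.

```python
def decode_with_fallback(raw, encodings=("utf-8", "gbk")):
    """Hypothetical helper: try each encoding in order.

    The default fallback list is an assumption for a Simplified
    Chinese Windows system; other locales need other code pages.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going with replacement characters
    # instead of raising.
    return raw.decode(encodings[0], errors="replace")


# Values in either encoding come back as the same text.
print(decode_with_fallback("中文".encode("utf-8")))
print(decode_with_fallback("中文".encode("gbk")))
```

Trying UTF-8 first is a heuristic: GBK text usually fails UTF-8 validation, but that is not guaranteed, so a short GBK string could in principle be mis-decoded as UTF-8.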

Also, according to James, a contributor to Thrift, "Thrift only handles strings as UTF8 internally." Maybe this is an issue with osquery's internal data schema, or a design flaw in Thrift.

JarryShaw avatar Aug 10 '19 09:08 JarryShaw