Non-utf8 character causes crash when scanning
Describe the bug
Bandit raises an exception and skips the file when it tries to decode a byte sequence that is not valid UTF-8.
xxd dump of the file that triggers the bug:
➜ xxd poc.py
00000000: 2320 5255 4e3a 2074 7275 650a 0a23 2048 # RUN: true..# H
00000010: 6572 6520 6973 2061 2073 7472 696e 6720 ere is a string
00000020: 7468 6174 2063 616e 6e6f 7420 6265 2064 that cannot be d
00000030: 6563 6f64 6564 2069 6e20 6c69 6e65 206d ecoded in line m
00000040: 6f64 653a 20c2 2e0a ode: ...
Execute bandit --debug pocFile.py
[main] DEBUG logging initialized
[main] INFO profile include tests: None
[main] INFO profile exclude tests: None
[main] INFO cli include tests: None
[main] INFO cli exclude tests: None
[test_set] DEBUG added function any_other_function_with_shell_equals_true (B604) targeting Call
[test_set] DEBUG added function assert_used (B101) targeting Assert
[test_set] DEBUG added function django_extra_used (B610) targeting Call
[test_set] DEBUG added function django_mark_safe (B703) targeting Call
[test_set] DEBUG added function django_rawsql_used (B611) targeting Call
[test_set] DEBUG added function exec_used (B102) targeting Call
[test_set] DEBUG added function flask_debug_true (B201) targeting Call
[test_set] DEBUG added function hardcoded_bind_all_interfaces (B104) targeting Str
[test_set] DEBUG added function hardcoded_password_default (B107) targeting FunctionDef
[test_set] DEBUG added function hardcoded_password_funcarg (B106) targeting Call
[test_set] DEBUG added function hardcoded_password_string (B105) targeting Str
[test_set] DEBUG added function hardcoded_sql_expressions (B608) targeting Str
[test_set] DEBUG added function hardcoded_tmp_directory (B108) targeting Str
[test_set] DEBUG added function hashlib_insecure_functions (B324) targeting Call
[test_set] DEBUG added function jinja2_autoescape_false (B701) targeting Call
[test_set] DEBUG added function linux_commands_wildcard_injection (B609) targeting Call
[test_set] DEBUG added function logging_config_insecure_listen (B612) targeting Call
[test_set] DEBUG added function paramiko_calls (B601) targeting Call
[test_set] DEBUG added function request_with_no_cert_validation (B501) targeting Call
[test_set] DEBUG added function request_without_timeout (B113) targeting Call
[test_set] DEBUG added function set_bad_file_permissions (B103) targeting Call
[test_set] DEBUG added function snmp_insecure_version (B508) targeting Call
[test_set] DEBUG added function snmp_weak_cryptography (B509) targeting Call
[test_set] DEBUG added function ssh_no_host_key_verification (B507) targeting Call
[test_set] DEBUG added function ssl_with_bad_defaults (B503) targeting FunctionDef
[test_set] DEBUG added function ssl_with_bad_version (B502) targeting Call
[test_set] DEBUG added function ssl_with_no_version (B504) targeting Call
[test_set] DEBUG added function start_process_with_a_shell (B605) targeting Call
[test_set] DEBUG added function start_process_with_no_shell (B606) targeting Call
[test_set] DEBUG added function start_process_with_partial_path (B607) targeting Call
[test_set] DEBUG added function subprocess_popen_with_shell_equals_true (B602) targeting Call
[test_set] DEBUG added function subprocess_without_shell_equals_true (B603) targeting Call
[test_set] DEBUG added function try_except_continue (B112) targeting ExceptHandler
[test_set] DEBUG added function try_except_pass (B110) targeting ExceptHandler
[test_set] DEBUG added function use_of_mako_templates (B702) targeting Call
[test_set] DEBUG added function weak_cryptographic_key (B505) targeting Call
[test_set] DEBUG added function yaml_load (B506) targeting Call
[test_set] DEBUG added function blacklist (B001) targeting Call
[test_set] DEBUG added function blacklist (B001) targeting Import
[test_set] DEBUG added function blacklist (B001) targeting ImportFrom
[main] INFO running on Python 3.10.2
[manager] DEBUG working on file : poc.py
[manager] ERROR Exception occurred when executing tests against poc.py. Run "bandit --debug poc.py" to see the full traceback.
[manager] DEBUG Exception string: 'utf-8' codec can't decode byte 0xc2 in position 56: invalid continuation byte
[manager] DEBUG Exception traceback: Traceback (most recent call last):
[main] DEBUG Length: 0
[main] DEBUG <bandit.core.metrics.Metrics object at 0x7f40a1ff26e0>
Run started:2022-04-12 17:55:26.123213
Test results:
No issues identified.
Code scanned:
Total lines of code: 0
Total lines skipped (#nosec): 0
Run metrics:
Total issues (by severity):
Undefined: 0
Low: 0
Medium: 0
High: 0
Total issues (by confidence):
Undefined: 0
Low: 0
Medium: 0
High: 0
Files skipped (1):
poc.py (exception while scanning file)
Reproduction steps
1. Save the following xxd dump as poc.txt:
00000000: 2320 5255 4e3a 2074 7275 650a 0a23 2048 # RUN: true..# H
00000010: 6572 6520 6973 2061 2073 7472 696e 6720 ere is a string
00000020: 7468 6174 2063 616e 6e6f 7420 6265 2064 that cannot be d
00000030: 6563 6f64 6564 2069 6e20 6c69 6e65 206d ecoded in line m
00000040: 6f64 653a 20c2 2e0a ode: ...
2. Decode it into a .py file: xxd -r poc.txt > pocFile.py
3. Execute bandit --debug pocFile.py
4. Crash ...
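If xxd is not available, the dump from the reproduction steps can be turned back into bytes with a few lines of Python. This is only a rough sketch: the helper name xxd_reverse is made up for this illustration, and it assumes the layout shown here (an offset, then at most eight hex groups, then the ASCII column). It is not a general replacement for xxd -r.

```python
import re

# The dump exactly as posted in this issue.
XXD_DUMP = """\
00000000: 2320 5255 4e3a 2074 7275 650a 0a23 2048 # RUN: true..# H
00000010: 6572 6520 6973 2061 2073 7472 696e 6720 ere is a string
00000020: 7468 6174 2063 616e 6e6f 7420 6265 2064 that cannot be d
00000030: 6563 6f64 6564 2069 6e20 6c69 6e65 206d ecoded in line m
00000040: 6f64 653a 20c2 2e0a ode: ...
"""

# One or two bytes written as hex digits, e.g. "2e" or "20c2".
HEX_GROUP = re.compile(r"(?:[0-9a-fA-F]{2}){1,2}")


def xxd_reverse(dump: str, groups_per_line: int = 8) -> bytes:
    """Rebuild the raw bytes from an xxd-style dump (hypothetical helper)."""
    out = bytearray()
    for line in dump.splitlines():
        _offset, sep, rest = line.partition(": ")
        if not sep:
            continue  # not a dump line
        for token in rest.split()[:groups_per_line]:
            if not HEX_GROUP.fullmatch(token):
                break  # reached the ASCII column
            out += bytes.fromhex(token)
    return bytes(out)


poc = xxd_reverse(XXD_DUMP)
print(len(poc), hex(poc[-3]))
```

Writing the result with pathlib.Path("pocFile.py").write_bytes(poc) should then give the same file as step 2. A short final line whose ASCII column happens to look like hex would be misread, which is the main limitation of this approach.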
Expected behavior
Bandit executes as usual and doesn't crash.
Bandit version
1.7.4 (Default)
Python version
3.10 (Default)
Additional context
Bandit 1.7.5, just cloned from main today.
So you will get the same result if you run:
python pocFile.py
However, if a Python file contains UTF-8 characters, then it must be specified in the header:
# -*- coding: utf-8 -*-
That will fix the case using python, but unfortunately Bandit still fails.
@ericwb As of Python 3, utf-8 is the default encoding of source code, and doesn't have to be declared even if the source code contains non-ascii characters. However the example above involves a non-utf-8 encoded character.
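To make the point concrete, here is a small sketch that decodes the PoC bytes as UTF-8. In UTF-8 the byte 0xc2 opens a two-byte sequence and must be followed by a continuation byte in the range 0x80 to 0xBF; in the PoC it is followed by 0x2e ("."), hence "invalid continuation byte". The helper name first_bad_offset is made up for this illustration.

```python
# The 72 bytes from the xxd dump in this issue.
POC = (
    b"# RUN: true\n\n"
    b"# Here is a string that cannot be decoded in line mode: \xc2.\n"
)


def first_bad_offset(data: bytes):
    """Return (offset, reason) of the first UTF-8 decode error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# Decoding the whole file at once reports the offset within the file.
whole_file = first_bad_offset(POC)

# Decoding only the third line reports the offset within that line.
third_line = first_bad_offset(POC.splitlines(keepends=True)[2])

print(whole_file, third_line)
```

The offset is 69 for the whole file but 56 for the third line alone. The latter matches the "position 56" in the Bandit log, which suggests (an inference, not something confirmed in this thread) that the failing decode happens line by line.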
@EstevamArantes What you have there is the Â character encoded in latin_1 (aka iso-8859-1). This encoding must be declared at the beginning of the file.
https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations
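As a rough illustration of the encoding declaration described in that section, the sketch below uses the standard library's tokenize.detect_encoding to show which encoding is assumed with and without a declaration. The declaration is prepended as a new first line purely for the demonstration (it has to appear on the first or second line), and the helper name declared_encoding is hypothetical.

```python
import io
import tokenize

BODY = b"# Here is a string that cannot be decoded in line mode: \xc2.\n"
UNDECLARED = b"# RUN: true\n\n" + BODY
# Same file with an encoding declaration added as the first line.
DECLARED = b"# -*- coding: latin-1 -*-\n" + UNDECLARED


def declared_encoding(source: bytes) -> str:
    """Encoding taken from a BOM or a declaration in the first two lines."""
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


undeclared_enc = declared_encoding(UNDECLARED)  # no declaration: UTF-8 default
declared_enc = declared_encoding(DECLARED)

# With the declared encoding, every byte decodes and the source compiles.
text = DECLARED.decode(declared_enc)
code = compile(DECLARED, "<poc>", "exec")

print(undeclared_enc, declared_enc)
```

With the declaration in place, the 0xc2 byte is read as Â and the file is valid Python source; without it, the UTF-8 default applies and the byte cannot be decoded.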
That said, I think this is not a bandit issue and can be closed.
Agree with @mportesdev here. Encoding should be declared in header if not utf-8.