Add new state: Unicode compatibility normalization
Hey ,
I noticed that you are considering only two states:
- One regarding the path normalization if it is done or not before the safe check
- Second concerns the safe check.
as shown next:
https://github.com/github/codeql/blob/c1c0a705b9f14c0f577a9ae56a9d699e8b6e67d6/python/ql/lib/semmle/python/security/dataflow/PathInjectionQuery.qll#L20-L28
However, there is a third state that is a required one: Unicode normalized. If ever a Unicode normalization is performed with a compatibility algorithm (NFKC or NFKD), the query would miss some cases precisely those ones where the Unicode normalization is not performed before the path normalization and the safe check. I draw a little chart to depict my saying:
The previous chart shows that when you consider a potential Unicode compatibility normalization, it is a required step before path normalization and safe check. If ever placed between the first two steps or after the last one, that would yield a vulnerable case that got missed due to the fact that the Unicode normalization may reintroduce unexpected special characters such as .. and /.
Regards @Sim4n6
Hi @Sim4n6 👋🏻
Thank you for opening this issue. Do you have a minimal code example that demonstrates this false negative? I.e. where you'd expect to get an alert, but the query doesn't produce one.
Something like :
import os.path
from flask import Flask, request, abort
app = Flask(__name__)
@app.route("/user_picture3")
def user_picture3():
base_path = '/server/static/images'
filename = request.args.get('p')
fullpath = os.path.normpath(os.path.join(base_path, filename))
if not fullpath.startswith(base_path):
raise Exception("not allowed")
fullpath_unicode = unicodedata.normalize("NFKC", fullpath)
data = open(fullpath_unicode, 'rb').read()
return data
and consider p = "/server/static/images/‥/‥/‥/‥/etc/" where ‥ equals U+2025 for instance.