codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Swift: Work around some QHelp rendering issues.

Open geoffw0 opened this issue 1 year ago • 8 comments

A few of the Swift .qhelp files have minor rendering issues now that we label the code snippets as Swift (and have syntax highlighting). The issue is believed to be caused by an external component, so for now lets just work around it by rendering problematic snippets without syntax highlighting.

~(I'm not 100% sure I have the syntax correct for this, so lets wait for the previews to be sure)~ all good now.

geoffw0 avatar Aug 16 '24 10:08 geoffw0

QHelp previews:

swift/ql/src/queries/Security/CWE-1333/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, and potentially allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by Swift uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time complexity does not matter.

Example

Consider the following regular expression:


/^_(__|.)+_$/

Its sub-expression "(__|.)+" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Therefore, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:


/^_(__|[^_])+_$/

References

github-actions[bot] avatar Aug 16 '24 10:08 github-actions[bot]

QHelp previews:

swift/ql/src/queries/Security/CWE-020/IncompleteHostnameRegex.qhelp

errors/warnings:

/home/runner/work/codeql/codeql/swift/ql/src/queries/Security/CWE-020/IncompleteHostnameRegex.qhelp:49:63: found attribute "language", but no attributes allowed here
/home/runner/work/codeql/codeql/swift/ql/src/queries/Security/CWE-020/IncompleteHostnameRegex.qhelp:66:64: found attribute "language", but no attributes allowed here
A fatal error occurred: 1 qhelp files could not be processed.
swift/ql/src/queries/Security/CWE-116/BadTagFilter.qhelp

errors/warnings:

/home/runner/work/codeql/codeql/swift/ql/src/queries/Security/CWE-116/BadTagFilter.qhelp:31:51: found attribute "language", but no attributes allowed here
A fatal error occurred: 1 qhelp files could not be processed.
swift/ql/src/queries/Security/CWE-1333/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, and potentially allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by Swift uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time complexity does not matter.

Example

Consider the following regular expression:


/^_(__|.)+_$/

Its sub-expression "(__|.)+" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Therefore, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:


/^_(__|[^_])+_$/

References

github-actions[bot] avatar Aug 16 '24 12:08 github-actions[bot]

We have two choices:

  • remove syntax highlighting from these examples (first commit, my preference)
  • adjust the examples to avoid the problematic syntax (second commit)

The first approach wasn't properly supported by the qhelp converter, hence the second approach, but an improvement to the qhelp converter is being made so that the first approach will work - hence the revert of the second commit (in the third commit).

geoffw0 avatar Aug 16 '24 12:08 geoffw0

QHelp previews:

swift/ql/src/queries/Security/CWE-020/IncompleteHostnameRegex.qhelp

Incomplete regular expression for hostnames

Sanitizing untrusted URLs is an important technique for preventing attacks such as request forgeries and malicious redirections. Often, this is done by checking that the host of a URL is in a set of allowed hosts.

If a regular expression implements such a check, it is easy to accidentally make the check too permissive by not escaping the . meta-characters appropriately. Even if the check is not used in a security-critical context, the incomplete check may still cause undesirable behaviors when it accidentally succeeds.

Recommendation

Escape all meta-characters appropriately when constructing regular expressions for security checks, and pay special attention to the . meta-character.

Example

The following example code checks that a URL redirection will reach the example.com domain, or one of its subdomains.


func handleUrl(_ urlString: String) {
    // get the 'url=' parameter from the URL
    let components = URLComponents(string: urlString)
    let redirectParam = components?.queryItems?.first(where: { $0.name == "url" })

    // check we trust the host
    let regex = #/^(www|beta).example.com//#  // BAD
    if let match = redirectParam?.value?.firstMatch(of: regex) {
        // ... trust the URL ...
    }
}

The check is, however, easy to bypass because the unescaped . allows for any character before example.com, effectively allowing the redirect to go to an attacker-controlled domain such as wwwXexample.com.

Address this vulnerability by escaping . to \.:


func handleUrl(_ urlString: String) {
    // get the 'url=' parameter from the URL
    let components = URLComponents(string: urlString)
    let redirectParam = components?.queryItems?.first(where: { $0.name == "url" })

    // check we trust the host
    let regex = #/^(www|beta)\.example\.com//#  // GOOD
    if let match = redirectParam?.value?.firstMatch(of: regex) {
        // ... trust the URL ...
    }
}

References

swift/ql/src/queries/Security/CWE-116/BadTagFilter.qhelp

Bad HTML filtering regexp

It is possible to match some single HTML tags using regular expressions (parsing general HTML using regular expressions is impossible). However, if the regular expression is not written well, it might be possible to circumvent it. This can lead to cross-site scripting or other security issues.

Some of these mistakes are caused by browsers having very forgiving HTML parsers, and will often render invalid HTML containing syntax errors. Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors.

Recommendation

Use a well-tested sanitization or parser library if at all possible. These libraries are much more likely to handle corner cases correctly than a custom implementation.

Example

The following example attempts to filters out all <script> tags.

let script_tag_regex = /<script[^>]*>.*<\/script>/

var old_html = ""
while (html != old_html) {
  old_html = html
  html.replace(script_tag_regex, with: "")
}

...

The above sanitizer does not filter out all <script> tags. Browsers will not only accept </script> as script end tags, but also tags such as </script foo="bar"> even though it is a parser error. This means that an attack string such as <script>alert(1)</script foo="bar"> will not be filtered by the function, and alert(1) will be executed by a browser if the string is rendered as HTML.

Other corner cases include HTML comments ending with --!>, and HTML tag names containing uppercase characters.

References

swift/ql/src/queries/Security/CWE-1333/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, and potentially allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by Swift uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time complexity does not matter.

Example

Consider the following regular expression:


/^_(__|.)+_$/

Its sub-expression "(__|.)+" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Therefore, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:


/^_(__|[^_])+_$/

References

github-actions[bot] avatar Aug 16 '24 15:08 github-actions[bot]

Previews now LGTM. Ready to merge.

geoffw0 avatar Aug 20 '24 08:08 geoffw0

Is there a plan/issue to revert this PR once the rendering has been fixed?

calumgrant avatar Aug 30 '24 13:08 calumgrant

Is there a plan/issue to revert this PR once the rendering has been fixed?

No - it's done by an external tool, we're not really anticipating a fix, and we're not pushing for one if Swift docs are the only thing that's affected.

geoffw0 avatar Aug 30 '24 18:08 geoffw0

I can create an issue in the backlog if you'd like?

geoffw0 avatar Aug 30 '24 18:08 geoffw0

I'll leave it up to you. Approving.

calumgrant avatar Sep 02 '24 08:09 calumgrant

Thanks!

geoffw0 avatar Sep 02 '24 11:09 geoffw0