openapi-generator icon indicating copy to clipboard operation
openapi-generator copied to clipboard

[BUG][typescript-fetch] BASE_PATH regex vulnerability in runtime.ts

Open gciavarrini opened this issue 1 year ago • 0 comments

Bug Report Checklist

  • [x] Have you provided a full/minimal spec to reproduce the issue?
  • [x] Have you validated the input using an OpenAPI validator (example)?
  • [x] Have you tested with the latest master to confirm the issue still exists?
  • [x] Have you searched for related issues/PRs?
  • [x] What's the actual output vs expected output?
  • [ ] [Optional] Sponsorship to speed up the bug fix or feature request (example)
Description

The generator creates runtime.ts file with this line

export const BASE_PATH = "http://petstore.swagger.io/v1".replace(/\/+$/, "");

According to SonarCloud: "the regex used here, which is vulnerable to super-linear runtime due to backtracking, cannot lead to denial of service."

openapi-generator version

7.6.0 7.5.0

OpenAPI declaration file content or url

petstore.yaml

Generation Details
openapi-generator-cli generate -g typescript-fetch -i petstore.yaml -o generated

or

 npx --yes @openapitools/[email protected] generate -g typescript-fetch -i petstore.yaml  -o generated

(this downloads openapi-generator 7.5.0)

Steps to reproduce
  1. Generate the code using one of the following commands:
    • generator-cli generate -g typescript-fetch -i petstore.yaml -o generated
    • npx --yes @openapitools/[email protected] generate -g typescript-fetch -i petstore.yaml -o generated
  2. Verify the generated BASE_PATH executing head -n 20 generated/runtime.ts
Related issues/PRs
Suggest a fix

Here the suggestions provided by SonarCloud:

Recommended Secure Coding Practices

To avoid catastrophic backtracking situations, make sure that none of the following conditions apply to your regular expression.

In all of the following cases, catastrophic backtracking can only happen if the problematic part of the regex is followed by a pattern that can fail, causing the backtracking to actually happen.

  • If you have a repetition r* or r*?, such that the regex r could produce different possible matches (of possibly different lengths) on the same input, the worst-case matching time can be exponential. This can be the case if r contains optional parts, alternations, or additional repetitions (but not if the repetition is written in such a way that there’s only one way to match it).
  • If you have multiple repetitions that can match the same contents and are consecutive or are only separated by an optional separator or a separator that can be matched by both of the repetitions, the worst-case matching time can be polynomial (O(n^c) where c is the number of problematic repetitions). For example a*b* is not a problem because a* and b* match different things and a*_a* is not a problem because the repetitions are separated by a '' and can’t match that ''. However, a*a* and .*_.* have quadratic runtime.
  • If the regex is not anchored to the beginning of the string, quadratic runtime is especially hard to avoid because whenever a match fails, the regex engine will try again starting at the next index. This means that any unbounded repetition, if it’s followed by a pattern that can fail, can cause quadratic runtime on some inputs. For example str.split(/\s*,/) will run in quadratic time on strings that consist entirely of spaces (or at least contain large sequences of spaces, not followed by a comma).

In order to rewrite your regular expression without these patterns, consider the following strategies:

  • If applicable, define a maximum number of expected repetitions using the bounded quantifiers, like {1,5} instead of + for instance.
  • Refactor nested quantifiers to limit the number of way the inner group can be matched by the outer quantifier, for instance this nested quantifier situation (ba+)+ doesn’t cause performance issues, indeed, the inner group can be matched only if there exists exactly one b char per repetition of the group.
  • Optimize regular expressions by emulating possessive quantifiers and atomic grouping.
  • Use negated character classes instead of . to exclude separators where applicable. For example the quadratic regex .*_.* can be made linear by changing it to [^_]*_*.

Sometimes it’s not possible to rewrite the regex to be linear while still matching what you want it to match. Especially when the regex is not anchored to the beginning of the string, for which it is quite hard to avoid quadratic runtimes. In those cases consider the following approaches:

  • Solve the problem without regular expressions.
  • Use an alternative non-backtracking regex implementations such as Google’s RE2 or node-re2.
  • Use multiple passes. This could mean pre- and/or post-processing the string manually before/after applying the regular expression to it or using multiple regular expressions. One example of this would be to replace str.split(/\s*,\s*/) with str.split(",") and then trimming the spaces from the strings as a second step.
  • It is often possible to make the regex infallible by making all the parts that could fail optional, which will prevent backtracking. Of course this means that you’ll accept more strings than intended, but this can be handled by using capturing groups to check whether the optional parts were matched or not and then ignoring the match if they weren’t. For example the regex x*y could be replaced with x*(y)? and then the call to str.match(regex) could be replaced with matched = str.match(regex) and matched[1] !== undefined.

gciavarrini avatar May 21 '24 15:05 gciavarrini