codeql-action Analysis upload fails with `rejecting SARIF, as there are more threadflow steps per result than allowed (19350

Hello!

Today I wanted to enhance our CodeQL scan in the systemd repo by using the security-extended and security-and-quality query sets, but after adding them the CodeQL action can no longer upload the resulting SARIF file:

Waiting for processing to finish
  Analysis upload status is pending.
  Analysis upload status is failed.
  Error: Code Scanning could not process the submitted SARIF file:
  rejecting SARIF, as there are more threadflow steps per result than allowed (19350 > 10000)
  Error: Code Scanning could not process the submitted SARIF file:
  rejecting SARIF, as there are more threadflow steps per result than allowed (19350 > 10000)
      at Object.waitForProcessing (/home/runner/work/_actions/github/codeql-action/0c670bbf0414f39666df6ce8e718ec5662c21e03/lib/upload-lib.js:334:19)
      at async run (/home/runner/work/_actions/github/codeql-action/0c670bbf0414f39666df6ce8e718ec5662c21e03/lib/analyze-action.js:131:13)
      at async runWrapper (/home/runner/work/_actions/github/codeql-action/0c670bbf0414f39666df6ce8e718ec5662c21e03/lib/analyze-action.js:221:9)

Example job: https://github.com/systemd/systemd/actions/runs/3053021449/jobs/4923112318

Configuration:

---
# vi: ts=2 sw=2 et:
# SPDX-License-Identifier: LGPL-2.1-or-later
#
name: "CodeQL"

on:
  pull_request:
    branches:
      - main
      - v[0-9]+-stable
    paths:
      - '**/meson.build'
      - '.github/**/codeql*'
      - 'src/**'
      - 'test/**'
      - 'tools/**'
  push:
    branches:
      - main
      - v[0-9]+-stable

permissions:
  contents: read

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-22.04
    concurrency:
      group: ${{ github.workflow }}-${{ matrix.language }}-${{ github.ref }}
      cancel-in-progress: true
    permissions:
      actions: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: ['cpp', 'python']

    steps:
    - name: Checkout repository
      uses: actions/checkout@2541b1294d2704b0964813337f33b291d3f8596b

    - name: Initialize CodeQL
      uses: github/codeql-action/init@0c670bbf0414f39666df6ce8e718ec5662c21e03
      with:
        languages: ${{ matrix.language }}
        queries: +security-extended,security-and-quality

    - run: sudo -E .github/workflows/unit_tests.sh SETUP

    - name: Autobuild
      uses: github/codeql-action/autobuild@0c670bbf0414f39666df6ce8e718ec5662c21e03

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@0c670bbf0414f39666df6ce8e718ec5662c21e03

Since the build & analysis finishes successfully and only the last step fails, I don't think this is an issue on our side - is there something which can be done to mitigate this or we're out of luck?

Thank you!

Sep 14 '22 13:09 mrc0mmand

Hello! Thanks for reporting this. The first thing to do would be to figure out which rules/alerts are causing the problem. As an immediate fix you could remove the problematic alerts using https://github.com/advanced-security/filter-sarif .

Could you Re-run all jobs on the workflow and toggle the Enable debug logging checkbox? When the workflow finishes a debug artefact should have been uploaded. This artefact contains a .sarif file. The debug artefact is a relatively new feature, so you may need to adjust the SHA of the codeql-action steps in the workflow. Alternatively, you can set the output property of the codeql-action/analyze step to a folder name and use the actions/upload action to upload that folder as an artefact.

Once you obtained the SARIF file, could you run the following jq commands on it and post the output?

cat FILE.sarif | jq  '.runs[].results[].ruleId' | sort | uniq -c

cat FILE.sarif | jq -rc '.runs[].results[] | {rule: .ruleId, file: .locations[0].physicalLocation.artifactLocation.uri, codeFlowsCount: (.codeFlows | length), threadFlowStepsCount: (.codeFlows | reduce .[]? as $item (0;  . + ($item.threadFlows[0].locations | length)))}' | jq -s -c 'sort_by(.threadFlowStepsCount) | .[]'

Sep 14 '22 14:09 aibaars

Hello! Thanks for reporting this. The first thing to do would be to figure out which rules/alerts are causing the problem. As an immediate fix you could remove the problematic alerts using https://github.com/advanced-security/filter-sarif .

Could you Re-run all jobs on the workflow and toggle the Enable debug logging checkbox? When the workflow finishes a debug artefact should have been uploaded. This artefact contains a .sarif file. The debug artefact is a relatively new feature, so you may need to adjust the SHA of the codeql-action steps in the workflow.

You were right, I had to use the main ref instead of the pinned SHAs to get the artifacts, so thanks for that!

Once you obtained the SARIF file, could you run the following jq commands on it and post the output?
cat FILE.sarif | jq  '.runs[].results[].ruleId' | sort | uniq -c

$ cat cpp.sarif | jq  '.runs[].results[].ruleId' | sort | uniq -c
      1 "cpp/cleartext-storage-file"
     16 "cpp/commented-out-code"
      2 "cpp/complex-condition"
      2 "cpp/constant-comparison"
     36 "cpp/fixme-comment"
      4 "cpp/include-non-header"
      3 "cpp/inconsistent-null-check"
     70 "cpp/irregular-enum-init"
      1 "cpp/leap-year/unchecked-after-arithmetic-year-modification"
     92 "cpp/long-switch"
     33 "cpp/loop-variable-changed"
      1 "cpp/missing-case-in-switch"
      7 "cpp/missing-header-guard"
      1 "cpp/non-constant-format"
      1 "cpp/offset-use-before-range-check"
      1 "cpp/overrunning-write"
    137 "cpp/path-injection"
    293 "cpp/poorly-documented-function"
     25 "cpp/stack-address-escape"
      4 "cpp/suspicious-allocation-size"
      2 "cpp/system-data-exposure"
     19 "cpp/toctou-race-condition"
     43 "cpp/too-many-format-arguments"
      9 "cpp/trivial-switch"
    113 "cpp/unbounded-write"
      4 "cpp/uncontrolled-allocation-size"
     15 "cpp/uncontrolled-process-operation"
     81 "cpp/uninitialized-local"
     82 "cpp/unterminated-variadic-call"
      1 "cpp/unused-local-variable"
    183 "cpp/unused-static-function"
      3 "cpp/unused-static-variable"
     34 "cpp/world-writable-file-creation"

cat FILE.sarif | jq -rc '.runs[].results[] | {rule: .ruleId, file: .locations[0].physicalLocation.artifactLocation.uri, codeFlowsCount: (.codeFlows | length), threadFlowStepsCount: (.codeFlows | reduce .[]? as $item (0;  . + ($item.threadFlows[0].locations | length)))}' | jq -s -c 'sort_by(.threadFlowStepsCount) | .[]'

https://gist.github.com/mrc0mmand/d516a60a02c2f15b9bbfeac9b589f8bd

In case a further inspection is needed, I also uploaded the SARIF file to https://mrc0mmand.fedorapeople.org/gh_2022-09-14-cpp.sarif

Sep 14 '22 16:09 mrc0mmand

Thanks for all the details!

The tail of the second jq command shows there are 3 alerts that exceed the limits:

{"rule":"cpp/unbounded-write","file":"src/libsystemd/sd-bus/sd-bus.c","codeFlowsCount":1208,"threadFlowStepsCount":14496}
{"rule":"cpp/path-injection","file":"src/test/test-fs-util.c","codeFlowsCount":606,"threadFlowStepsCount":15135}
{"rule":"cpp/unbounded-write","file":"src/basic/unit-name.c","codeFlowsCount":1211,"threadFlowStepsCount":19350}

You could try to add a advanced-security/filter-sarif to filter out the cpp/unbounded-write results for src/libsystemd/sd-bus/sd-bus.c and src/basic/unit-name.c and also the cpp/path-injection results for src/test/test-fs-util.c.

Alternatively, you could use a query-filter in a CodeQL config file to drop the two queries that are causing problems.

I'll let the team know about this too, so they can investigate the issue and perhaps improve the queries to avoid them producing that much data.

Sep 14 '22 16:09 aibaars

There is a new feature in code-scanning that allows you to remove queries from the analysis. Using this feature should remove the queries that are causing the problems. The documentation for this feature has not yet been published, but I will explain how to use it here.

Create a custom configuration file, as explained here: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-a-custom-configuration-file
The file can be empty, except for a query-filters block, like this:

query-filters:
    - exclude:
        id: 
          - cpp/unbounded-write
          - cpp/path-injection

That should be enough.

Sep 14 '22 17:09 aeisenberg

Thank you both for the suggestions. I tried filtering the SARIF file using advanced-security/filter-sarif and it seems to be doing the right thing*.

Using a CodeQL config file to disable the checks completely would also be an option (and would be probably much easier, given we already utilize such config file), but I'd like to avoid that, since both checks seem quite useful (especially in systemd's case).

On a related note - is this issue caused by some limitations of the SARIF format or it's limited somewhere in the CodeQL action?

* updated config for reference

---
# vi: ts=2 sw=2 et:
# SPDX-License-Identifier: LGPL-2.1-or-later
#
name: "CodeQL"

on:
  pull_request:
    branches:
      - main
      - v[0-9]+-stable
    paths:
      - '**/meson.build'
      - '.github/**/codeql*'
      - 'src/**'
      - 'test/**'
      - 'tools/**'
  push:
    branches:
      - main
      - v[0-9]+-stable

permissions:
  contents: read

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-22.04
    concurrency:
      group: ${{ github.workflow }}-${{ matrix.language }}-${{ github.ref }}
      cancel-in-progress: true
    permissions:
      actions: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: ['cpp', 'python']

    steps:
    - name: Checkout repository
      uses: actions/checkout@2541b1294d2704b0964813337f33b291d3f8596b

    - name: Initialize CodeQL
      uses: github/codeql-action/init@0c670bbf0414f39666df6ce8e718ec5662c21e03
      with:
        languages: ${{ matrix.language }}
        config-file: ./.github/codeql-config.yml
        queries: +security-extended,security-and-quality

    - run: sudo -E .github/workflows/unit_tests.sh SETUP

    - name: Autobuild
      uses: github/codeql-action/autobuild@0c670bbf0414f39666df6ce8e718ec5662c21e03

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@0c670bbf0414f39666df6ce8e718ec5662c21e03
      with:
        upload: False
        output: sarif-results

    - name: Filter SARIF results
      uses: advanced-security/filter-sarif@54b1ee6ebe059d29692bcc246e3c397e99176c6b
      if: ${{ matrix.language == 'cpp' }}
      with:
        patterns: |
          -src/basic/unit-name.c:cpp/unbounded-write
          -src/libsystemd/sd-bus/sd-bus.c:cpp/unbounded-write
          -src/test/test-fs-util.c:cpp/path-injection
        input: sarif-results/cpp.sarif
        output: sarif-results/cpp.sarif

    - name: Upload SARIF results
      uses: github/codeql-action/upload-sarif@0c670bbf0414f39666df6ce8e718ec5662c21e03
      with:
        sarif_file: sarif-results/${{ matrix.language }}.sarif

Sep 14 '22 17:09 mrc0mmand

The issue is that the two queries are behaving badly and producing a large number of paths for an alert. Our team is still diagnosing why. This is producing a sarif file with a large number of threadflow steps. Our code-scanning processor has trouble with these kinds of files and it rejects them.

Sep 14 '22 18:09 aeisenberg

The issue is that the two queries are behaving badly and producing a large number of paths for an alert. Our team is still diagnosing why. This is producing a sarif file with a large number of threadflow steps. Our code-scanning processor has trouble with these kinds of files and it rejects them.

Ah, I see, thank you for the explanation. I also just noticed how many alerts the cpp/unbounded-write check generates (and by a quick look they don't seem quite useful for us anyway), so I might end up going with your suggestion and disabling the check completely . For some reason I thought we had this one enabled on LGTM as well, but it turns out that wasn't the case.

Sep 14 '22 18:09 mrc0mmand

Ah, I see, thank you for the explanation. I also just noticed how many alerts the cpp/unbounded-write check generates (and by a quick look they don't seem quite useful for us anyway), so I might end up going with your suggestion and disabling the check completely . For some reason I thought we had this one enabled on LGTM as well, but it turns out that wasn't the case.

If you have time, could you create an issue on github/codeql with examples of alerts you don't find useful? Perhaps the team can improve the query to improve the quality of the result.

Sep 15 '22 08:09 aibaars

Ah, I see, thank you for the explanation. I also just noticed how many alerts the cpp/unbounded-write check generates (and by a quick look they don't seem quite useful for us anyway), so I might end up going with your suggestion and disabling the check completely . For some reason I thought we had this one enabled on LGTM as well, but it turns out that wasn't the case.

If you have time, could you create an issue on github/codeql with examples of alerts you don't find useful? Perhaps the team can improve the query to improve the quality of the result.

That's definitely a good idea. After enabling the security-enhanced and security-and-quality query sets, they spawned around ~1400 new alerts where most of them were false positives - mostly a couple of the same alerts spread through the entire codebase. Once I have a bit of spare time I'll go through them, prepare some minimal examples, and report them to github/codeql.

Sep 15 '22 09:09 mrc0mmand

Analysis upload fails with `rejecting SARIF, as there are more threadflow steps per result than allowed (19350 > 10000)`