codechecker icon indicating copy to clipboard operation
codechecker copied to clipboard

Support Static Analysis Results Interchange Format (SARIF) format?

Open Xazax-hun opened this issue 7 years ago • 4 comments

CSA just gained the ability to output SARIF: https://github.com/llvm-mirror/clang/commit/962c092aae53360ab4d5b1adac78963694b5963f#diff-e47bf599aad9618f970aa41d0f09bf4f

Do we want to support this for interop with other tools? (Or prefer over plist at some point?) Even if we consume plist it would be great to be able to generate both plist and sarif at the same time so both CodeChecker and other tools can consume the same results.

Xazax-hun avatar Nov 01 '18 16:11 Xazax-hun

Please note that the Sarif output still needs a great amount of maturing, for example, it can't handle checkers loaded from plugins, which is a particularly great weakness for our use case. Sadly, the elegant solution for this is very non-trivial, but is on the way.

But of course, that shouldn't stop us from considering using it in the future.

Szelethus avatar Nov 27 '18 14:11 Szelethus

I do not see any bug id (hash) in the generated report. Maybe it should be added to the output or we should generate the report id in CodeChecker for it.

gyorb avatar Apr 11 '19 07:04 gyorb

The current latest SARIF specification v2.10.

See the list of oasis standards

gyorb avatar Oct 22 '19 11:10 gyorb

Documentation:

  • https://sarifweb.azurewebsites.net/
  • https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/sarif-v2.1.0-cs01.html#_Toc16012580

Questions:

  • What is missing from the Clang Static Analyzer SARIF output compared to Plist? I have analyzed the following file with the following commands:

    // main.cpp
    #define SET_PTR_VAR_TO_NULL \
      ptr = 0
    
    void nonFunctionLikeMacroTest() {
      int *ptr;
      SET_PTR_VAR_TO_NULL;
      *ptr = 5; // expected-warning{{Dereference of null pointer}}
    }
    
    void log() {}
    
    int max(int a, int b) { // expected-warning{{Duplicate code detected}}
      log();
      if (a > b)
        return a;
      return b;
    }
    
    int maxClone(int a, int b) { // expected-note{{Similar code here}}
      log();
      if (a > b)
        return a;
      return b;
    }
    
    int main() { return 0; }
    
    # Create plist output.
    clang --analyze -Xclang -analyzer-output=plist-multi-file -o ./main_1.plist -Xclang -analyzer-config -Xclang expand-macros=true -Xclang -analyzer-config -Xclang alpha.clone.CloneChecker:MinimumCloneComplexity=5 -Xclang -analyzer-checker=alpha.clone.CloneChecker -Xclang -analyzer-checker=core.NullDereference ./main.cpp
    
    # Create sarif output.
    clang --analyze -Xclang -analyzer-output=sarif -o ./main_1.sarif -Xclang -analyzer-config -Xclang expand-macros=true -Xclang -analyzer-config -Xclang alpha.clone.CloneChecker:MinimumCloneComplexity=5 -Xclang -analyzer-checker=alpha.clone.CloneChecker -Xclang -analyzer-checker=core.NullDereference ./main.cpp
    
    • issue_hash_content_of_line_in_context (bug hash) is not found in the sarif output. I think we can use one of the following property:
      • fingerprints property: http://docs.oasis-open.org/sarif/sarif/v2.0/csprd01/sarif-v2.0-csprd01.html#_Toc517436070
      • partialFingerprints property: http://docs.oasis-open.org/sarif/sarif/v2.0/csprd01/sarif-v2.0-csprd01.html#_Toc517436071. For example Github uses the partialFingerprints property: https://docs.github.com/en/code-security/secure-coding/sarif-support-for-code-scanning#preventing-duplicate-alerts-using-fingerprints
    • Notes are supported:
      "importance": "unimportant",
      "location": {
        "message": {
          "text": "Similar code here"
        }
        ...
      }
      
    • Sarif output doesn't contain macro_expansions text like plist file does:
        <key>name</key><string>SET_PTR_VAR_TO_NULL</string>
        <key>expansion</key><string>ptr = 0</string>)
      
      vs
      {
        "importance": "unimportant",
        "location": {
          "physicalLocation": {
            "artifactLocation": {
              "index": 0,
              "uri": "file:///home/username/projects/simple/main.cpp"
            },
            "region": {
              "endColumn": 3,
              "startColumn": 3,
              "startLine": 6
            }
          }
        }
      }
      
  • Which analyzer supports sarif as an output?

    • Spotbugs: https://github.com/spotbugs/spotbugs/tree/master/spotbugs/src/main/java/edu/umd/cs/findbugs/sarif
    • Gosec: https://github.com/securego/gosec
    • Eslint (eslint formatter): https://eslint.org/docs/user-guide/formatters/
  • SARIF converters

    • https://github.com/microsoft/sarif-sdk/tree/master/src/Sarif.Converters
    • https://github.com/Microsoft/binskim
    • https://github.com/GrammaTech/pylint-sarif
  • Which are the viewers supporting sarif?

    • https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer
    • https://marketplace.visualstudio.com/items?itemName=WDGIS.MicrosoftSarifViewer
    • https://github.com/microsoft/sarif-web-component
    • https://blogs.grammatech.com/integrating-clang-static-analyzer-with-codesonar-using-sarif
    • https://blogs.grammatech.com/using-sarif-to-extend-analysis-of-sast-tools
    • https://www.shiftleft.io/scan/

Remaining questions:

  • One sarif file for every analysis run with all the results.
  • JSON parsing might get slow with lot of results.
  • This will change the current behavior where we update only a few plist files.

Further notes:

  • Thread flows in code flow object can be used to represent a bug path (thread flow describes a time-ordered sequence of code locations on a single thread of execution). Code flow can have multiple thread flows

Useful links:

  • SARIF format sdk (C#): https://github.com/Microsoft/sarif-sdk
  • Python library: https://github.com/microsoft/sarif-python-om
  • Good introduction: https://github.com/microsoft/sarif-tutorials

csordasmarton avatar Apr 15 '21 10:04 csordasmarton

@csordasmarton did a really good analysis above. I wanted to provide some updates on tools using/supporting SARIF since then:

  • The MSVC static analysis toolset supports the SARIF format: https://devblogs.microsoft.com/cppblog/microsoft-c-code-analysis-supports-sarif-2-1/
    • And we recently added notes support to our tools (also in SARIF): https://devblogs.microsoft.com/cppblog/microsoft-cpp-code-analysis-warnings-with-key-events/
  • The MSVC warnings are also betting on SARIF in the long run: https://devblogs.microsoft.com/cppblog/the-future-of-c-compiler-diagnostics-in-msvc-and-visual-studio/
  • There were some recent proposals in Clang to use SARIF for richer diagnostics emitted by the frontend: https://discourse.llvm.org/t/rfc-improving-clang-s-diagnostics/62584
  • SARIF support is a top requested feature for SonarQube: https://portal.productboard.com/sonarsource/3-sonarqube/tabs/5-under-consideration
  • CodeQL supports Sarif: https://codeql.github.com/docs/codeql-cli/sarif-output/
  • CodeQL's and MSVC Code Analysis' integration with GitHub is based on SARIF: https://devblogs.microsoft.com/cppblog/microsoft-cpp-code-analysis-with-github-actions/

Xazax-hun avatar Sep 23 '22 17:09 Xazax-hun

It looks like GCC's static analyzer will also have SARIF support in GCC 13.

Xazax-hun avatar Oct 26 '22 15:10 Xazax-hun

Yeah Github integration in general is based on sarif. https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/uploading-a-sarif-file-to-github

Trass3r avatar Nov 13 '22 17:11 Trass3r

Any news on this new feature?

damiencarol avatar Feb 13 '23 12:02 damiencarol

we will resume work on this soon, @franco0700

dgutson avatar Feb 13 '23 14:02 dgutson

Apologies for confusion, but it seems to me that #4011 added support for parsing SARIF into the CodeChecker, while i (and i suspect many others) was fully expecting that this issues was tracking the exact opposite feature, exporting CodeChecker results into SARIF format (so e.g. GitHub could visualize them). Am i missing the point?

LebedevRI avatar Oct 10 '23 15:10 LebedevRI

Apologies for confusion, but it seems to me that #4011 added support for parsing SARIF into the CodeChecker, while i (and i suspect many others) was fully expecting that this issues was tracking the exact opposite feature, exporting CodeChecker results into SARIF format (so e.g. GitHub could visualize them). Am i missing the point?

I actually expected parsing an input SARIF file, so CodeChecker can ingest the output of more linters.

dgutson avatar Oct 10 '23 16:10 dgutson

Apologies for confusion, but it seems to me that #4011 added support for parsing SARIF into the CodeChecker, while i (and i suspect many others) was fully expecting that this issues was tracking the exact opposite feature, exporting CodeChecker results into SARIF format (so e.g. GitHub could visualize them). Am i missing the point?

You can convert existing report to sarif using report-converter:

report-converter -t cppcheck out/double-free_5197372318879967385.plist -e sarif -o cppcheck.sarif

It is true though that results stored on the server cannot be exported just yet. Also, I realize that report-converter to manually postprocess results isn't the most convenient :)

edit: It may have been a little overkill to close this issue straight away, we support only a subset of sarif that we immediately needed to support the gcc static analyzer. Considering that we have a new issue already, I'll leave this closed.

Szelethus avatar Oct 11 '23 07:10 Szelethus

You can convert existing report to sarif using report-converter:

This requires using git-tip of codechecker, as it was implemented in this PR and the commit https://github.com/Ericsson/codechecker/commit/90a39054d52f91f1acb50620b03beee6d0829e2c is not yet released.

  • https://github.com/Ericsson/codechecker/pull/4011

I was initially surprised that my report-converter does not recognize -e sarif. So that's the reason.

jiridanek avatar Oct 15 '23 12:10 jiridanek