django-DefectDojo icon indicating copy to clipboard operation
django-DefectDojo copied to clipboard

remove defusedxml in favor of lxml

Open manuel-sommer opened this issue 1 year ago • 28 comments

I would recommend to either go with lxml or defusedxml, but don't use a mixture of both. https://discuss.python.org/t/status-of-defusedxml-and-recommendation-in-docs/34762/19

I chose lxml as defusedxml got the last update in March 2021

manuel-sommer avatar Mar 28 '24 08:03 manuel-sommer

@Maffooch this is a followup of https://github.com/DefectDojo/django-DefectDojo/pull/9812

manuel-sommer avatar Mar 28 '24 08:03 manuel-sommer

Hi there :wave:, @dryrunsecurity here, below is a summary of our analysis and findings.

DryRun Security Status Findings
Server-Side Request Forgery Analyzer :white_check_mark: 0 findings
Configured Codepaths Analyzer :white_check_mark: 0 findings
IDOR Analyzer :white_check_mark: 0 findings
Sensitive Files Analyzer :white_check_mark: 0 findings
SQL Injection Analyzer :white_check_mark: 0 findings
Authn/Authz Analyzer :white_check_mark: 0 findings
Secrets Analyzer :white_check_mark: 0 findings

[!Note] :green_circle: Risk threshold not exceeded.

Change Summary (click to expand)

The following is a summary of changes in this pull request made by me, your security buddy :robot:. Note that this summary is auto-generated and not meant to be a definitive list of security issues but rather a helpful summary from a security perspective.

Summary:

The code changes in this pull request focus on improving the security and robustness of the various XML parsers used in the DefectDojo application security management tool. The key changes include:

  1. Replacing the defusedxml library with the lxml library for XML parsing, which provides better security and performance.
  2. Configuring the XML parsers to disable the resolution of external XML entities, effectively mitigating the risk of XML External Entity (XXE) vulnerabilities.
  3. Enhancing the parsing of vulnerability details, such as mitigation information, severity levels, and deduplication of findings.
  4. Improving the handling of various report formats, including Nessus, Veracode, Nikto, and others, to ensure accurate and comprehensive security data is imported into the DefectDojo application.

These changes demonstrate a security-focused approach to the application's XML parsing functionality, addressing potential vulnerabilities and improving the overall quality and reliability of the security data processed by the tool.

Files Changed:

  • dojo/tools/appspider/parser.py: The changes replace the defusedxml library with lxml and disable XML entity resolution to mitigate XXE vulnerabilities.
  • dojo/tools/burp_dastardly/parser.py: Similar changes to the Burp Dastardly parser, addressing potential XML-related vulnerabilities.
  • dojo/tools/burp/parser.py: The Burp Suite XML parser is updated with the same security-focused changes.
  • dojo/tools/acunetix/parse_acunetix_xml.py: The Acunetix XML parser is updated to use lxml and disable entity resolution.
  • dojo/tools/dependency_check/parser.py: The Dependency Check parser is improved, including better handling of identifiers and suppressed vulnerabilities.
  • dojo/tools/cyclonedx/xml_parser.py: The CycloneDX XML parser is updated with the lxml library and entity resolution disabled.
  • dojo/tools/crashtest_security/parser.py: The Crashtest Security parser is updated with the lxml library and entity resolution disabled.
  • And similar changes are made to the parsers for Fortify, Checkmarx, Nikto, Qualys, Nexpose, OpenSCAP, Nmap, OpenVAS, Outpost24, Qualys, SpotBugs, SSLyze, sslscan, Tenable, VCG, Wapiti, Xanitizer, and Veracode.

Powered by DryRun Security

dryrunsecurity[bot] avatar Mar 28 '24 08:03 dryrunsecurity[bot]

FYI: I can do a followup PR to fix the added TRY200 regarding ruff linter, but I did prioritize this PR on migrating to lxml, but not on fixing all linter failures of previous implemented parsers.

manuel-sommer avatar Mar 28 '24 08:03 manuel-sommer

I believe we originally started using defusedxml to avoid potential security issues parsing untrusted XML with lxml. bandit now reports many of these calls as vulnerable:

...
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/zap/parser.py:28:15
27	    def get_findings(self, file, test):
28	        tree = etree.parse(file)
29	        items = list()

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./unittests/tools/test_vcg_parser.py:48:18
47	        single_finding = open("unittests/scans/vcg/one_finding.xml")
48	        vcgscan = etree.parse(single_finding)
49	        finding = self.parser.parse_issue(vcgscan.findall("CodeIssue")[0], Test())
...

The defusedxml README describes these various vulnerabilities in more detail and provides some suggestions for safely using lxml. Additionally, lxml has some suggestions on their site.

tl;dr: I think we should either do the legwork to make lxml safe, or convert the few cases where lxml is already used to defusedxml calls:

Test results:
>> Issue: [B320:blacklist] Using lxml.etree.fromstring to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.fromstring with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/api_sonarqube/importer.py:385:18
384	        parser = etree.HTMLParser()
385	        details = etree.fromstring(vuln_details, parser)
386	

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/burp_enterprise/parser.py:23:15
22	        parser = etree.HTMLParser()
23	        tree = etree.parse(filename, parser)
24	        if tree:

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/sonarqube/parser.py:37:19
36	            parser = etree.HTMLParser()
37	            tree = etree.parse(filename, parser)
38	            if self.mode not in [None, "detailed"]:

--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.fromstring to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.fromstring with its defusedxml equivalent function.
   Severity: Medium   Confidence: High
   CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
   More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
   Location: ./dojo/tools/sonarqube/parser.py:69:38
68	                parser = etree.HTMLParser()
69	                html_desc_as_e_tree = etree.fromstring(issue_detail["htmlDesc"], parser)
70	                issue_description = self.get_description(html_desc_as_e_tree)

--------------------------------------------------

cneill avatar Mar 28 '24 17:03 cneill

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar Apr 06 '24 08:04 github-actions[bot]

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions[bot] avatar Apr 20 '24 17:04 github-actions[bot]

@cneill, I updated how lxml is used (with resolve_entities=False)

manuel-sommer avatar Apr 20 '24 19:04 manuel-sommer

Could you also take a look @Maffooch if lxml is fine to be used like this?

manuel-sommer avatar Apr 20 '24 19:04 manuel-sommer

reopening to retrigger failed tests.

manuel-sommer avatar Apr 20 '24 20:04 manuel-sommer

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar Apr 24 '24 17:04 github-actions[bot]

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions[bot] avatar Apr 24 '24 17:04 github-actions[bot]

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar Apr 30 '24 14:04 github-actions[bot]

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions[bot] avatar Apr 30 '24 18:04 github-actions[bot]

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar Apr 30 '24 20:04 github-actions[bot]

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions[bot] avatar May 02 '24 09:05 github-actions[bot]

@mtesauro: It would be really nice if we could merge this soonish because of the high number of ruff adjustments (PRs) as I have to rebase so many files here and so frequently.

manuel-sommer avatar May 02 '24 09:05 manuel-sommer

Thank you for the feedback. I will fix your review before this weekend. It would be nice if we get this on the road in the next Release

manuel-sommer avatar May 03 '24 17:05 manuel-sommer

Done @cneill. You can rereview

manuel-sommer avatar May 03 '24 20:05 manuel-sommer

Nice, only one more to go :-)

manuel-sommer avatar May 03 '24 22:05 manuel-sommer

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar May 06 '24 15:05 github-actions[bot]

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions[bot] avatar May 06 '24 16:05 github-actions[bot]

I resolved the merge conflict. Will we get this into release 2.34.0 @Maffooch ?

manuel-sommer avatar May 06 '24 16:05 manuel-sommer

@manuel-sommer Unfortunately, this PR will not make it in the 2.34.0 release. There is still some discussion happening around this change

Maffooch avatar May 06 '24 16:05 Maffooch

Do we have an update here @Maffooch ?

manuel-sommer avatar May 23 '24 10:05 manuel-sommer

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar May 30 '24 16:05 github-actions[bot]

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions[bot] avatar May 30 '24 18:05 github-actions[bot]

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar Jun 25 '24 17:06 github-actions[bot]

Conflicts have been resolved. A maintainer will review the pull request shortly.

github-actions[bot] avatar Jun 25 '24 20:06 github-actions[bot]

Friendly reminder @Maffooch

manuel-sommer avatar Jul 02 '24 09:07 manuel-sommer

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar Jul 03 '24 20:07 github-actions[bot]