remove defusedxml in favor of lxml
I would recommend to either go with lxml or defusedxml, but don't use a mixture of both. https://discuss.python.org/t/status-of-defusedxml-and-recommendation-in-docs/34762/19
I chose lxml as defusedxml got the last update in March 2021
@Maffooch this is a followup of https://github.com/DefectDojo/django-DefectDojo/pull/9812
Hi there :wave:, @dryrunsecurity here, below is a summary of our analysis and findings.
| DryRun Security | Status | Findings |
|---|---|---|
| Server-Side Request Forgery Analyzer | :white_check_mark: | 0 findings |
| Configured Codepaths Analyzer | :white_check_mark: | 0 findings |
| IDOR Analyzer | :white_check_mark: | 0 findings |
| Sensitive Files Analyzer | :white_check_mark: | 0 findings |
| SQL Injection Analyzer | :white_check_mark: | 0 findings |
| Authn/Authz Analyzer | :white_check_mark: | 0 findings |
| Secrets Analyzer | :white_check_mark: | 0 findings |
[!Note] :green_circle: Risk threshold not exceeded.
Change Summary (click to expand)
The following is a summary of changes in this pull request made by me, your security buddy :robot:. Note that this summary is auto-generated and not meant to be a definitive list of security issues but rather a helpful summary from a security perspective.
Summary:
The code changes in this pull request focus on improving the security and robustness of the various XML parsers used in the DefectDojo application security management tool. The key changes include:
- Replacing the
defusedxmllibrary with thelxmllibrary for XML parsing, which provides better security and performance.- Configuring the XML parsers to disable the resolution of external XML entities, effectively mitigating the risk of XML External Entity (XXE) vulnerabilities.
- Enhancing the parsing of vulnerability details, such as mitigation information, severity levels, and deduplication of findings.
- Improving the handling of various report formats, including Nessus, Veracode, Nikto, and others, to ensure accurate and comprehensive security data is imported into the DefectDojo application.
These changes demonstrate a security-focused approach to the application's XML parsing functionality, addressing potential vulnerabilities and improving the overall quality and reliability of the security data processed by the tool.
Files Changed:
dojo/tools/appspider/parser.py: The changes replace thedefusedxmllibrary withlxmland disable XML entity resolution to mitigate XXE vulnerabilities.dojo/tools/burp_dastardly/parser.py: Similar changes to the Burp Dastardly parser, addressing potential XML-related vulnerabilities.dojo/tools/burp/parser.py: The Burp Suite XML parser is updated with the same security-focused changes.dojo/tools/acunetix/parse_acunetix_xml.py: The Acunetix XML parser is updated to uselxmland disable entity resolution.dojo/tools/dependency_check/parser.py: The Dependency Check parser is improved, including better handling of identifiers and suppressed vulnerabilities.dojo/tools/cyclonedx/xml_parser.py: The CycloneDX XML parser is updated with thelxmllibrary and entity resolution disabled.dojo/tools/crashtest_security/parser.py: The Crashtest Security parser is updated with thelxmllibrary and entity resolution disabled.- And similar changes are made to the parsers for Fortify, Checkmarx, Nikto, Qualys, Nexpose, OpenSCAP, Nmap, OpenVAS, Outpost24, Qualys, SpotBugs, SSLyze, sslscan, Tenable, VCG, Wapiti, Xanitizer, and Veracode.
Powered by DryRun Security
FYI: I can do a followup PR to fix the added TRY200 regarding ruff linter, but I did prioritize this PR on migrating to lxml, but not on fixing all linter failures of previous implemented parsers.
I believe we originally started using defusedxml to avoid potential security issues parsing untrusted XML with lxml. bandit now reports many of these calls as vulnerable:
...
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
Severity: Medium Confidence: High
CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
Location: ./dojo/tools/zap/parser.py:28:15
27 def get_findings(self, file, test):
28 tree = etree.parse(file)
29 items = list()
--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
Severity: Medium Confidence: High
CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
Location: ./unittests/tools/test_vcg_parser.py:48:18
47 single_finding = open("unittests/scans/vcg/one_finding.xml")
48 vcgscan = etree.parse(single_finding)
49 finding = self.parser.parse_issue(vcgscan.findall("CodeIssue")[0], Test())
...
The defusedxml README describes these various vulnerabilities in more detail and provides some suggestions for safely using lxml. Additionally, lxml has some suggestions on their site.
tl;dr: I think we should either do the legwork to make lxml safe, or convert the few cases where lxml is already used to defusedxml calls:
Test results:
>> Issue: [B320:blacklist] Using lxml.etree.fromstring to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.fromstring with its defusedxml equivalent function.
Severity: Medium Confidence: High
CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
Location: ./dojo/tools/api_sonarqube/importer.py:385:18
384 parser = etree.HTMLParser()
385 details = etree.fromstring(vuln_details, parser)
386
--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
Severity: Medium Confidence: High
CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
Location: ./dojo/tools/burp_enterprise/parser.py:23:15
22 parser = etree.HTMLParser()
23 tree = etree.parse(filename, parser)
24 if tree:
--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.parse with its defusedxml equivalent function.
Severity: Medium Confidence: High
CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
Location: ./dojo/tools/sonarqube/parser.py:37:19
36 parser = etree.HTMLParser()
37 tree = etree.parse(filename, parser)
38 if self.mode not in [None, "detailed"]:
--------------------------------------------------
>> Issue: [B320:blacklist] Using lxml.etree.fromstring to parse untrusted XML data is known to be vulnerable to XML attacks. Replace lxml.etree.fromstring with its defusedxml equivalent function.
Severity: Medium Confidence: High
CWE: CWE-20 (https://cwe.mitre.org/data/definitions/20.html)
More Info: https://bandit.readthedocs.io/en/1.7.8/blacklists/blacklist_calls.html#b313-b320-xml-bad-etree
Location: ./dojo/tools/sonarqube/parser.py:69:38
68 parser = etree.HTMLParser()
69 html_desc_as_e_tree = etree.fromstring(issue_detail["htmlDesc"], parser)
70 issue_description = self.get_description(html_desc_as_e_tree)
--------------------------------------------------
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
@cneill, I updated how lxml is used (with resolve_entities=False)
Could you also take a look @Maffooch if lxml is fine to be used like this?
reopening to retrigger failed tests.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
@mtesauro: It would be really nice if we could merge this soonish because of the high number of ruff adjustments (PRs) as I have to rebase so many files here and so frequently.
Thank you for the feedback. I will fix your review before this weekend. It would be nice if we get this on the road in the next Release
Done @cneill. You can rereview
Nice, only one more to go :-)
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
I resolved the merge conflict. Will we get this into release 2.34.0 @Maffooch ?
@manuel-sommer Unfortunately, this PR will not make it in the 2.34.0 release. There is still some discussion happening around this change
Do we have an update here @Maffooch ?
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
Friendly reminder @Maffooch
This pull request has conflicts, please resolve those before we can evaluate the pull request.