
Support for prowler scan

Open cosmel-dojo opened this issue 8 months ago • 6 comments

Prowler Scan Parser for DefectDojo

Description

This PR adds support for importing security scan results from Prowler, a security assessment and compliance tool for AWS, Azure, GCP, and Kubernetes. The parser supports both the CSV and JSON output formats produced by Prowler scans.

Key features implemented:

  • Support for all major cloud platforms (AWS, Azure, GCP, Kubernetes)
  • Handle both CSV and JSON formats with automatic detection
  • Extract critical metadata including severity, resource information, and remediation steps
  • Properly map Prowler severity levels to DefectDojo severity levels
  • Handle both active and informational findings based on status codes
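As a rough illustration of the format detection, severity mapping, and status handling listed above, here is a minimal sketch. It is not the code from this PR: the function names, the mapping table, and the assumption that only a `FAIL` status produces an active finding are hypothetical and would need to be checked against the actual parser.

```python
import json

# Hypothetical mapping from Prowler severity strings to DefectDojo severities.
# The table used by the PR is not shown in this thread.
SEVERITY_MAP = {
    "critical": "Critical",
    "high": "High",
    "medium": "Medium",
    "low": "Low",
    "informational": "Info",
}


def detect_format(content: str) -> str:
    """Return "json" if the content parses as JSON, otherwise "csv"."""
    stripped = content.lstrip()
    if stripped.startswith(("[", "{")):
        try:
            json.loads(stripped)
        except json.JSONDecodeError:
            pass
        else:
            return "json"
    return "csv"


def map_severity(raw) -> str:
    """Map a Prowler severity to a DefectDojo severity, defaulting to Info."""
    return SEVERITY_MAP.get((raw or "").strip().lower(), "Info")


def is_active(status) -> bool:
    """Assumption: only FAIL results are treated as active findings."""
    return (status or "").strip().upper() == "FAIL"
```

Unknown or missing severities fall back to `Info` here so that a malformed row does not abort the whole import; whether the real parser makes the same choice is not stated in this PR.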

The implementation follows the best practices from the parser guide and mimics the structure of other cloud security scan parsers in DefectDojo.


Test results

Comprehensive test coverage has been implemented in test_prowler_parser.py with:

  • Parsing validation for all supported cloud providers (AWS, Azure, GCP, Kubernetes)
  • Support for both JSON and CSV format detection and handling
  • CSV delimiter detection (semicolon vs comma)
  • Field extraction and mapping
  • Severity and status mapping
  • Verification of remediation data extraction
  • Edge cases like empty files or missing fields
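The delimiter detection and empty-file cases mentioned above could be exercised with something like the following sketch. The helper names are hypothetical, and the counting heuristic is only one possible approach; the PR may detect the delimiter differently.

```python
import csv
import io


def detect_delimiter(content: str) -> str:
    """Pick ';' or ',' by counting both in the first non-empty line."""
    header = next((line for line in content.splitlines() if line.strip()), "")
    return ";" if header.count(";") > header.count(",") else ","


def read_rows(content: str) -> list[dict]:
    """Parse CSV text into dictionaries keyed by the header row."""
    reader = csv.DictReader(io.StringIO(content), delimiter=detect_delimiter(content))
    return list(reader)


# Semicolon-delimited input, as in the official AWS example file.
rows = read_rows("CHECK_ID;SEVERITY\naccessanalyzer_enabled;low\n")
print(rows[0]["CHECK_ID"])

# An empty file yields no rows rather than raising.
print(read_rows(""))
```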

How to test this implementation

To test this implementation, follow these steps:

  1. Set up the testing environment:
# First, make sure the testing environment is running
docker compose -f docker-compose.yml -f docker-compose.override.unit_tests.yml up -d
  2. Run the parser tests:
./run-unittest.sh --test-case unittests.tools.test_prowler_parser

All tests should complete successfully with no failures, validating the parser's functionality across all supported cloud providers and formats.

Documentation

Added sample scan files for all supported cloud providers and formats in the unittests/scans/prowler/ directory to serve as examples for users. These files demonstrate the expected structure and required fields for each format.

Checklist

  • [x] PR rebased against the latest dev branch
  • [x] Feature submitted against dev branch
  • [x] Code is Python 3.11 compliant
  • [x] Code is flake8/ruff compliant (fixed linting issues)
  • [x] Added unit tests to verify functionality
  • [x] Added sample files demonstrating expected input formats
  • [x] No model changes required (uses existing Finding model)
  • [x] Proper label: Import Scans

cosmel-dojo avatar May 14 '25 19:05 cosmel-dojo

DryRun Security

This pull request contains potential security risks, including example security scan files with placeholders that could expose sensitive information and a file parser vulnerability that might enable a Denial of Service attack by processing large files without proper size validation.

:warning: Information Disclosure via Example Security Scan Data in unittests/scans/prowler/examples/output/example_output_aws.ocsf.json
Vulnerability: Information Disclosure via Example Security Scan Data
Description: Multiple example security scan output files contain detailed placeholders that, if accidentally populated with real data, could expose sensitive organizational information about cloud infrastructure, security configurations, and potential vulnerabilities.

https://github.com/DefectDojo/django-DefectDojo/blob/46d5d3320240962ae023cd14ab6debe14be9d86f/unittests/scans/prowler/examples/output/example_output_aws.ocsf.json#L1-L625

:warning: Potential File Processing DoS in dojo/tools/prowler/parser.py
Vulnerability: Potential File Processing DoS
Description: The parser reads entire file contents into memory without size validation, which could enable a Denial of Service attack by uploading extremely large files. This could consume excessive memory or processing resources.

https://github.com/DefectDojo/django-DefectDojo/blob/46d5d3320240962ae023cd14ab6debe14be9d86f/dojo/tools/prowler/parser.py#L1-L414
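One way to address the size concern raised above would be a bounded read, sketched below. This is an illustration only: `read_bounded` and the 50 MiB limit are hypothetical, and upload limits may already be enforced elsewhere in the application.

```python
import io

# Hypothetical limit; a real value would come from configuration.
MAX_BYTES = 50 * 1024 * 1024


def read_bounded(file, max_bytes: int = MAX_BYTES):
    """Read at most max_bytes from a file-like object, raising if it is larger."""
    # Reading one unit past the limit distinguishes "exactly at the limit"
    # from "over the limit" without loading an arbitrarily large file.
    data = file.read(max_bytes + 1)
    if len(data) > max_bytes:
        msg = f"Scan file exceeds the {max_bytes}-byte limit"
        raise ValueError(msg)
    return data


print(len(read_bounded(io.BytesIO(b"abc"), max_bytes=3)))
```

For text-mode streams, `read(n)` counts characters rather than bytes, so the limit is approximate in that case.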


All finding details can be found in the DryRun Security Dashboard.

dryrunsecurity[bot] avatar May 14 '25 19:05 dryrunsecurity[bot]

Could you explain a bit about test_mode: what does it do, and why is it needed? Is the StringIO test really needed? Does it test for something that the other file-based tests do not? I notice there are already AWS Prowler v3 and v4 parsers. Should these be removed, deprecated, or merged into this one Prowler parser?

Hey @valentijnscholten

Thank you for your questions! I've actually made some significant improvements to the parser since my original implementation.

Regarding test_mode: After careful consideration and following best practices from other parsers in DefectDojo (like AnchoreCTLPoliciesParser), I've completely refactored the parser to remove the special test handling logic. The parser now:

  • No longer has a test_mode parameter
  • Processes files consistently regardless of context (test or production)
  • Follows the Single Responsibility Principle by focusing solely on parsing
  • Has cleaner, more maintainable code with fewer conditional branches

This change makes the code simpler, more maintainable, and consistent with other parsers in the codebase.

Regarding the StringIO test: Yes, the StringIO test is still valuable as it specifically validates that the parser can handle in-memory file-like objects, not just disk files. This ensures:

  • The parser works when data comes from memory buffers or network streams
  • It properly handles UTF-8 encoding in these scenarios
  • It can process both CSV and JSON data properly from in-memory sources

While file-based tests verify most functionality, the StringIO test ensures the parser works in all contexts, including when integrated with other components that might pass in-memory data.
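To make the in-memory case concrete, a minimal sketch of such a check follows. `load_findings` is a hypothetical stand-in for the parser's file handling, not the function in this PR, and the `note` field in the sample exists only to include a non-ASCII character.

```python
import io
import json


def load_findings(file) -> list[dict]:
    """Read JSON findings from any file-like object, text or binary."""
    data = file.read()
    if isinstance(data, bytes):
        # Binary streams (uploads, network buffers) are decoded as UTF-8.
        data = data.decode("utf-8")
    parsed = json.loads(data)
    return parsed if isinstance(parsed, list) else [parsed]


SAMPLE = '[{"metadata": {"event_code": "accessanalyzer_enabled"}, "note": "café"}]'

from_text = load_findings(io.StringIO(SAMPLE))
from_bytes = load_findings(io.BytesIO(SAMPLE.encode("utf-8")))

# Both in-memory sources should produce identical findings.
assert from_text == from_bytes
print(from_text[0]["metadata"]["event_code"])
```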

Regarding the AWS Prowler parsers: The existing aws_prowler and aws_prowler_v3plus parsers are more specialized for specific versions of AWS Prowler output, while this new prowler parser is a universal parser that handles:

  • Multiple cloud providers (AWS, Azure, GCP, and Kubernetes) in a single parser
  • Both CSV and JSON formats in a consolidated way
  • The latest OCSF JSON format along with traditional formats

Rather than deprecating the existing parsers immediately, it makes sense to:

  • Keep the existing parsers for backward compatibility with scans already in the system
  • Document that new users should use the universal Prowler parser
  • Consider a deprecation timeline or migration path for the older parsers in the future

This approach ensures we don't break existing deployments while moving toward a more consolidated, maintainable codebase for Prowler parsing.

cosmel-dojo avatar May 16 '25 16:05 cosmel-dojo

Question: where did the test files this is using come from?

As I mentioned to @valentijnscholten, I am using the official example output files from the Prowler repository; here is the link to the previous conversation.

I also added examples for Prowler based on the official documentation; you can see them at this link.

cosmel-dojo avatar Jun 06 '25 22:06 cosmel-dojo

OK, gotcha, thank you, and sorry about that. I think I misunderstood something earlier. Please use those official files instead of the ones currently bundled. Just include them as they are, don't even change the filenames. You can delete the existing ones in this PR and use the ones from the official repo in their place. Thank you!

dogboat avatar Jun 07 '25 00:06 dogboat

DryRun Security

This pull request contains example security scan output files with potential information disclosure risks if real data were accidentally populated, though the current files are marked as examples and do not pose an immediate security threat.

Information Disclosure via Example Data in unittests/scans/prowler/examples/output/example_output_aws.csv
Vulnerability: Information Disclosure via Example Data
Description: Example output files containing security scan results with placeholders for sensitive information pose a potential risk if accidentally populated with real data and exposed. While these are clearly marked as example files, the structure and detail of the data could provide insights into organizational security configurations if mishandled.

https://github.com/DefectDojo/django-DefectDojo/blob/82b53b8dd56fd425bce8034cb0d65d7f3c4b034d/unittests/scans/prowler/examples/output/example_output_aws.csv#L1-L5

Information Disclosure via Example Data Structure in unittests/scans/prowler/examples/output/example_output_aws.ocsf.json
Vulnerability: Information Disclosure via Example Data Structure
Description: The example JSON output file contains a structure that, if populated with real data, could reveal sensitive AWS account and resource identifiers. The presence of fields for account UIDs, resource names, contact details, and security findings underscores the potential for information disclosure if such a file were inadvertently exposed.

https://github.com/DefectDojo/django-DefectDojo/blob/82b53b8dd56fd425bce8034cb0d65d7f3c4b034d/unittests/scans/prowler/examples/output/example_output_aws.ocsf.json#L1-L625


All finding details can be found in the DryRun Security Dashboard.

dryrunsecurity[bot] avatar Jun 11 '25 17:06 dryrunsecurity[bot]

@dogboat, replying to your previous comment:

After careful analysis of the official Prowler examples, I can confirm that metadata.event_code in the JSON format is equivalent to CHECK_ID in the CSV format. Here's the evidence:

Direct Correlation in Examples:

  • In example_output_aws.csv, the first entry has CHECK_ID = accessanalyzer_enabled
  • In the corresponding example_output_aws.ocsf.json, the first entry has metadata.event_code = accessanalyzer_enabled.
  • This pattern is consistent across all examples and providers
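The correlation described above can also be checked programmatically. The sketch below assumes the OCSF file is a JSON array of finding objects and that the CSV is semicolon-delimited; the inline samples are reduced illustrations, not the contents of the official files, and the helper names are hypothetical.

```python
import csv
import io
import json


def csv_check_ids(text: str, delimiter: str = ";") -> set[str]:
    """Collect the distinct CHECK_ID values from Prowler CSV text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row["CHECK_ID"] for row in reader if row.get("CHECK_ID")}


def ocsf_event_codes(text: str) -> set[str]:
    """Collect the distinct metadata.event_code values from OCSF JSON text."""
    findings = json.loads(text)
    return {
        finding["metadata"]["event_code"]
        for finding in findings
        if isinstance(finding, dict) and "event_code" in finding.get("metadata", {})
    }


# Reduced, illustrative samples.
csv_sample = "PROVIDER;CHECK_ID\naws;accessanalyzer_enabled\n"
json_sample = '[{"metadata": {"event_code": "accessanalyzer_enabled"}}]'

only_in_csv = csv_check_ids(csv_sample) - ocsf_event_codes(json_sample)
only_in_json = ocsf_event_codes(json_sample) - csv_check_ids(csv_sample)
print(only_in_csv, only_in_json)
```

Running the same two helpers over a real CSV/JSON pair would list any identifiers that appear in only one of the two formats.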

Regarding Provider Inference and CHECK_ID Prefixes:

  • I've updated the prefix patterns to match what's actually used in the official examples
  • Old prefixes like iam_, elb_, ec2_, s3_, k8s_, bc_k8s_, gcp_, and gke_ don't appear in any of the official examples. I had added those prefixes to match the sample files I originally used, because I didn't know whether I could include the original examples from the Prowler repository. The official examples have now been added to the project.
  • Instead, I found these patterns in the official examples:
    • AWS: accessanalyzer_*, account_*
    • Azure: aks_*
    • Kubernetes: apiserver_*

Evidence from Official Examples:

$ cat example_output_aws.csv | cut -d';' -f11 | sort | uniq
accessanalyzer_enabled
account_maintain_current_contact_details
account_maintain_different_contact_details_to_security_billing_and_operations
account_security_contact_information_is_registered
CHECK_ID

$ grep -o '"event_code": "[^"]*"' example_output_aws.ocsf.json | sort | uniq
"event_code": "accessanalyzer_enabled"
"event_code": "account_maintain_current_contact_details"
"event_code": "account_maintain_different_contact_details_to_security_billing_and_operations"
"event_code": "account_security_contact_information_is_registered"
"event_code": "account_security_questions_are_registered_in_the_aws_account"

Changes Made:

  • Removed legacy prefixes that don't appear in official examples
  • Added/kept only prefixes found in official Prowler output
  • This ensures our parser correctly handles files from the current version of Prowler

cosmel-dojo avatar Jun 11 '25 17:06 cosmel-dojo

This one is probably a good candidate for a rewrite..

Maffooch avatar Oct 21 '25 19:10 Maffooch