contentctl

Adding risk message validation++

Open • cmcginley-splunk opened this issue 2 years ago • 1 comment

Context

  • As part of PEX-363, we wanted to expand integration testing to validate risk message content
  • If invalid fields are referenced in the risk message, they will fail to render silently within ES

Code changes

  • Added new RiskEvent model for the events returned by ES
  • Refined search criteria in CorrelationSearch to only search for events which match the appropriate search name
  • Risk validation expanded to check:
    • Risk message contains no $...$ literals
    • Risk message in detection matches risk message in risk event (regex conversion)
    • Risk score matches detection
    • Analytic stories match detection
    • MITRE ATT&CK IDs match detection
  • Added new NotableEvent model for the events returned by ES (will support additional notable validation, potentially added in the future)
  • Fixed a bug where we were sleeping for the initial 60s twice (should remove about 20h from our cumulative compute time, or about 30min of total runtime when spread across 40 instances)
  • Refined cleanup logic
  • Refactored format_pbar_string s.t. it uses the start_time instance attribute if none is provided explicitly
  • Linted/formatted observable.py
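
To illustrate the first two risk message checks listed above, here is a minimal sketch. It is not the code from this PR: the function names and the token pattern are hypothetical, and it assumes the detection's message uses `$field$` placeholders that ES replaces with field values at render time.

```python
import re

# Hypothetical pattern for an unrendered $field$ token (assumption, not the
# pattern used in contentctl): a "$", one or more non-space/non-"$" chars, a "$".
TOKEN = re.compile(r"\$[^\s$]+\$")


def has_unrendered_tokens(rendered_message: str) -> bool:
    """Return True if the rendered risk message still contains a $...$ literal."""
    return TOKEN.search(rendered_message) is not None


def template_to_regex(template: str) -> re.Pattern:
    """Convert a detection's message template into a regex.

    Literal text is escaped; every $field$ token becomes a non-greedy wildcard.
    """
    literal_parts = TOKEN.split(template)
    pattern = ".*?".join(re.escape(part) for part in literal_parts)
    return re.compile(pattern, re.DOTALL)


def message_matches(template: str, rendered_message: str) -> bool:
    """Return True if the rendered message is an instance of the template."""
    return template_to_regex(template).fullmatch(rendered_message) is not None


if __name__ == "__main__":
    template = "Suspicious process $process_name$ on $dest$"
    print(has_unrendered_tokens("Suspicious process cmd.exe on host-01"))
    print(message_matches(template, "Suspicious process cmd.exe on host-01"))
```

Note that the two checks are complementary: a message with a leftover `$process_name$` still matches the template regex (the wildcard accepts it), so the literal check has to run separately.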

New detection failures

I spot checked these rather than doing a deep dive on every single one, but I believe they are all legitimate validation issues

  • 59 detections are failing this new validation (tested on v4.29.0)
    • 51 detections create risk events where the risk message contains a $...$ literal -> this represents a bad field substitution, likely because the referenced field doesn't exist in the final SPL output
    • 8 detections have mismatches between the analytic stories listed in the detection and those observed in the risk event -> spot checking shows this is mostly due to typos/casing in analytic story names in the detection, which prevents the actual analytic story from being linked to the risk event in ES
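
As a rough illustration of the analytic story mismatch, the sketch below compares the two sets of story names and flags pairs that differ only by casing or surrounding whitespace. The function name and return shape are hypothetical, and it assumes both the detection's stories and the risk event's stories are available as lists of strings; the actual validation in this PR may differ.

```python
from typing import Dict, Iterable, Set, Tuple


def diff_analytic_stories(
    detection_stories: Iterable[str],
    event_stories: Iterable[str],
) -> Tuple[Set[str], Set[str], Dict[str, str]]:
    """Compare story names from a detection against those in a risk event.

    Returns (missing, unexpected, near_misses), where near_misses maps a
    detection story name to an event story name that differs only by casing
    or surrounding whitespace.
    """
    expected = set(detection_stories)
    observed = set(event_stories)

    missing = expected - observed      # in the detection, not in the event
    unexpected = observed - expected   # in the event, not in the detection

    # Index the unexpected names by a normalized key to spot casing typos.
    normalized = {name.strip().casefold(): name for name in unexpected}
    near_misses = {
        name: normalized[name.strip().casefold()]
        for name in missing
        if name.strip().casefold() in normalized
    }
    return missing, unexpected, near_misses


if __name__ == "__main__":
    result = diff_analytic_stories(
        ["Windows Privilege Escalation", "ransomware"],
        ["Windows Privilege Escalation", "Ransomware"],
    )
    print(result)
```

If the story is simply absent from the risk event (rather than present under a different casing), it shows up in `missing` with no entry in `near_misses`.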

NOTE: this testing was performed locally, and some detections failed due to networking issues (likely my ISP bandwidth), so there may be more legitimate failures similar to the above that were not captured in this initial test

Testing

Will post some results from an SCA pipeline when that run completes

TODO

  • [ ] Disable extra logging in CorrelationSearch
  • [ ] Post results from SCA test pipelines

Future work

  • This PR also adds some commented out code which matches risk events to observables in the detection
  • This feature is still in progress as it does generate some false positives
  • That said, it did expose some legitimate issues (e.g. Attacker role observables creating risk events), so it will be a good addition once that validation logic has been refined to remove false positives.

cmcginley-splunk, Jan 04 '24 21:01

~~Some detections seem to be generating false positives (test failures that are not actual failures); e.g.: Windows Excessive Disabled Services Event~~

Resolved as of 8/7/24

cmcginley-splunk, May 08 '24 17:05