Adding risk message validation++

Open cmcginley-splunk opened this issue 2 years ago • 1 comments

Context

As part of PEX-363, we wanted to expand integration testing to validate risk message content
If invalid fields are referenced in the risk message, they silently fail to render within ES

Code changes

Added new RiskEvent model for the events returned by ES
Refined search criteria in CorrelationSearch to only search for events which match the appropriate search name
Risk validation expanded to check:
- Risk message contains no $...$ literals
- Risk message in detection matches risk message in risk event (regex conversion)
- Risk score matches detection
- Analytic stories match detection
- MITRE ATT&CK IDs match detection
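
As a rough illustration of the first two checks (not the code in this PR), the sketch below flags unrendered `$...$` tokens and converts a detection's risk message template into a regex to match against the rendered message. All names here (`FIELD_TOKEN`, `has_unrendered_tokens`, `message_template_to_regex`, `validate_risk_message`) are hypothetical, and the token pattern assumes field names contain no whitespace.

```python
import re

# Hypothetical pattern: a $field$ token with no whitespace or "$" inside it.
FIELD_TOKEN = re.compile(r"\$([^\s$]+)\$")


def has_unrendered_tokens(rendered: str) -> bool:
    """Return True if the rendered risk message still contains a $...$ literal."""
    return FIELD_TOKEN.search(rendered) is not None


def message_template_to_regex(template: str) -> re.Pattern:
    """Turn a risk message template into a regex where each $field$ matches any text."""
    parts = []
    pos = 0
    for match in FIELD_TOKEN.finditer(template):
        # Escape the literal text between tokens so it is matched verbatim.
        parts.append(re.escape(template[pos:match.start()]))
        parts.append(r".*")
        pos = match.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts), re.DOTALL)


def validate_risk_message(template: str, rendered: str) -> list:
    """Collect human-readable validation errors; an empty list means both checks passed."""
    errors = []
    if has_unrendered_tokens(rendered):
        errors.append("risk message contains an unrendered $...$ literal")
    if message_template_to_regex(template).fullmatch(rendered) is None:
        errors.append("risk message does not match the detection's template")
    return errors
```

For example, with the template `User $user$ ran $process_name$ on $dest$.`, the rendered message `User alice ran $process_name$ on host1.` still matches the template regex but is flagged by the literal check, which is why the two checks are kept separate in this sketch.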
Added new NotableEvent model for the events returned by ES (will support additional notable validation, potentially added in the future)
Removed a bug where we were sleeping for the initial 60s twice (should remove 20h from our cumulative compute time, or about 30min from our total runtime across 40 instances)
Refined cleanup logic
Refactored format_pbar_string so that it uses the start_time instance attribute if none is provided explicitly
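
The fallback pattern behind the format_pbar_string change can be sketched as below. This is an assumption-laden illustration: the class name `ProgressReporter`, the `state` and `now` parameters, and the output format are all hypothetical and do not reflect the real signature; only the "use the instance's start_time when none is passed" behavior comes from the description above.

```python
import time
from datetime import timedelta
from typing import Optional


class ProgressReporter:
    """Hypothetical stand-in for the class that owns format_pbar_string."""

    def __init__(self) -> None:
        # Recorded once, when the reporter is created.
        self.start_time = time.time()

    def format_pbar_string(
        self,
        state: str,
        start_time: Optional[float] = None,
        now: Optional[float] = None,
    ) -> str:
        # Fall back to the instance attribute when no start time is given.
        if start_time is None:
            start_time = self.start_time
        # `now` is injectable only to keep this sketch deterministic in tests.
        if now is None:
            now = time.time()
        elapsed = timedelta(seconds=round(now - start_time))
        return f"{state} | elapsed {elapsed}"
```

Callers that track their own timer can still pass `start_time` explicitly; everyone else gets the instance default without repeating it at each call site.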
Linted/formatted observable.py

New detection failures

I spot checked these rather than doing a deep dive on every single one, but I believe they are all legitimate validation issues

59 detections are failing this new validation (tested on v4.29.0)
- 51 detections create risk events where the risk message contains a $...$ literal -> represents a bad field substitution, likely because the referenced field doesn't exist in final SPL output
- 8 detections have mismatches between the analytic stories listed in the detection and what's observed in the risk event -> spot checking shows this is mostly due to typos/casing differences in analytic story names in the detection, which prevent ES from linking the actual analytic story to the risk event
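
To make the analytic story mismatch easier to triage, one option is to report each story that is listed in the detection but absent from the risk event, along with the closest known story name. The sketch below is only an illustration: `diff_analytic_stories` and its `known_stories` argument (a list of canonical story names) are hypothetical and not part of this PR; the fuzzy match uses the standard library's `difflib.get_close_matches`.

```python
from difflib import get_close_matches
from typing import Dict, Iterable, Optional


def diff_analytic_stories(
    detection_stories: Iterable[str],
    event_stories: Iterable[str],
    known_stories: Iterable[str],
) -> Dict[str, Optional[str]]:
    """Map each story missing from the risk event to a likely intended name (or None)."""
    missing = sorted(set(detection_stories) - set(event_stories))
    # Compare case-insensitively so pure casing differences count as exact matches.
    canonical = {name.casefold(): name for name in known_stories}
    report: Dict[str, Optional[str]] = {}
    for name in missing:
        close = get_close_matches(name.casefold(), list(canonical), n=1, cutoff=0.8)
        report[name] = canonical[close[0]] if close else None
    return report
```

An empty result means every story in the detection was observed on the risk event; a non-empty result points at the detection entries most likely to need a spelling or casing fix.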

NOTE: this testing was performed locally, and some detections failed due to networking issues, likely due to my ISP bandwidth; so there may be more legitimate failures similar to the above not captured in this initial test

Testing

Will post some results from an SCA pipeline when that run completes

TODO

[ ] Disable extra logging in CorrelationSearch
[ ] Post results from SCA test pipelines

Future work

This PR also adds some commented-out code which matches risk events to observables in the detection
This feature is still in progress as it does generate some false positives
That said, it did expose some legitimate issues (e.g. Attacker role observables creating risk events), so it will be a good addition once that validation logic has been refined to remove false positives.

Jan 04 '24 21:01 cmcginley-splunk

~~Some detections seem to be generating false positives (test failures that are not actual failures); e.g.: Windows Excessive Disabled Services Event~~

Resolved as of 8/7/24

May 08 '24 17:05 cmcginley-splunk