contentctl
Adding risk message validation++
Context
- As part of PEX-363, we wanted to expand integration testing to validate risk message content
- If invalid fields are referenced in the risk message, they will fail to render silently within ES
Code changes
- Added new `RiskEvent` model for the events returned by ES
- Refined search criteria in `CorrelationSearch` to only search for events which match the appropriate search name
- Risk validation expanded to check:
  - Risk message contains no `$...$` literals
  - Risk message in detection matches risk message in risk event (regex conversion)
  - Risk score matches detection
  - Analytic stories match detection
  - MITRE ATT&CK IDs match detection
- Added new `NotableEvent` model for the events returned by ES (will support additional notable validation, potentially added in the future)
- Removed a bug where we were initially sleeping for 60s twice (should remove 20h from our cumulative compute time, about 30min on our total runtime across 40 instances)
- Refined cleanup logic
- Refactored `format_pbar_string` s.t. it uses the `start_time` instance attribute if none is provided explicitly
- Linted/formatted `observable.py`
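For reviewers, here is a rough sketch of the two risk message checks listed above (unrendered `$...$` literals, and the template-to-regex comparison). It is illustrative only and is not the code in this PR: the function names and the token pattern are assumptions made for the example.

```python
import re

# Hypothetical token pattern: a `$field$` placeholder with no whitespace inside.
FIELD_TOKEN = re.compile(r"\$([^\s$]+)\$")


def has_unrendered_tokens(rendered_message: str) -> bool:
    """Return True if the rendered risk message still contains a $...$ literal."""
    return FIELD_TOKEN.search(rendered_message) is not None


def template_to_regex(template: str) -> re.Pattern:
    """Convert a detection's message template into a regex.

    Literal text is escaped; each $field$ token becomes a non-greedy wildcard.
    """
    parts = []
    pos = 0
    for match in FIELD_TOKEN.finditer(template):
        parts.append(re.escape(template[pos:match.start()]))
        parts.append(r".*?")
        pos = match.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts), re.DOTALL)


def message_matches(template: str, rendered_message: str) -> bool:
    """Return True if the rendered message is consistent with the template."""
    return template_to_regex(template).fullmatch(rendered_message) is not None
```

Note that the wildcard also matches an unrendered token, so the two checks are complementary: `message_matches` catches a message that diverges from the template, while `has_unrendered_tokens` catches a failed substitution.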
New detection failures
I spot-checked these but did not do a deep dive on every single one; I believe they are all legitimate validation issues
- 59 detections are failing this new validation (tested on v4.29.0)
  - 51 detections create risk events where the risk message contains a `$...$` literal -> represents a bad field substitution, likely because the referenced field doesn't exist in final SPL output
  - 8 detections have mismatches between the analytic stories listed in the detection and what's observed in the risk event -> spot checking shows this is mostly due to typos/casing in analytic story names in the detection which fails to link the actual analytic story to the risk event in ES
NOTE: this testing was performed locally, and some detections failed due to networking issues, likely due to my ISP bandwidth; so there may be more legitimate failures similar to the above not captured in this initial test
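One possible way to triage the analytic story mismatches is to diff the story names and then look for a casing-only or near match among known story names. This is a sketch under assumptions, not part of this PR: the helper names, the `known_stories` input, and the 0.8 similarity cutoff are all hypothetical.

```python
import difflib
from typing import Iterable, List, Optional, Tuple


def diff_stories(
    detection_stories: Iterable[str], event_stories: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing_from_event, unexpected_in_event), both sorted."""
    expected, observed = set(detection_stories), set(event_stories)
    return sorted(expected - observed), sorted(observed - expected)


def suggest_story(name: str, known_stories: List[str]) -> Optional[str]:
    """Suggest the story name the detection author likely intended, if any."""
    # A casing-only difference is the cheapest case to detect.
    for known in known_stories:
        if known.casefold() == name.casefold():
            return known
    # Otherwise fall back to a fuzzy match to catch likely typos.
    close = difflib.get_close_matches(name, known_stories, n=1, cutoff=0.8)
    return close[0] if close else None
```

For example, a detection listing `ransomware` would show up as missing from the risk event, and `suggest_story` would point at `Ransomware` if that name is in the known list.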
Testing
Will post some results from an SCA pipeline when that run completes
TODO
- [ ] Disable extra logging in `CorrelationSearch`
- [ ] Post results from SCA test pipelines
Future work
- This PR also adds some commented out code which matches risk events to observables in the detection
- This feature is still in progress as it does generate some false positives
- That said, it did expose some legitimate issues (e.g. Attacker role observables creating risk events), so it will be a good addition once that validation logic has been refined to remove false positives.
~~Some detections seem to be generating false positives (test failures that are not actual failures); e.g.: Windows Excessive Disabled Services Event~~
Resolved as of 8/7/24