Add bulk replay capabilities to replay.py
Today a user cannot point to a folder and ingest all datasets with the tool.
Initial Slack conversation over here
One idea to start the conversation is to split the current replay.yml into two parts:
- config.yml, which contains the Splunk params (host/user/pass) + default index + update_timestamp
- dataset.yml, which would exist in each dataset's directory (adding info to the existing yml file) and contains name + source + sourcetype + index (if the user wants to override the default one in config.yml)
Propose we standardize the per-directory yml filename to dataset.yml so it can easily be found/recognized.
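To make the override idea concrete, here is a minimal sketch of how the index could be resolved once both files are loaded into dicts. The function name `resolve_index` and the precedence order (command-line value first, then the dataset.yml entry, then the config.yml default) are assumptions for discussion, not existing replay.py behavior.

```python
from typing import Any, Dict, Optional


def resolve_index(
    config: Dict[str, Any],
    entry: Dict[str, Any],
    cli_index: Optional[str] = None,
) -> str:
    """Pick the target index for one dataset entry.

    Assumed precedence (hypothetical, open for discussion):
    command-line override > per-dataset index > default index in config.yml.
    """
    if cli_index:
        return cli_index
    # Fall back to the config.yml default when the entry has no index of its own.
    return entry.get("index") or config["index"]
```

With this ordering, a dataset.yml only needs an `index` key when it differs from the default in config.yml.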
Calling replay.py could look like this ...
python replay.py -h
-c config.yml Splunk configuration (host/user/pass/index/override timestamp) (required)
-d <directory> Directory to recursively search for dataset.yml to start ingesting (required)
-i <index> Override index in config.yml (optional)
-t Override config.yml and update timestamps (optional)
-s <seconds> Sleep seconds in between directory ingests (allow Splunk to catch up on indexing) (optional)
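The options above could be wired up roughly as follows. This is only a sketch using the standard library: `build_parser` and `find_dataset_files` are hypothetical names, the `dest` names are illustrative, and the actual ingest step is left out.

```python
import argparse
from pathlib import Path
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the proposed flags (argparse adds -h itself)."""
    parser = argparse.ArgumentParser(prog="replay.py")
    parser.add_argument("-c", dest="config", required=True, metavar="config.yml",
                        help="Splunk configuration (host/user/pass/index/override timestamp)")
    parser.add_argument("-d", dest="directory", required=True, type=Path,
                        help="Directory to recursively search for dataset.yml")
    parser.add_argument("-i", dest="index", default=None,
                        help="Override index in config.yml")
    parser.add_argument("-t", dest="update_timestamp", action="store_true",
                        help="Override config.yml and update timestamps")
    parser.add_argument("-s", dest="sleep", type=float, default=0.0, metavar="seconds",
                        help="Sleep seconds in between directory ingests")
    return parser


def find_dataset_files(root: Path) -> List[Path]:
    """Return every dataset.yml found under root, in a stable sorted order."""
    return sorted(p for p in root.rglob("dataset.yml") if p.is_file())
```

A driver loop would then iterate over `find_dataset_files(args.directory)`, ingest each directory, and call `time.sleep(args.sleep)` between directories.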
Each directory's *.yml currently seems to list the sourcetypes, but they are not linked to or ordered with the filenames. Here's an example:
author: Patrick Bareiss, Michael Haag
id: cc9b25d6-efc9-11eb-926b-550bf0943fbb
date: '2022-01-12'
description: 'Atomic Test Results: Successful Execution of test T1003.001-1 Windows
Credential Editor Successful Execution of test T1003.001-2 Dump LSASS.exe Memory
using ProcDump Return value unclear for test T1003.001-3 Dump LSASS.exe Memory using
comsvcs.dll Successful Execution of test T1003.001-4 Dump LSASS.exe Memory using
direct system calls and API unhooking Return value unclear for test T1003.001-6
Offline Credential Theft With Mimikatz Return value unclear for test T1003.001-7
LSASS read with pypykatz '
environment: attack_range
dataset:
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-powershell.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-security.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon_creddump.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-system.log
sourcetypes:
- XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
- WinEventLog:Microsoft-Windows-PowerShell/Operational
- WinEventLog:System
- WinEventLog:Security
references:
- https://attack.mitre.org/techniques/T1003/001/
- https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1003.001/T1003.001.md
- https://github.com/splunk/security-content/blob/develop/tests/T1003_001.yml
As you can see, the 'dataset' files are in a different order than the 'sourcetypes'. Propose we add a formal linkage from each filename to its source/sourcetype, basically moving the replay_parameters logic from replay.yml into each directory's dataset.yml file so it can be documented per dataset capture and replayed:
author: Patrick Bareiss, Michael Haag
id: cc9b25d6-efc9-11eb-926b-550bf0943fbb
date: '2022-01-12'
description: 'Atomic Test Results: Successful Execution of test T1003.001-1 Windows
Credential Editor Successful Execution of test T1003.001-2 Dump LSASS.exe Memory
using ProcDump Return value unclear for test T1003.001-3 Dump LSASS.exe Memory using
comsvcs.dll Successful Execution of test T1003.001-4 Dump LSASS.exe Memory using
direct system calls and API unhooking Return value unclear for test T1003.001-6
Offline Credential Theft With Mimikatz Return value unclear for test T1003.001-7
LSASS read with pypykatz '
environment: attack_range
references:
- https://attack.mitre.org/techniques/T1003/001/
- https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1003.001/T1003.001.md
- https://github.com/splunk/security-content/blob/develop/tests/T1003_001.yml
replay_parameters:
  - name: atomic_red_team/windows-powershell.log
    source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
    sourcetype: xmlwineventlog
    notes: <optional>
  - name: windows-sysmon.log
    source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
    sourcetype: xmlwineventlog
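Given a dataset.yml shaped like the one above, replay.py could turn `replay_parameters` into an explicit per-file plan. The sketch below assumes the file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that each `name` is relative to the directory holding dataset.yml; `build_replay_plan` is a hypothetical helper, not existing code.

```python
from pathlib import Path
from typing import Any, Dict, List


def build_replay_plan(dataset_dir: Path, dataset: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Link each log file to its source and sourcetype.

    Assumes `name` is relative to the directory containing dataset.yml.
    """
    plan = []
    for entry in dataset.get("replay_parameters", []):
        plan.append({
            "path": dataset_dir / entry["name"],
            "source": entry["source"],
            "sourcetype": entry["sourcetype"],
        })
    return plan
```

Because every file carries its own source and sourcetype, the ordering mismatch between the 'dataset' and 'sourcetypes' lists no longer matters.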
I really dig this proposal, although it will require us to refactor a few aspects of our testing pipeline to read from the new yaml structures. With this approach we can/should also create a spec for the dataset.yml and run CI/CD validation for it on every PR, similar to what the security_content repo does here. Let me bring this back to the team and think through it, but on the surface it looks absolutely doable :smile:. Thank you so much for spending the time to write this up, super useful!
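As a starting point for that validation, a check along these lines could run in CI against every parsed dataset.yml. The required keys listed here are assumptions taken from the example in the proposal, not an agreed spec, and `validate_dataset` is a hypothetical name.

```python
from typing import Any, Dict, List

# Assumed required keys, based only on the example dataset.yml in this issue.
REQUIRED_TOP_LEVEL = ("author", "id", "date", "description", "environment", "replay_parameters")
REQUIRED_ENTRY_KEYS = ("name", "source", "sourcetype")


def validate_dataset(doc: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the doc passes."""
    errors = [f"missing top-level key: {key}" for key in REQUIRED_TOP_LEVEL if key not in doc]
    entries = doc.get("replay_parameters", [])
    if not isinstance(entries, list):
        return errors + ["replay_parameters must be a list"]
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            errors.append(f"replay_parameters[{i}] must be a mapping")
            continue
        for key in REQUIRED_ENTRY_KEYS:
            if not entry.get(key):
                errors.append(f"replay_parameters[{i}] missing {key}")
    return errors
```

A CI job could fail the PR whenever any dataset.yml yields a non-empty error list.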