Drift Detection features that could help automate drift remediation
Not too long ago we were manually remediating drift on older cfn stacks, ones that we felt could have been manipulated via the console. It was very tedious, so I decided to look into automating the process with Python. It looked simple enough. After digging in and collecting information, though, I saw that some crucial pieces of information were missing or seemed incomplete.
-
It appears that drift is considered to be anything that deviates from the original CloudFormation template and parameters. Therefore, changes made by other stacks are considered drift on the original stack (in our specific case: ingress and egress rules for security groups in a security stack, added from what we call an integration stack). This is not, in itself, a huge issue, except that there is no differentiation between drift that is stack related and drift that is console related. If there were some way to identify the source of the drift (possibly just the originating resource name, if it came from CloudFormation), it seems it would be much easier to automate remediation.
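To illustrate the workaround this currently forces, here is a minimal Python sketch of classifying live rules by which stack declares them. Everything in it is hypothetical: the rule tuples, the stack names, and the `classify_origin` helper are made up for illustration, and it assumes the declared rules have already been extracted from each stack's template.

```python
# Hypothetical rule key: (protocol, from_port, to_port, source).
# Assumption: the declared rules were already pulled out of each template.

def classify_origin(actual_rules, rules_by_stack):
    """Map each live rule to the stacks whose templates declare it."""
    origins = {}
    for rule in actual_rules:
        owners = sorted(
            stack for stack, declared in rules_by_stack.items() if rule in declared
        )
        origins[rule] = owners  # an empty list means no template claims this rule
    return origins


# Made-up example data: one rule per stack, plus one rule nobody declares.
rules_by_stack = {
    "security": {("tcp", 22, 22, "10.0.0.0/16")},
    "integration": {("tcp", 443, 443, "sg-0aaa1111")},
}
actual_rules = {
    ("tcp", 22, 22, "10.0.0.0/16"),
    ("tcp", 443, 443, "sg-0aaa1111"),
    ("tcp", 22, 22, "203.0.113.7/32"),
}

origins = classify_origin(actual_rules, rules_by_stack)
unclaimed = [rule for rule, owners in origins.items() if not owners]
print(unclaimed)
```

A rule that no template claims is only a candidate for a console change; this approach cannot prove where it came from, which is exactly the information I am asking drift detection to provide.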
-
Some drift information is incomplete. For example, a security group ingress rule would only report Description, FromPort, IpProtocol, and ToPort in the drift detection results, whereas running a Describe on the security group returned all of the information needed to remediate the ingress rule (PrefixListIds, IpRanges, UserIdGroupPairs). An obvious question would be: why not use Describe to fetch and remediate? The issue arises when you have multiple ingress/egress rules that use the same ports. For example, if I already had 3 ingress rules for SSH from specific CIDRs and someone manually adds a 4th (from a CIDR, prefix list, or security group; it doesn't matter which), I have no way to know which one is drift without running a Describe and then deducing which rules are missing or present. This is further complicated by the first issue I described. Without knowing what came from the console and what came from another stack, it seems you would have to combine the CloudFormation resources from multiple templates and then try to deduce what is present or missing.
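The deduction described above could look roughly like the following Python sketch. It assumes the input has the `IpPermissions` shape returned by EC2 `describe_security_groups`; the `flatten_permissions` and `diff_rules` helpers, the CIDRs, and the prefix list ID are all made up for illustration, and the expected rules are assumed to have been collected from the template(s) beforehand.

```python
def flatten_permissions(ip_permissions):
    """Expand Describe-style IpPermissions into one tuple per source."""
    rules = set()
    for perm in ip_permissions:
        # FromPort/ToPort are absent for protocol "-1" (all traffic), hence .get().
        base = (perm.get("IpProtocol"), perm.get("FromPort"), perm.get("ToPort"))
        for item in perm.get("IpRanges", []):
            rules.add(base + ("cidr", item["CidrIp"]))
        for item in perm.get("Ipv6Ranges", []):
            rules.add(base + ("cidr6", item["CidrIpv6"]))
        for item in perm.get("PrefixListIds", []):
            rules.add(base + ("prefix-list", item["PrefixListId"]))
        for item in perm.get("UserIdGroupPairs", []):
            rules.add(base + ("group", item["GroupId"]))
    return rules


def diff_rules(expected, actual):
    """Return rules present but not declared, and declared but not present."""
    return {"added": actual - expected, "missing": expected - actual}


# Hypothetical data: three declared SSH rules, plus a 4th added by hand.
expected = {
    ("tcp", 22, 22, "cidr", "10.0.0.0/24"),
    ("tcp", 22, 22, "cidr", "10.0.1.0/24"),
    ("tcp", 22, 22, "cidr", "10.0.2.0/24"),
}
described = [
    {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [
            {"CidrIp": "10.0.0.0/24"},
            {"CidrIp": "10.0.1.0/24"},
            {"CidrIp": "10.0.2.0/24"},
        ],
        "PrefixListIds": [{"PrefixListId": "pl-0example"}],
        "UserIdGroupPairs": [],
    }
]

result = diff_rules(expected, flatten_permissions(described))
print(result["added"])
```

Note that Describe merges all sources that share a protocol and port range into a single permission entry, which is why the sketch expands each source into its own tuple before comparing.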
I know this is a bit long-winded, so... --------TL;DR-------
- Can we add the event source and the resource name (if the change came from CloudFormation) to ingress/egress rules in drift detection?
- Can we get complete security group Describe information for ingress and egress rules on drift detection?
These are merely suggestions that I think might help out in the automation of drift remediation. Thank you for your time, Jason Wainwright