Add TTL for retention policy
Description of the issue
If a customer restarts the agent frequently while a large number of deployments or log groups is configured, DescribeLogGroups (DLG) and PutRetentionPolicy (PRP) calls can be throttled. The agent makes the DLG/PRP calls at startup to add or update retention policies for log groups.
Description of changes
A new state file named Amazon_CloudWatch_RetentionPolicyTTL is being added. For each log group, it contains the timestamp at which the retention policy was last checked (and no update was needed) or last updated. Each line of the file has the format loggroupname:timestamp. An example with two log groups:
log1:1234567890
log2:1234567890
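As an illustration of the file format above, here is a minimal parsing sketch. The function name parseTTLState and the choice to skip malformed lines are assumptions for this example, not necessarily what the PR implements.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseTTLState reads "loggroupname:timestamp" lines into a map.
// Hypothetical helper: malformed lines are skipped instead of failing the load.
func parseTTLState(content string) map[string]int64 {
	timestamps := make(map[string]int64)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		// Split on the last colon so the timestamp is always the final field.
		idx := strings.LastIndex(line, ":")
		if idx <= 0 {
			continue // no separator, or empty log group name
		}
		ts, err := strconv.ParseInt(line[idx+1:], 10, 64)
		if err != nil {
			continue // timestamp is not an integer
		}
		timestamps[line[:idx]] = ts
	}
	return timestamps
}

func main() {
	state := parseTTLState("log1:1234567890\nlog2:1234567890\n")
	fmt.Println(len(state), state["log1"], state["log2"])
}
```

Splitting on the last colon keeps the timestamp unambiguous even if a name were ever to contain the separator.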
RetentionPolicyTTL
- The state file is read on startup and stored into oldTimestamps. The IsExpired(group) call reads from oldTimestamps to determine whether the retention policy should be checked/updated.
- The timestamps are cached into the new struct RetentionPolicyTTL, which has the field newTimestamps.
  a. Timestamps from oldTimestamps are persisted in one scenario: when the timestamp has not expired. This is so we do not lose timestamps from previous agent runs. As a side effect, this helps clean up timestamps for log groups that are no longer configured by the user.
- The state file is saved periodically at a 1 minute interval. It can also be saved by calling Stop().
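The two-map design described above can be sketched roughly as follows. This is a simplified stand-in, not the PR's code: the 5 minute TTL is taken from the test scenarios, and the millisecond representation, the injectable nowMillis clock, the constructor, and Snapshot are assumptions made for illustration. Periodic saving and Stop() are left out.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ttlMillis is an assumed TTL; the test scenarios in this PR mention 5 minutes.
const ttlMillis = int64(5 * time.Minute / time.Millisecond)

// RetentionPolicyTTL is a simplified stand-in for the struct described above.
type RetentionPolicyTTL struct {
	mu            sync.Mutex
	oldTimestamps map[string]int64 // read from the state file at startup, never mutated
	newTimestamps map[string]int64 // what the next save would write to the state file
	nowMillis     func() int64     // injectable clock (hypothetical, eases testing)
}

func NewRetentionPolicyTTL(old map[string]int64, nowMillis func() int64) *RetentionPolicyTTL {
	return &RetentionPolicyTTL{
		oldTimestamps: old,
		newTimestamps: make(map[string]int64),
		nowMillis:     nowMillis,
	}
}

// IsExpired reports whether the group's retention policy should be checked again.
// Groups without a timestamp from a previous run count as expired.
func (r *RetentionPolicyTTL) IsExpired(group string) bool {
	ts, ok := r.oldTimestamps[group]
	return !ok || r.nowMillis()-ts >= ttlMillis
}

// Update records the current time after a policy was validated or updated.
func (r *RetentionPolicyTTL) Update(group string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.newTimestamps[group] = r.nowMillis()
}

// UpdateFromFile carries an unexpired timestamp over from the previous run.
func (r *RetentionPolicyTTL) UpdateFromFile(group string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if ts, ok := r.oldTimestamps[group]; ok {
		r.newTimestamps[group] = ts
	}
}

// Snapshot returns a copy of the timestamps that would be saved.
func (r *RetentionPolicyTTL) Snapshot() map[string]int64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make(map[string]int64, len(r.newTimestamps))
	for g, ts := range r.newTimestamps {
		out[g] = ts
	}
	return out
}

func main() {
	old := map[string]int64{"log.txt": time.Now().Add(-time.Minute).UnixMilli()}
	ttl := NewRetentionPolicyTTL(old, func() int64 { return time.Now().UnixMilli() })
	fmt.Println(ttl.IsExpired("log.txt"), ttl.IsExpired("log2.txt"))
}
```

Because only groups that pass through Update or UpdateFromFile reach newTimestamps, entries for log groups that are no longer configured drop out of the next save.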
Target
- Before a Target is checked/updated, the IsExpired(group) call is made. If not expired, then persist the read timestamp into the new timestamp cache using UpdateFromFile(group). If expired, then continue the logic of checking/updating the retention policy.
- The cache is updated using Update(group) when the retention policy is valid (checked using DLG) or when the retention policy was updated (updated using PRP).
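The Target flow above could look roughly like this. The method names IsExpired, Update, and UpdateFromFile come from the description; the ttlCache and retentionAPI interfaces, ensureRetention, and the fakes are hypothetical and only stand in for the real DLG/PRP clients.

```go
package main

import "fmt"

// ttlCache is the subset of the TTL cache the target needs.
type ttlCache interface {
	IsExpired(group string) bool
	Update(group string)
	UpdateFromFile(group string)
}

// retentionAPI is a hypothetical abstraction over the two CloudWatch Logs calls.
type retentionAPI interface {
	CurrentRetention(group string) (int, error) // would be backed by DescribeLogGroups
	PutRetention(group string, days int) error  // would be backed by PutRetentionPolicy
}

// ensureRetention skips the API calls while the cached timestamp is still valid.
func ensureRetention(cache ttlCache, api retentionAPI, group string, days int) error {
	if !cache.IsExpired(group) {
		cache.UpdateFromFile(group) // keep the timestamp from the previous run
		return nil
	}
	current, err := api.CurrentRetention(group)
	if err != nil {
		return err
	}
	if current != days {
		if err := api.PutRetention(group, days); err != nil {
			return err
		}
	}
	cache.Update(group) // policy is valid or was just updated
	return nil
}

// fakeCache and fakeAPI are in-memory stand-ins used only for this example.
type fakeCache struct {
	expired map[string]bool
	calls   []string
}

func (c *fakeCache) IsExpired(g string) bool { return c.expired[g] }
func (c *fakeCache) Update(g string)         { c.calls = append(c.calls, "Update:"+g) }
func (c *fakeCache) UpdateFromFile(g string) { c.calls = append(c.calls, "UpdateFromFile:"+g) }

type fakeAPI struct {
	retention     map[string]int
	describeCalls int
	putCalls      int
}

func (a *fakeAPI) CurrentRetention(g string) (int, error) {
	a.describeCalls++
	return a.retention[g], nil
}

func (a *fakeAPI) PutRetention(g string, days int) error {
	a.putCalls++
	a.retention[g] = days
	return nil
}

func main() {
	cache := &fakeCache{expired: map[string]bool{"stale": true}}
	api := &fakeAPI{retention: map[string]int{"stale": 7, "fresh": 7}}
	for _, g := range []string{"fresh", "stale"} {
		if err := ensureRetention(cache, api, g, 30); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(cache.calls, api.describeCalls, api.putCalls)
}
```

In this sketch an API error leaves the cache untouched, so the group is retried on the next start; whether the PR behaves the same way is not stated in the description.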
Logfile Input
- An additional separate change was made to make sure that the new state file does not get cleaned up since it's re-using the state folder.
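One way to picture this exclusion is a cleanup filter that always skips the TTL file. This sketch assumes the cleanup removes state-folder entries that no longer correspond to a tracked log file; filesToCleanUp and its parameters are hypothetical and do not mirror the agent's actual cleanup code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// retentionTTLFileName is the state file name given in this PR.
const retentionTTLFileName = "Amazon_CloudWatch_RetentionPolicyTTL"

// filesToCleanUp returns the state-folder entries a cleanup pass may delete.
// Hypothetical helper: "tracked" marks state files that are still in use.
func filesToCleanUp(stateFiles []string, tracked map[string]bool) []string {
	var remove []string
	for _, path := range stateFiles {
		if filepath.Base(path) == retentionTTLFileName {
			continue // never delete the retention policy TTL state file
		}
		if !tracked[path] {
			remove = append(remove, path)
		}
	}
	return remove
}

func main() {
	files := []string{
		"state/_var_log_app.log",
		"state/_var_log_old.log",
		"state/" + retentionTTLFileName,
	}
	tracked := map[string]bool{"state/_var_log_app.log": true}
	fmt.Println(filesToCleanUp(files, tracked))
}
```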
Translation
- The path to the state folder is now configured in the output CWL configuration section for the agent TOML config.
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
- Unit test
Scenarios
1. Two log groups configured, first run
- $ cat /opt/aws/amazon-cloudwatch-agent/logs/state/Amazon_CloudWatch_RetentionPolicyTTL fails because the file does not exist
- Log groups are updated, and file is written with content:
log.txt:1747933820935
log2.txt:1747933820943
2. Two log groups configured, restarted within 5 minutes of TTL
- Content remains the same
log.txt:1747933820935
log2.txt:1747933820943
3. Two log groups configured, restarted after 5 minutes of TTL
- Content is updated with the new timestamp
log.txt:1747937120236
log2.txt:1747937120243
4. Two logs to one log group configured, restarted
- Content is updated to only have the one log group configured
log.txt:1747937120236
Requirements
Before committing the code, please do the following steps.
- Run make fmt and make fmt-sh
- Run make lint
This PR was marked stale due to lack of activity.