for-linux
Service fails to start if the AWS CloudWatch rate limit is exceeded
Scenario:
- 100 servers running 30 containers each
- Docker configured to log to AWS CloudWatch
- all servers rebooted at the same time to apply security patch
With about 3,000 containers starting at once, this produced a burst of CloudWatch Logs API calls, and AWS started throttling them. Services then failed to start:
ERROR: for rms-core_1 Cannot start service rms-core: failed to create task for container: failed to initialize logging driver: failed to create Cloudwatch log stream: operation error CloudWatch Logs: CreateLogStream, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: f9a0216c-0e95-437c-bd18-cd231f0ed054, api error ThrottlingException: Rate exceeded
After that, the service stayed in the failed state and required a manual restart.
Expected: Docker keeps retrying until it succeeds, either with a fixed delay or with exponential backoff.
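To make the expected behavior concrete, here is a minimal sketch of retrying with exponential backoff. It is not Docker's actual code. The names `retry_with_backoff` and `ThrottledError` are hypothetical, and `operation` stands in for whatever call creates the log stream. Random jitter is included on the assumption that many hosts retry at the same moment, as in the reboot scenario above.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an error like 'ThrottlingException: Rate exceeded'."""


def retry_with_backoff(operation, base_delay=1.0, max_delay=60.0,
                       max_attempts=None, sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds, backing off after each throttled attempt.

    max_attempts=None means retry forever, which matches the behavior
    requested in this issue. sleep and rng are injectable for testing.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except ThrottledError:
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # The delay ceiling doubles after every failure, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter: wait a random fraction of the ceiling so that many
            # hosts do not all retry at the same instant.
            sleep(rng() * ceiling)


if __name__ == "__main__":
    state = {"calls": 0}

    def create_log_stream():
        # Simulated API call: throttled twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ThrottledError("Rate exceeded")
        return "created"

    delays = []
    # Record the delays instead of actually sleeping.
    result = retry_with_backoff(create_log_stream, sleep=delays.append)
    print(result, state["calls"], delays)
```

A fixed delay is the special case where the ceiling never grows. Backoff with jitter is probably the better fit here, because 100 servers retrying on the same schedule would hit the rate limit again together.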