Service fails to start if AWS CloudWatch rate limit is exceeded

Open rogozind opened this issue 1 year ago • 0 comments

Scenario:

  • 100 servers running 30 containers each
  • Docker configured to log to AWS CloudWatch
  • all servers rebooted at the same time to apply security patch

This put a very high load on the AWS CloudWatch Logs API, and AWS started to throttle the calls. Services then failed to start:

ERROR: for rms-core_1 Cannot start service rms-core: failed to create task for container: failed to initialize logging driver: failed to create Cloudwatch log stream: operation error CloudWatch Logs: CreateLogStream, exceeded maximum number of attempts, 3, https response error StatusCode: 400, RequestID: f9a0216c-0e95-437c-bd18-cd231f0ed054, api error ThrottlingException: Rate exceeded

After that, the service stayed in the failed state and required a manual restart.

Expected: Docker keeps retrying until it succeeds, either with a fixed delay or with exponential backoff.
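To illustrate the requested behavior, here is a minimal Python sketch of retrying with exponential backoff and jitter. It is not Docker's code and does not call the AWS SDK: `ThrottlingError`, `retry_with_backoff`, and all parameter names are hypothetical, and `operation` stands in for whatever call creates the log stream.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for an API 'ThrottlingException: Rate exceeded' error."""


def retry_with_backoff(operation, base_delay=1.0, max_delay=60.0,
                       max_attempts=None, sleep=time.sleep, rand=random.random):
    """Call operation() until it succeeds, sleeping after each throttled attempt.

    The delay ceiling starts at base_delay and doubles after every failure,
    capped at max_delay. Each sleep is a random fraction of the current
    ceiling (jitter), so many hosts restarting together do not retry in
    lockstep. max_attempts=None means retry indefinitely.
    """
    ceiling = base_delay
    attempt = 0
    while True:
        try:
            return operation()
        except ThrottlingError:
            attempt += 1
            # Give up only when an explicit attempt limit was requested.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(rand() * ceiling)
            ceiling = min(max_delay, ceiling * 2)
```

The jitter matters in the scenario above: with 100 servers rebooting at once, a fixed delay would make all 3,000 containers retry at the same moments and likely hit the rate limit again.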

rogozind, Nov 16 '24 13:11