[ECS] [request]: Faster Target tracking scaling policies scale out

Open jeroenhabets opened this issue 2 years ago • 5 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request

Currently ECS waits 15+ minutes (default CloudWatch metrics) or 5-6 minutes (detailed CloudWatch metrics) before triggering a scale out, which is far too slow to prevent issues for end users. Scale out should be much faster so that increased load can be handled without negatively impacting users.

One way would be to add an "evaluation-periods" option to Target tracking scaling policies to configure how many evaluation periods to wait before triggering a scale out. Another, more drastic and effective solution would be to let EC2 agents proactively push increased CPU alerts to ECS/Auto Scaling (and not wait for CloudWatch).

Ideally, there would be two "evaluation-periods" options: one for scale out (being fast is key) and another for scale in (balancing cost against stability).
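To illustrate why the number of evaluation periods matters, here is a minimal sketch of the idea in Python. It is a simplified model and not how ECS or CloudWatch is actually implemented: it assumes one datapoint per minute and that an alarm fires once N consecutive datapoints exceed the threshold. The function name `first_alarm_index` and the sample CPU values are made up for illustration.

```python
from typing import Optional, Sequence


def first_alarm_index(samples: Sequence[float], threshold: float,
                      evaluation_periods: int) -> Optional[int]:
    """Return the index of the datapoint at which an alarm would fire.

    Simplified model (an assumption, not the real CloudWatch logic): the
    alarm fires once `evaluation_periods` consecutive datapoints are above
    `threshold`. Returns None if that never happens.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be >= 1")
    streak = 0  # number of consecutive breaching datapoints seen so far
    for index, value in enumerate(samples):
        streak = streak + 1 if value > threshold else 0
        if streak >= evaluation_periods:
            return index
    return None


# Hypothetical per-minute CPU utilization (%); the load jumps at minute 2.
cpu_per_minute = [40, 45, 80, 85, 90, 92, 95, 96]

for periods in (1, 3, 5):
    fired_at = first_alarm_index(cpu_per_minute, threshold=70,
                                 evaluation_periods=periods)
    print(f"evaluation_periods={periods}: alarm fires at minute {fired_at}")
```

With this sample data, one evaluation period reacts at minute 2 (the first breaching datapoint), three periods at minute 4, and five periods only at minute 6. A separate, larger value for scale in would use the same check with the comparison reversed.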

Which service(s) is this request for?

ECS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

As soon as our ECS cluster comes under increased load, it should be able to trigger a scale out. Currently this is not possible with Target tracking scaling policies and their fixed 5 evaluation periods. It is only possible with a 1+ minute delay using step scaling, or faster using a custom orchestrator with agents running inside the instances/containers.

With two "evaluation-periods" options, we could set the scale-out periods to 1 and keep the scale-in periods at 5, or even 10 or 20.

Are you currently working around this issue?

Yes, using step scaling, though it hurts to see ECS wait 1+ minute after the load has increased.
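For readers who want to try the step scaling workaround, the following is a rough sketch using boto3, not the exact setup described above. It assumes the scaling target is an ECS service that is already registered as a scalable target with Application Auto Scaling. The policy and alarm names, the 70% threshold, the 60-second cooldown and the +2 task adjustment are placeholder values chosen for illustration. The two builder functions only construct request parameters; `apply_fast_scale_out` is the part that would call AWS and needs credentials.

```python
from typing import Any, Dict


def step_scaling_policy_kwargs(cluster: str, service: str) -> Dict[str, Any]:
    """Parameters for application-autoscaling put_scaling_policy (step scaling)."""
    return {
        "PolicyName": f"{service}-fast-scale-out",  # hypothetical name
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "Cooldown": 60,  # placeholder, in seconds
            "MetricAggregationType": "Average",
            # Any breach above the alarm threshold adds 2 tasks (placeholder).
            "StepAdjustments": [
                {"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": 2},
            ],
        },
    }


def scale_out_alarm_kwargs(cluster: str, service: str, policy_arn: str,
                           threshold: float = 70.0) -> Dict[str, Any]:
    """Parameters for cloudwatch put_metric_alarm with one evaluation period."""
    return {
        "AlarmName": f"{service}-cpu-high",  # hypothetical name
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,            # one-minute datapoints
        "EvaluationPeriods": 1,  # fire on the first breaching datapoint
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],
    }


def apply_fast_scale_out(cluster: str, service: str) -> None:
    """Create the policy and alarm in AWS. Requires boto3 and credentials."""
    import boto3  # imported here so the builders above work without boto3

    autoscaling = boto3.client("application-autoscaling")
    cloudwatch = boto3.client("cloudwatch")
    policy = autoscaling.put_scaling_policy(
        **step_scaling_policy_kwargs(cluster, service))
    cloudwatch.put_metric_alarm(
        **scale_out_alarm_kwargs(cluster, service, policy["PolicyARN"]))
```

Even with `EvaluationPeriods` set to 1, the alarm can only react once a one-minute datapoint has arrived, which is the 1+ minute delay mentioned above.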

jeroenhabets • Aug 16 '23 12:08

I've just started using ECS and I have to say that having the compulsory scaling policy is more bothersome than I anticipated 🥲

lucastonelli • Feb 01 '24 03:02

I don't think it's unreasonable to need 3 data points to scale, but having them a mandatory 1 minute apart is really slow, given that one of the advantages of containers is that they can spin up so quickly. It means you cannot respond fast enough to bursty loads. You almost want a "burst" capability in ECS, similar to the burst capabilities we have on some of the EC2 instance types.

janeglovergmsl • Feb 13 '24 09:02

I have the same problem. Furthermore, I find that the ASG is also slow to launch instances. It would be nice if ECS had the ability to scale out faster, like Karpenter.

ponkio-o • Apr 03 '24 00:04

I would be willing to pay more money to have certain ECS metrics pushed into CloudWatch at a higher frequency (more often than once per minute).

joekeilty-oub • Jul 04 '24 11:07