Rolling update support for k8s 1.22 ReplicaSets
Since Kubernetes version 1.22, ReplicaSets no longer scale down the youngest pod first.
This was already raised in akka-management, where there are more details: https://github.com/akka/akka-management/issues/1130.
The issue is that we don't want Singletons to be moved more than necessary during a rolling update.
Thanks @lomigmegard. This also affects plain scale-downs, when a cluster's pod count is reduced.
We can't copy the Akka change because of its license. If someone has time to produce an equivalent change, that would be useful. Singletons will move to the next oldest cluster member if the oldest one is stopped. I wonder if it would also be possible to set the pod deletion cost based on knowledge of whether a Pekko cluster member has singletons deployed on it.
Our docs don't exactly encourage the use of singletons. Users should consider if they can rearchitect their applications to avoid relying on them. https://pekko.apache.org/docs/pekko/current/typed/cluster-singleton.html#introduction
@pjfanning If it's okay, I'll take a go at this. I have a decent amount of k8s experience, and if it ends up taking too long I can always hand it over to someone else.
We at the Eclipse Ditto project, which uses Apache Pekko and clustering, solved this with a script that patches the pod deletion cost based on how old the pods are.
Sharing it here so people have an alternative until this is done in Pekko itself.
The script:
- https://github.com/eclipse-ditto/ditto/blob/master/deployment/helm/ditto/scripts/patch-pods-deletion-cost.sh
Which is invoked regularly by a k8s cron job:
- https://github.com/eclipse-ditto/ditto/blob/master/deployment/helm/ditto/templates/hooks/pod-deletion-cost-cron-job.yaml
And also prior to an upgrade:
- https://github.com/eclipse-ditto/ditto/blob/master/deployment/helm/ditto/templates/hooks/pre-upgrade-job.yaml
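The age-based approach can be sketched in Python as follows. This is not the Ditto script (which is a shell script); the helper names are hypothetical, and the sketch only computes the costs and the patch bodies, leaving out the kubectl calls and the CronJob wiring. Older pods get a higher cost, so a scale-down removes the youngest pod first and the oldest pod, which roughly corresponds to the oldest cluster member hosting the singletons, is kept as long as possible.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple

ANNOTATION = "controller.kubernetes.io/pod-deletion-cost"


def parse_creation_timestamp(ts: str) -> datetime:
    # metadata.creationTimestamp is RFC 3339 in UTC, e.g. "2024-01-01T10:00:00Z"
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def age_ranked_costs(pods: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Give older pods a higher deletion cost so the youngest is removed first.

    pods: (pod name, creationTimestamp) pairs.
    """
    youngest_first = sorted(
        pods, key=lambda p: parse_creation_timestamp(p[1]), reverse=True
    )
    # The youngest pod gets cost 0, the oldest gets the highest cost.
    return {name: rank for rank, (name, _) in enumerate(youngest_first)}


def deletion_cost_patch(cost: int) -> str:
    # Metadata patch body, e.g. for: kubectl patch pod NAME -p '<body>'
    # Annotation values must be strings, hence str(cost).
    return json.dumps({"metadata": {"annotations": {ANNOTATION: str(cost)}}})


if __name__ == "__main__":
    pods = [
        ("pekko-0", "2024-01-01T10:00:00Z"),
        ("pekko-1", "2024-01-02T10:00:00Z"),
        ("pekko-2", "2024-01-01T09:00:00Z"),
    ]
    for name, cost in age_ranked_costs(pods).items():
        print(name, deletion_cost_patch(cost))
```

Running this regularly, as the cron job above does, matters because a pod without the annotation is treated as having a deletion cost of 0, and the ranking shifts every time pods are replaced.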
Thanks @thjaeckle for sharing that. I'll create a PR over the next few days to add a section to our docs highlighting it.
I spent some time writing a solution for this feature a few months back, but the biggest issue was writing tests to make sure everything works as expected. It also didn't help that at the time I wasn't working somewhere that ran k8s in production, and I was doing this in my spare time without the capacity to push it through.
As an alternative, a full solution was upstreamed into Akka Management 1.3.0 on March 28, 2023, including an extensive test suite (something we are currently somewhat lacking). It might be best to just wait until that Akka Management code turns 3 years old, at which point it automatically converts to the Apache 2.0 License, allowing us to backport it freely.
Doing this would also guarantee that the behaviour matches Akka, which can help users who decide to migrate. We would have to wait ~4 and a half months, but given that Christmas/New Year is coming up soon for me, this looks like the best option.
Thanks @mdedetrich. In the meantime, the alternative solution provided by @thjaeckle is quite a good way to achieve the same result.