Port ScheduledOverrides to AutoscalingRunnerSet

Open asafhm opened this issue 1 year ago • 23 comments

Fixes #3313 and #2986 (closed with no resolution)

This PR ports @mumoshu's great work on the ScheduledOverrides capability of HorizontalRunnerAutoscaler to AutoscalingRunnerSet.

How to use it

It's meant to be used in an almost identical way, with one exception - minReplicas was renamed to minRunners to conform with the current naming choice.

Examples:

  1. Within the AutoscalingRunnerSet CR directly:
apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  annotations: # abbreviated
  labels: # abbreviated
  name: example-gha-runner-scale-set
spec:
  githubConfigSecret: "<some_value>"
  githubConfigUrl: "<some_value>"
  maxRunners: 2
  minRunners: 1
  scheduledOverrides:
  - startTime: "2021-05-01T00:00:00+09:00"
    endTime: "2021-05-03T00:00:00+09:00"
    recurrenceRule:
      frequency: Weekly
      untilTime: "2024-07-01T00:00:00+09:00"
    minRunners: 0
  2. With the gha-runner-scale-set Helm chart, by setting scheduledOverrides in the values:
minRunners: 1
maxRunners: 2

scheduledOverrides:
  - startTime: "2021-05-01T00:00:00+09:00"
    endTime: "2021-05-03T00:00:00+09:00"
    recurrenceRule:
      frequency: Weekly
      untilTime: "2024-07-01T00:00:00+09:00"
    minRunners: 0

Notable changes

Changes to the AutoscalingRunnerSet Status

  1. Like the previous implementation of scheduled overrides, this PR introduces the ScheduledOverridesSummary field in the AutoscalingRunnerSet Status.

  2. In order to share the desired minimum runners between the AutoscalingRunnerSet and AutoscalingListener resources after the scheduled overrides are evaluated, an additional field is introduced to the AutoscalingRunnerSet Status - DesiredMinRunners. Whenever the listener's minRunners value differs from DesiredMinRunners, the listener is deleted so it can be recreated.
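
To illustrate the comparison described in item 2, here is a minimal Go sketch. The struct definitions are hypothetical, trimmed-down stand-ins for the real CRD types, the field types are assumed to be plain ints, and listenerNeedsRecreation is an illustrative helper name, not a function from this PR.

```go
package main

import "fmt"

// AutoscalingRunnerSetStatus is a hypothetical stand-in that carries only
// the field discussed above. The real status type has more fields.
type AutoscalingRunnerSetStatus struct {
	DesiredMinRunners int // assumed to be an int for this sketch
}

// AutoscalingListenerSpec is a hypothetical stand-in for the listener spec.
type AutoscalingListenerSpec struct {
	MinRunners int
}

// listenerNeedsRecreation reports whether the listener's minRunners has
// drifted from the value computed after evaluating scheduled overrides.
func listenerNeedsRecreation(status AutoscalingRunnerSetStatus, listener AutoscalingListenerSpec) bool {
	return listener.MinRunners != status.DesiredMinRunners
}

func main() {
	// An override lowered the desired minimum to 0 while the listener
	// still runs with 1, so the listener should be deleted and recreated.
	status := AutoscalingRunnerSetStatus{DesiredMinRunners: 0}
	listener := AutoscalingListenerSpec{MinRunners: 1}
	fmt.Println(listenerNeedsRecreation(status, listener))
}
```

In the controller itself, a true result would lead to deleting the listener resource so the next reconciliation recreates it with the new value.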

Changes to requeuing of requests

In order to reevaluate the scheduled overrides, the final return ctrl.Result{}, nil in the autoscalingrunnerset reconciliation function was changed to requeue the request after 1 minute.

asafhm avatar Jun 02 '24 21:06 asafhm

👍

kahirokunn avatar Jun 04 '24 23:06 kahirokunn

bump

gfrid avatar Jun 05 '24 06:06 gfrid

Unlike the legacy actions-runner-controller, periodic reconciliation of resources is practically disabled in the new controller due to WithEventFilter(predicate.ResourceVersionChangedPredicate{}). To know if a scheduled override should take place, periodic reconciliation must occur for the AutoscalingRunnerSet resource in one way or another. In order to make it work here, I used Result.RequeueAfter, and it indeed worked fine, but I believe that going with a standard sync period could eliminate potential scenarios of missing an override due to a bug in the reconciliation logic.

So a small question for the maintainers - What was the reason behind ignoring the resync period in the gha-runner-scale-set-controller? And if the concern is overloading the controller when many resources exist, won't it make sense to enable it just for the AutoscalingRunnerSet resource? I'd love to know.

asafhm avatar Jun 09 '24 19:06 asafhm

Hi @nikola-jokic I can see you're the most active code owner here. Is there anything else needed to push this forward?

asafhm avatar Jun 26 '24 11:06 asafhm

Bump on this one, really needed feature

XciD avatar Jul 22 '24 19:07 XciD

Hey team, do we have an ETA on this? P+ customer Itaú is asking for a status.

regmontanhani avatar Jul 24 '24 14:07 regmontanhani

Hi Team, any update on this? We met with the customer today.

tkdwill avatar Jul 31 '24 17:07 tkdwill

It would be nice to have a position on this: whether it is going to be merged or not. That way we could decide how to tackle this issue in our clusters.

caiocsgomes avatar Aug 21 '24 07:08 caiocsgomes

👋🏽 Any chance we get any sort of feedback on this PR? This is a very important feature for cost management

pragmaticivan avatar Nov 13 '24 00:11 pragmaticivan

@asafhm Thoughts on also supporting cronOverrides with:

apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  annotations: # abbreviated
  labels: # abbreviated
  name: example-gha-runner-scale-set
spec:
  githubConfigSecret: "<some_value>"
  githubConfigUrl: "<some_value>"
  maxRunners: 2
  minRunners: 1
  cronOverrides:
  - timezone: America/Chicago
    start: 0 6 * * *        # At 6:00 AM
    end: 0 20 * * *         # At 8:00 PM
    minRunners: 0

My main use case is to have 0 minRunners over the weekend, for example.

pragmaticivan avatar Nov 13 '24 00:11 pragmaticivan

@pragmaticivan You could achieve this with scheduledOverrides too, like this:

  # Scale down on weekends (Friday 6pm - Monday 8am America/Chicago time)
  - startTime: "2024-11-22T18:00:00-06:00"
    endTime: "2024-11-25T08:00:00-06:00"
    recurrenceRule:
      frequency: Weekly
    minRunners: 0

The advantage here is that you can be specific about the future starting point of this rule up to the date itself, which is impossible using Cron, since it will start from the next match of its pattern. I suppose there may be use cases that require specificity for some.
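
To make the weekly recurrence concrete, here is a small Go sketch that checks whether a given instant falls inside any weekly repetition of the window above. It is not how the PR evaluates recurrence rules. It assumes a fixed 168-hour week (so it ignores daylight-saving shifts), a window shorter than one week, and no untilTime. weeklyOverrideActive and mustParse are hypothetical helper names.

```go
package main

import (
	"fmt"
	"time"
)

// week is a fixed 168-hour period. Real recurrence handling would need to
// account for time zones and daylight-saving changes.
const week = 7 * 24 * time.Hour

// weeklyOverrideActive reports whether now lies inside [start, end) or any
// later copy of that window shifted by a whole number of weeks.
func weeklyOverrideActive(start, end, now time.Time) bool {
	if now.Before(start) {
		return false // the rule has not started yet
	}
	// Offset of now from the beginning of the most recent occurrence.
	sinceOccurrenceStart := now.Sub(start) % week
	return sinceOccurrenceStart < end.Sub(start)
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	start := mustParse("2024-11-22T18:00:00-06:00") // Friday 6pm
	end := mustParse("2024-11-25T08:00:00-06:00")   // Monday 8am

	for _, s := range []string{
		"2024-11-20T12:00:00-06:00", // before the first occurrence
		"2024-11-30T12:00:00-06:00", // Saturday of the following week
		"2024-12-04T12:00:00-06:00", // a Wednesday
	} {
		fmt.Println(s, weeklyOverrideActive(start, end, mustParse(s)))
	}
}
```

The first check in the function is the point made above: nothing applies before the explicit startTime, whereas a cron pattern would begin matching from its next occurrence.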

asafhm avatar Nov 22 '24 11:11 asafhm

Honestly, it appears that this repository in its entirety is not actively maintained anymore, and for the past two months all commits came from bots, regardless of the large amount of PRs waiting to be looked at. That is unfortunate, given the obvious need here for this feature, and the popularity of self-hosted runners on Kubernetes in itself...

asafhm avatar Nov 22 '24 11:11 asafhm

I know that many people take their holidays in September and October, so I guess there's no way around it, but they don't come back very quickly. 😒

kahirokunn avatar Nov 23 '24 03:11 kahirokunn

Sadly this trend began before September, and even before this PR was created :/

asafhm avatar Nov 24 '24 14:11 asafhm

@Link- Trying my luck here since no other maintainer appears active recently... Could I kindly solicit a review? There's enough demand from the community for this PR

asafhm avatar Jan 01 '25 10:01 asafhm

Bump

IdanYaffe avatar Jan 24 '25 20:01 IdanYaffe

I've given up hope this or any other long-living PR by the community will eventually get merged. For those here who still need a workaround, there's an operator named kube-green that seems nice and may help. It doesn't appear to be too complex and doesn't support scenarios of specific dates for scaling down, but if your use case is to scale down your runners on weekends/out-of-office hours, you should be fine. I haven't tried it myself since my situation with ARC changed over the last several weeks, but you can use its custom patches capability to support Custom Resources such as AutoscalingRunnerSet.

asafhm avatar Feb 13 '25 09:02 asafhm

@nikola-jokic Would it be possible to get a review on this please?

ali-kafel avatar Feb 26 '25 21:02 ali-kafel

Hey team! Any reason why this was not merged in the release from today? It would be nice to have at least a position on this, an approval or rejection.

caiocsgomes avatar Mar 25 '25 12:03 caiocsgomes

Bump

kahirokunn avatar May 23 '25 06:05 kahirokunn

Bump

sho-chan-081 avatar Jul 09 '25 04:07 sho-chan-081

Hello @nikola-jokic, can you please review this PR? Thank you.

danielkubat avatar Sep 04 '25 13:09 danielkubat

Delivering this feature would not only save some costs, but also save the environment. I wonder how many runners are just idling around during non-business hours?

enricojonas avatar Oct 29 '25 14:10 enricojonas