serving: Scaling any scalable CRD

I have a CRD that creates a ReplicaSet. It is not a Deployment, but it does implement the /scale endpoint.

I would like to be able to scale it up and down (from zero and to zero) with the Knative KPA. I have seen that the KPA can point to any type of resource that implements /scale, but I find it difficult to understand how to make it work. From looking at the code, it seems that a revision can only create a Deployment, so I wanted to try creating the objects without revisions.

I have created the following objects:

A StatefulSet, for example (it also implements /scale)
Services (example + example-internal) that point to the StatefulSet pods
A KPA that points to the StatefulSet
An ingresses.networking.internal.knative.dev object that points to the Service
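
For reference, the KPA object in the list above might look roughly like the following, written as a Python dict that can be dumped to JSON or YAML. This is only a sketch under assumptions: the field and annotation names follow the `autoscaling.internal.knative.dev/v1alpha1` PodAutoscaler type as I understand it, and the name `example`, the namespace `default`, and the concurrency target of 10 are hypothetical.

```python
import json


def build_pod_autoscaler(name, namespace, target_kind, target_name,
                         target_concurrency=10):
    """Return a PodAutoscaler manifest (as a dict) pointing at a scalable workload.

    All concrete values are illustrative; field names are assumed from the
    v1alpha1 PodAutoscaler type and should be checked against the installed CRD.
    """
    return {
        "apiVersion": "autoscaling.internal.knative.dev/v1alpha1",
        "kind": "PodAutoscaler",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                # Select the Knative Pod Autoscaler rather than HPA.
                "autoscaling.knative.dev/class": "kpa.autoscaling.knative.dev",
                "autoscaling.knative.dev/metric": "concurrency",
                "autoscaling.knative.dev/target": str(target_concurrency),
            },
        },
        "spec": {
            "protocolType": "http1",
            # Any resource exposing the scale subresource can be referenced here.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
        },
    }


manifest = build_pod_autoscaler("example", "default", "StatefulSet", "example")
print(json.dumps(manifest, indent=2))
```

The manifest only expresses the intent "scale this StatefulSet with KPA"; it does not by itself make the rest of the request path work without a revision.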

When I send an HTTP request, I get the following error from the activator: Error getting active endpoint: revision.serving.knative.dev "helloworld-go" not found

However, I did see this issue, https://github.com/knative/serving/issues/1507, which says it is possible. Can you help, please?

Jan 17 '24 15:01 omer-dayan

Hi, is this about HPA or KPA? The linked issue is about HPA.

Mar 05 '24 12:03 skonto

My use case is KPA.

I want to be able to scale a StatefulSet, which is not a Knative Service, using KPA.

Mar 10 '24 07:03 omer-dayan

Hi @omer-dayan

> When I send http request I get Error getting active endpoint: revision.serving.knative.dev "helloworld-go" not found that comes from the activator.

The error occurs because the activator looks up a revision when proxying the request.

> I would like to be able to scale it up and down (From zero and to zero) with kNative KPA. From looking in the code it seems like revision is able to create only deployment. So I wanted to try to create the objects without revisions. However I did see this issue https://github.com/knative/serving/issues/1507 that says it possible.

When a revision is reconciled, a Deployment and a PodAutoscaler (PA) are created. KPA uses its own PA reconciler and some other resources to make scaling decisions. The KPA core logic is designed around duck types, so in theory it can scale anything that is PodScalable, but parts of the current autoscaler depend on revisions, so afaik you cannot avoid them. PodScalable compliance requires the scale subresource to be supported and, as noted above, StatefulSets do support it.

To implement what you are looking for (assuming you still want request-based autoscaling and don't want to use HPA directly or KEDA), you would probably need to implement a separate autoscaler, and that would require a lot of changes. Another approach would be to extract only the KPA logic and implement your own autoscaler outside of Knative. In general, KPA scrapes targets for metric statistics, which means your pods need to expose those statistics; that is why we inject the queue proxy (among other reasons).

Btw, the ticket (https://github.com/knative/serving/issues/1507) refers to HPA, where things are simpler and most of the work is delegated to the K8s HPA.
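
To make the "own autoscaler outside of Knative" option a bit more concrete, here is a minimal sketch of the decision step of a request-based autoscaler. It is a simplified illustration, not Knative's actual KPA algorithm (no stable/panic windows, no rate limiting). The function names, the sample concurrency values, and the target of 10 are all hypothetical; only the shape of the scale subresource path and the `spec.replicas` field come from the Kubernetes API.

```python
import math


def desired_replicas(per_pod_concurrency, target_per_pod, max_replicas=None):
    """Compute a replica count from scraped per-pod in-flight request counts.

    Returns 0 when nothing is in flight, which is what allows scale to zero.
    """
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    total = sum(per_pod_concurrency)
    if total <= 0:
        return 0
    want = math.ceil(total / target_per_pod)
    if max_replicas is not None:
        want = min(want, max_replicas)
    return want


def scale_request(group, version, namespace, plural, name, replicas):
    """Build the scale subresource path and a merge-patch body for a workload."""
    path = (f"/apis/{group}/{version}/namespaces/{namespace}"
            f"/{plural}/{name}/scale")
    body = {"spec": {"replicas": replicas}}
    return path, body


# Hypothetical scrape result: three pods reporting their current concurrency.
replicas = desired_replicas([12.0, 9.5, 14.0], target_per_pod=10)
path, body = scale_request("apps", "v1", "default", "statefulsets",
                           "example", replicas)
print(replicas)
print(path)
print(body)
```

Scaling up from zero is not covered by this sketch: with no pods there is nothing to scrape, so some component in the request path (the role the activator plays in Knative) would have to hold requests and report the pending load.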

May 15 '24 12:05 skonto

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

Aug 14 '24 01:08 github-actions[bot]