feat(controller): support serverless serving with KEDA via the k8s scale subresource.
What this PR does / why we need it
Detailed Explanation of Commit
This commit introduces a guide for configuring serverless environments on Kubernetes, focusing on integrating Prometheus for monitoring and KEDA for autoscaling. The guide aims to optimize resource efficiency through event-driven scaling while maintaining observability for AI/ML workloads.
- Prometheus Integration: Configured with namespaceSelector for cross-namespace monitoring
- KEDA Autoscaling: Custom metric scaling with Prometheus triggers
- Scale-to-Zero: Activator pattern with request buffering and CloudEvents
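
To make the three points above concrete, here is a minimal sketch (not the manifests shipped in this PR) that builds a Prometheus Operator `ServiceMonitor` with a `namespaceSelector` and a KEDA `ScaledObject` with a `prometheus` trigger and `minReplicaCount: 0`. The target kind `Playground` with API group `inference.llmaz.io/v1alpha1`, the resource names, the label `app: demo-model`, the Prometheus address, and the query are all assumptions chosen for illustration; KEDA only requires that the target expose the `/scale` subresource. The activator with request buffering is not shown.

```python
import json


def service_monitor(name, namespace, app_label, port="metrics"):
    """Build a ServiceMonitor that scrapes matching Services in every namespace."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # any: true lets Prometheus discover Services across namespaces.
            "namespaceSelector": {"any": True},
            "selector": {"matchLabels": {"app": app_label}},
            "endpoints": [{"port": port, "interval": "15s"}],
        },
    }


def scaled_object(name, namespace, target, server_address, query, threshold,
                  max_replicas=3):
    """Build a KEDA ScaledObject driven by a Prometheus query."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Any resource exposing the /scale subresource can be targeted.
            "scaleTargetRef": target,
            # 0 allows scale-to-zero when the trigger is inactive.
            "minReplicaCount": 0,
            "maxReplicaCount": max_replicas,
            "triggers": [
                {
                    "type": "prometheus",
                    # KEDA expects trigger metadata values as strings.
                    "metadata": {
                        "serverAddress": server_address,
                        "query": query,
                        "threshold": str(threshold),
                    },
                }
            ],
        },
    }


# Hypothetical values, for illustration only.
sm = service_monitor("demo-model-monitor", "monitoring", "demo-model")
so = scaled_object(
    name="demo-model-scaler",
    namespace="default",
    target={
        "apiVersion": "inference.llmaz.io/v1alpha1",  # assumed API group
        "kind": "Playground",                         # assumed target kind
        "name": "demo-model",
    },
    server_address="http://prometheus.monitoring.svc:9090",  # assumed address
    query='sum(demo_requests_waiting{namespace="default"})',  # assumed metric
    threshold=10,
)

# kubectl accepts JSON manifests, so each object can be applied as-is.
print(json.dumps(sm, indent=2))
print(json.dumps(so, indent=2))
```

Each printed object can be saved to its own file and applied with `kubectl apply -f`, assuming the Prometheus Operator and KEDA CRDs are installed in the cluster.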
Which issue(s) this PR fixes
Fixes #
Special notes for your reviewer
Does this PR introduce a user-facing change?
cc @pacoxu @kerthcet
/kind feature
@pacoxu @googs1025 @carlory @kerthcet Hello all! Could you spare a few minutes to review my PRs when you have a chance?
Other ref PRs:
- https://github.com/InftyAI/llmaz/pull/499
- https://github.com/InftyAI/llmaz/pull/498
/assign
I will take a look this week or early next week.
@pacoxu @kenwoodjw Friendly ping, do you have some time to take a look at my PRs? Thanks a lot for your assistance!
/assign
It seems some docs are duplicated with https://github.com/InftyAI/llmaz/pull/499/files. Can we keep a single copy in one PR and refer to it from the other?
@kerthcet Thank you for catching this! I've refactored the documentation structure to eliminate the duplication. The docs in this PR now have a narrower focus and link to the main serverless documentation (PR #499).
The test is always failing ...
/retest
@pacoxu I have resolved the conflict. :)
/retest