llmaz feat(controller): support serverless serving with keda support by k8s scale subresource.

What this PR does / why we need it

Detailed Explanation of Commit

This commit introduces a guide for configuring serverless environments on Kubernetes, focusing on integrating Prometheus for monitoring and KEDA for autoscaling. The guide aims to optimize resource efficiency through event-driven scaling while maintaining observability for AI/ML workloads.

Prometheus Integration: Configured with namespaceSelector for cross-namespace monitoring
KEDA Autoscaling: Custom metric scaling with Prometheus triggers
Scale-to-Zero: Activator pattern with request buffering and CloudEvents
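
For reviewers, here is a minimal sketch of what the KEDA piece above could look like: a ScaledObject that points at a custom resource through its scale subresource and uses a Prometheus trigger, with `minReplicaCount` set to 0 for scale-to-zero. It is not the manifest from this PR. The target kind and API group (`Playground`, `inference.llmaz.io/v1alpha1`), the resource names, the Prometheus address, and the query are all assumptions chosen for illustration. Only the ScaledObject field names follow KEDA's published schema.

```python
import json


def build_scaled_object(name, namespace, target, prometheus_url, query,
                        threshold, max_replicas=3):
    """Build a KEDA ScaledObject manifest as a plain dict.

    `target` is a dict with apiVersion, kind and name. KEDA can scale any
    resource that exposes the /scale subresource, so the target here is a
    custom resource rather than a Deployment.
    """
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": target,
            # 0 lets KEDA scale the workload down to zero when idle.
            "minReplicaCount": 0,
            "maxReplicaCount": max_replicas,
            "triggers": [
                {
                    "type": "prometheus",
                    # KEDA expects trigger metadata values as strings.
                    "metadata": {
                        "serverAddress": prometheus_url,
                        "query": query,
                        "threshold": str(threshold),
                    },
                }
            ],
        },
    }


# All values below are hypothetical and only illustrate the shape.
manifest = build_scaled_object(
    name="playground-scaler",
    namespace="default",
    target={
        "apiVersion": "inference.llmaz.io/v1alpha1",  # assumed API group
        "kind": "Playground",                         # assumed target kind
        "name": "example-playground",
    },
    prometheus_url="http://prometheus.monitoring.svc:9090",  # assumed address
    query="sum(rate(example_requests_total[1m]))",           # hypothetical metric
    threshold=5,
)

print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML. Scaling from zero still needs something to hold incoming requests while the first replica starts, which is the role of the activator pattern listed above.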

Which issue(s) this PR fixes

Fixes #

Special notes for your reviewer

Does this PR introduce a user-facing change?

cc @pacoxu @kerthcet

Sep 28 '25 11:09 X1aoZEOuO

/kind feature

Sep 28 '25 12:09 X1aoZEOuO

@pacoxu @googs1025 @carlory @kerthcet Hello all! Could you spare a few minutes to review my PRs when you have a chance?

Other ref PRs:

https://github.com/InftyAI/llmaz/pull/499
https://github.com/InftyAI/llmaz/pull/498

Sep 29 '25 13:09 X1aoZEOuO

/assign I will take a look this week or early next week.

Oct 09 '25 05:10 pacoxu

@pacoxu @kenwoodjw Friendly ping, do you have some time to take a look at my PRs? Thanks a lot for your assistance!

Oct 15 '25 10:10 X1aoZEOuO

/assign

Oct 27 '25 09:10 kerthcet

It seems some docs are duplicated with https://github.com/InftyAI/llmaz/pull/499/files. Can we keep one copy here and refer to it from the other PR?

@kerthcet Thank you for catching this! I've refactored the documentation structure to eliminate the duplication. The docs in this PR are now more narrowly focused and link to the main serverless documentation (PR #499).

Oct 29 '25 17:10 X1aoZEOuO

The test is always failing ...

Oct 30 '25 18:10 kerthcet

/retest

Oct 31 '25 03:10 pacoxu

@pacoxu I have resolved the conflict. :)

Oct 31 '25 04:10 X1aoZEOuO

/retest

Oct 31 '25 04:10 X1aoZEOuO