
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

Results: 82 llmaz issues

**What would you like to be added**: After https://github.com/InftyAI/llmaz/pull/316, we have Prometheus metrics support for the controller; however, we also need to expose the inference engines' metrics for further development, like...

feature
needs-priority
needs-triage
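As context for the issue above: vLLM (one of llmaz's supported engines) exposes metrics in the Prometheus text format on its HTTP port. Below is a minimal, illustrative sketch of parsing such a payload; the metric names in the sample are taken from vLLM's conventions, but the helper itself is an assumption, not llmaz code, and it handles only unlabeled samples.

```python
# Illustrative sketch: parse simple (unlabeled) samples from a
# Prometheus text-format payload, as an inference engine's /metrics
# endpoint would serve it. Not llmaz code; names are for illustration.

def parse_prometheus_text(payload: str) -> dict[str, float]:
    """Return {metric_name: value} for unlabeled samples."""
    samples = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # skip anything that is not "name value"
    return samples

sample = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running 3
vllm:num_requests_waiting 1
"""
print(parse_prometheus_text(sample))
# {'vllm:num_requests_running': 3.0, 'vllm:num_requests_waiting': 1.0}
```

A real metrics aggregator would also need to carry labels (model name, engine instance) through, which the Prometheus client libraries handle for you.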

**What would you like to be added**: In a serverless scenario, once scaled to zero, it takes time to recover the service when traffic arrives; generally we'll use a standby instance...

important-longterm
needs-triage
needs-kind

**What would you like to be cleaned**: To make sure the end-to-end request succeeds, we should set up the env with helm. **Why is this needed**:

cleanup
needs-priority
needs-triage

**What would you like to be cleaned**: - [ ] AI gateway support: we have a metrics aggregator implementation, but lack support for the AI gateway - [ ] once...

feature
needs-priority
needs-triage

#### What this PR does / why we need it Add a new config `runai-streamer` in the vLLM BackendRuntime to allow loading models with [Run:ai Model Streamer](https://docs.vllm.ai/en/stable/models/extensions/runai_model_streamer.html) to enhance model...

feature
needs-priority
needs-triage

This is part of the OSSP program, so you need to go to https://summer-ospp.ac.cn/org/prodetail/257c80102?list=org&navpage=org for the details. (1) Background: llmaz is an open-source inference platform project based on large language...

feature
needs-priority
needs-triage
needs-kind

**What would you like to be added**: https://github.com/run-ai/runai-model-streamer is a library that helps load tensors from a source directly into GPU memory; we may integrate with this project to accelerate...

feature
needs-priority
needs-triage
needs-kind

#### What this PR does / why we need it Yesterday, I attempted to deploy llmaz using helm but encountered some issues. This PR fixes the problems and adds a...

cleanup
needs-priority
needs-triage

**What would you like to be added**: Envoy AI Gateway supports token-based rate limiting (see https://aigateway.envoyproxy.io/docs/getting-started/installation#configuring-envoy-gateway); make this configurable with the helm chart, and also provide documentation. **Why is this needed**: Envoy...

help wanted
feature
needs-priority
needs-triage
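For readers unfamiliar with the feature requested above: token-based rate limiting charges each request by the number of LLM tokens it consumes rather than counting requests. A minimal token-bucket sketch of the idea (purely conceptual; the actual mechanism lives in Envoy AI Gateway's configuration, not in application code like this):

```python
import time

class TokenBucket:
    """Conceptual sketch: requests are charged by LLM token count,
    mirroring token-based (rather than request-based) rate limiting."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity          # max token budget
        self.tokens = capacity            # current budget
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost: float) -> bool:
        """Admit a request consuming `cost` tokens, if budget permits."""
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=1000, refill_per_sec=100)
print(bucket.allow(800))  # True: within budget
print(bucket.allow(800))  # False: budget exhausted until refill
```

In the gateway, the equivalent budget and refill rate would be set declaratively, which is exactly what the issue asks to surface through the helm chart.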

**What would you like to be added**: Support loading/unloading LoRAs based on metrics, just like HPA for pod autoscaling. Part of https://github.com/InftyAI/llmaz/issues/27. **Why is this needed**: Support N models...

feature
needs-priority
needs-triage
needs-kind
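The decision step behind the LoRA issue above could look like the sketch below: given per-adapter request rates, keep only the hottest adapters loaded, up to a fixed slot budget. All names, thresholds, and the function itself are hypothetical illustrations, not llmaz's actual design.

```python
# Hypothetical sketch of metric-driven LoRA load/unload planning.
# Inputs and names are illustrative only.

def plan_lora_changes(request_rates: dict[str, float],
                      loaded: set[str],
                      max_slots: int,
                      min_rate: float = 0.1) -> tuple[set[str], set[str]]:
    """Return (to_load, to_unload) given recent per-adapter request rates."""
    # Keep adapters with enough traffic, hottest first, capped by slots.
    ranked = sorted(
        ((name, rate) for name, rate in request_rates.items()
         if rate >= min_rate),
        key=lambda kv: kv[1], reverse=True)
    desired = {name for name, _ in ranked[:max_slots]}
    return desired - loaded, loaded - desired

to_load, to_unload = plan_lora_changes(
    request_rates={"sql-lora": 5.0, "chat-lora": 0.01, "code-lora": 2.0},
    loaded={"chat-lora"},
    max_slots=2)
print(to_load)    # {'sql-lora', 'code-lora'} — hot adapters to load
print(to_unload)  # {'chat-lora'} — idle adapter to evict
```

An actual controller would then apply the plan through the engine's runtime adapter-management API and re-evaluate on each metrics scrape, the same reconcile-loop shape HPA uses for pods.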