
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

Results: 82 llmaz issues

**What would you like to be added**: After https://github.com/InftyAI/llmaz/pull/316, we have Prometheus metrics support for the controller; however, we also need to expose the inference engines' metrics for further development, like...

feature
needs-priority
needs-triage
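As context for the issue above: vLLM (one of llmaz's supported engines) exposes metrics in the Prometheus text format on its HTTP port. Below is a minimal, illustrative sketch of parsing such a payload; the metric names in the sample are taken from vLLM's conventions, but the helper itself is an assumption, not llmaz code, and it handles only unlabeled samples.

```python
# Illustrative sketch: parse simple (unlabeled) samples from a
# Prometheus text-format payload, as an inference engine's /metrics
# endpoint would serve it. Not llmaz code; names are for illustration.

def parse_prometheus_text(payload: str) -> dict[str, float]:
    """Return {metric_name: value} for unlabeled samples."""
    samples = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue  # skip anything that is not "name value"
    return samples

sample = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running 3
vllm:num_requests_waiting 1
"""
print(parse_prometheus_text(sample))
# {'vllm:num_requests_running': 3.0, 'vllm:num_requests_waiting': 1.0}
```

A real metrics aggregator would also need to carry labels (model name, engine instance) through, which the Prometheus client libraries handle for you.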

**What would you like to be added**: In a serverless scenario, once scaled to zero, it takes time to recover the service when traffic arrives; generally we'll use a standby instance...

important-longterm
needs-triage
needs-kind

**What would you like to be cleaned**: To make sure the end-to-end request succeeds, we should set up the env with helm. **Why is this needed**:

cleanup
needs-priority
needs-triage

**What would you like to be cleaned**: - [ ] AI gateway support: we have a metrics aggregator implementation, but lack support for the AI gateway - [ ] once...

feature
needs-priority
needs-triage

#### What this PR does / why we need it Add a new config `runai-streamer` in the vLLM BackendRuntime to allow loading models with [Run:ai Model Streamer](https://docs.vllm.ai/en/stable/models/extensions/runai_model_streamer.html) to enhance model...

feature
needs-priority
needs-triage

This is part of the OSSP program, so you need to go to https://summer-ospp.ac.cn/org/prodetail/257c80102?list=org&navpage=org for the details. (1) Background: llmaz is an open-source inference platform project based on large language...

feature
needs-priority
needs-triage
needs-kind

**What would you like to be added**: https://github.com/run-ai/runai-model-streamer is a library that helps load tensors from a source directly into GPU memory; we may integrate with this project to accelerate...

feature
needs-priority
needs-triage
needs-kind

#### What this PR does / why we need it Yesterday, I attempted to deploy llmaz using helm but encountered some issues. This PR fixes the problems and adds a...

cleanup
needs-priority
needs-triage

**What would you like to be added**: Envoy AI Gateway supports token-based rate limiting (see https://aigateway.envoyproxy.io/docs/getting-started/installation#configuring-envoy-gateway); make this configurable with the helm chart, and also provide documentation. **Why is this needed**: Envoy...

help wanted
feature
needs-priority
needs-triage
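For readers unfamiliar with the feature requested above: token-based rate limiting charges each request by the number of LLM tokens it consumes rather than counting requests. A minimal token-bucket sketch of the idea (purely conceptual; the actual mechanism lives in Envoy AI Gateway's configuration, not in application code like this):

```python
import time

class TokenBucket:
    """Conceptual sketch: requests are charged by LLM token count,
    mirroring token-based (rather than request-based) rate limiting."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity          # max token budget
        self.tokens = capacity            # current budget
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost: float) -> bool:
        """Admit a request consuming `cost` tokens, if budget permits."""
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=1000, refill_per_sec=100)
print(bucket.allow(800))  # True: within budget
print(bucket.allow(800))  # False: budget exhausted until refill
```

In the gateway, the equivalent budget and refill rate would be set declaratively, which is exactly what the issue asks to surface through the helm chart.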

**What would you like to be added**: Support loading/unloading LoRAs based on metrics, just like HPA for pod autoscaling. Part of https://github.com/InftyAI/llmaz/issues/27. **Why is this needed**: Support N models...

feature
needs-priority
needs-triage
needs-kind
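The decision step behind the LoRA issue above could look like the sketch below: given per-adapter request rates, keep only the hottest adapters loaded, up to a fixed slot budget. All names, thresholds, and the function itself are hypothetical illustrations, not llmaz's actual design.

```python
# Hypothetical sketch of metric-driven LoRA load/unload planning.
# Inputs and names are illustrative only.

def plan_lora_changes(request_rates: dict[str, float],
                      loaded: set[str],
                      max_slots: int,
                      min_rate: float = 0.1) -> tuple[set[str], set[str]]:
    """Return (to_load, to_unload) given recent per-adapter request rates."""
    # Keep adapters with enough traffic, hottest first, capped by slots.
    ranked = sorted(
        ((name, rate) for name, rate in request_rates.items()
         if rate >= min_rate),
        key=lambda kv: kv[1], reverse=True)
    desired = {name for name, _ in ranked[:max_slots]}
    return desired - loaded, loaded - desired

to_load, to_unload = plan_lora_changes(
    request_rates={"sql-lora": 5.0, "chat-lora": 0.01, "code-lora": 2.0},
    loaded={"chat-lora"},
    max_slots=2)
print(to_load)    # {'sql-lora', 'code-lora'} — hot adapters to load
print(to_unload)  # {'chat-lora'} — idle adapter to evict
```

An actual controller would then apply the plan through the engine's runtime adapter-management API and re-evaluate on each metrics scrape, the same reconcile-loop shape HPA uses for pods.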