llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

Results 82 llmaz issues

#### What this PR does / why we need it
Support dense deployment for LoRA models
#### Which issue(s) this PR fixes
xref: https://github.com/InftyAI/llmaz/issues/287
#### Special notes for your reviewer...

approved
do-not-merge/hold
do-not-merge/needs-kind
needs-priority
needs-triage

**What would you like to be added**: We will focus on LLM-specific characteristics to load-balance traffic, such as prefix-cache-aware, KV-cache-aware, LoRA-aware, load-aware, and request-profile-aware (summary or chat) routing, and so on....

feature
needs-priority
needs-triage

**What would you like to be added**: Add a Token/$ metric to the gateway to measure cost efficiency, which is indispensable for benchmarking. It is unclear whether Envoy AI Gateway supports this or not....

feature
needs-priority
needs-triage
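
A minimal sketch of how the Token/$ figure requested in this issue could be computed, assuming cost is derived from accelerator hours multiplied by an hourly price. The `UsageSample` type, its field names, and the numbers are hypothetical and come from neither llmaz nor Envoy AI Gateway; Python is used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class UsageSample:
    """Hypothetical usage record for one benchmarking window."""
    generated_tokens: int      # tokens produced during the window
    gpu_hours: float           # accelerator time consumed during the window
    price_per_gpu_hour: float  # assumed hourly price in dollars


def tokens_per_dollar(sample: UsageSample) -> float:
    """Return tokens generated per dollar spent (higher is more cost efficient)."""
    cost = sample.gpu_hours * sample.price_per_gpu_hour
    if cost <= 0:
        raise ValueError("cost must be positive to compute Token/$")
    return sample.generated_tokens / cost


if __name__ == "__main__":
    sample = UsageSample(generated_tokens=1_200_000, gpu_hours=2.0, price_per_gpu_hour=3.0)
    print(tokens_per_dollar(sample))
```

A real gateway metric would likely need per-model or per-route token counters as input; where those would come from is left open here, as it is in the issue.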

**What would you like to be added**: In the AI gateway [example](https://github.com/InftyAI/llmaz/blob/main/docs/examples/envoy-ai-gateway/basic.yaml), we create the gateway resources manually; this is only because Envoy AI Gateway is not that mature right...

feature
needs-priority
needs-triage

**What would you like to be added**: Following https://github.com/InftyAI/llmaz/blob/main/docs/open-webui.md, we can only serve models in one namespace. We should extend this to the Kubernetes namespace mechanism, for example, integrate...

help wanted
feature
needs-priority
needs-triage

**What would you like to be added**:
- add an Envoy plugin as the first routing plugin; it could be very simple, such as randomly selecting an instance
- add makefile...

feature
needs-priority
needs-triage
needs-kind

https://github.com/InftyAI/llmaz/blob/14dde9636479a66dbc336080e43284d2fecb9bbe/pkg/defaults.go#L20 My environment cannot pull Docker Hub images properly, so I need to customize this image. Can we add an `ENV: IMAGE_PREFIX` or something similar to make this configurable?

needs-priority
needs-triage
needs-kind
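
One way the `IMAGE_PREFIX` idea from this issue might behave is sketched below: prepend a registry prefix when the variable is set and fall back to the built-in default otherwise. The referenced file is Go, so this Python version only illustrates the logic. The `resolve_image` helper and the `DEFAULT_IMAGE` value are hypothetical and are not llmaz code or its actual default.

```python
import os
from typing import Mapping

# Hypothetical placeholder; the real default lives in pkg/defaults.go.
DEFAULT_IMAGE = "example/model-loader:v0.0.1"


def resolve_image(default_image: str, env: Mapping[str, str] = os.environ) -> str:
    """Prepend IMAGE_PREFIX (if set) to the default image reference."""
    # Drop surrounding whitespace and a trailing slash so the join is clean.
    prefix = env.get("IMAGE_PREFIX", "").strip().rstrip("/")
    if not prefix:
        return default_image
    return f"{prefix}/{default_image}"


if __name__ == "__main__":
    # With IMAGE_PREFIX unset this prints the default image unchanged.
    print(resolve_image(DEFAULT_IMAGE))
    print(resolve_image(DEFAULT_IMAGE, {"IMAGE_PREFIX": "registry.example.com/mirror/"}))
```

Passing the environment as a parameter keeps the helper easy to test without mutating the process environment.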

~~This was added to the OSPP program; see https://summer-ospp.ac.cn/org/prodetail/257c80106?list=org&navpage=org for the details.~~ **Removed the OSPP tag as no one applied for this task.** (1) Background:...

help wanted
feature
needs-priority
needs-triage
needs-kind

**What would you like to be added**: Here's an example from Triton TRT-LLM with lws: https://github.com/triton-inference-server/tutorials/blob/main/Deployment/Kubernetes/EKS_Multinode_Triton_TRTLLM/multinode_helm_chart/chart/templates/deployment.yaml. It needs to set a bunch of parameters dynamically, see

```yaml
- python3
- ./server.py
# ...
```

needs-priority
needs-triage
needs-kind

**What would you like to be added**: - As more features are developed, we will gradually integrate other upstream components to improve llmaz. Maybe we should verify the CRDs? For...

feature
important-longterm
needs-triage
needs-kind