
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

Results: 82 llmaz issues

**What would you like to be added**: Right now we can download model weights from the model hub directly, but each time we start/restart a pod, it will download the model...

feature
needs-priority
needs-triage

#### What this PR does / why we need it
https://github.com/InftyAI/llmaz/issues/163#issue-2526060924
#### Which issue(s) this PR fixes
None
#### Special notes for your reviewer
This PR mainly includes the following:...

feature
needs-priority
needs-triage

**What would you like to be added**: Take [Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/tree/main) for example: it contains not only the chunked model weights but also consolidated model weights. When downloading models from Hugging Face,...

feature
needs-priority
needs-triage

As the `service.Spec` describes, we have `minReplicas` and `maxReplicas`; what we hope to do is adjust the replica count based on traffic, a.k.a. serverless. We can use Ray or KEDA/Knative...

enhancement
feature
important-longterm
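
The serverless idea above could be prototyped with KEDA before anything is built into llmaz itself. A minimal sketch of a `ScaledObject` wiring traffic to replica count — the target Deployment name, Prometheus address, and query are placeholders for illustration, not real llmaz resources:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-service-scaler
spec:
  scaleTargetRef:
    name: llm-service   # placeholder: the Deployment backing the inference service
  minReplicaCount: 1    # would map to the service.Spec minReplicas
  maxReplicaCount: 10   # would map to the service.Spec maxReplicas
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # placeholder address
        # placeholder traffic metric: requests/sec across all pods
        query: sum(rate(http_requests_total{service="llm-service"}[2m]))
        threshold: "50"  # target requests/sec per replica
```

KEDA can also scale to zero when the metric stays idle, which is the part plain HPA cannot do and the main reason it (or Knative) comes up for this issue.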

Add support for inference services.

feature
needs-priority
needs-triage

**What would you like to be added**: Once we launch a model, we can simply co-launch a WebUI for interaction. We could support LLMOps tools like Dify, but maybe we...

help wanted
feature
needs-priority
needs-triage
needs-kind

We have several integration tests for webhooks; however, they are very simple ones. We need more, such as ones covering the update cases.

good first issue
cleanup
needs-priority
needs-triage

**What would you like to be added**: Right now, model management is a tricky problem in the cluster. Models are big, so we need to cache them on the node just...

feature
needs-priority
needs-triage

I came across the roadmap and am particularly interested in the Gateway API section. Will llmaz support advanced traffic management features, such as shadow and canary deployments between different model...

feature
needs-priority
needs-triage

This is the current unit test coverage:
```
∅ api/core/v1alpha1 (64ms)
∅ pkg (5ms)
∅ api/inference/v1alpha1 (43ms)
∅ pkg/controller (33ms)
∅ pkg/cert (78ms)
∅ pkg/controller/inference (28ms)
∅ pkg/webhook (23ms)
✓...
```

good first issue
cleanup
needs-priority
needs-triage