llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
**What would you like to be added**: Right now we can download model weights from the model hub directly, but each time we start/restart a pod, it will download the model...
#### What this PR does / why we need it https://github.com/InftyAI/llmaz/issues/163#issue-2526060924 #### Which issue(s) this PR fixes None #### Special notes for your reviewer This PR mainly includes the following things:...
**What would you like to be added**: Take [Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/tree/main) for example: it contains not only the chunked model weights but also consolidated model weights. When downloading models from Hugging Face,...
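The filtering this issue asks for can be sketched in plain Python. This is a minimal illustration, not llmaz's actual implementation; the pattern list below (which files count as duplicates) is an assumption based on the Mistral repo layout.

```python
import fnmatch

# Assumed duplicates to skip: Mistral-style repos ship a single
# consolidated weight file (and sometimes .pth copies) alongside
# the sharded .safetensors set that inference servers actually load.
DUPLICATE_PATTERNS = ["consolidated*", "*.pth"]

def files_to_download(repo_files):
    """Return the repo's file list minus weights that duplicate the sharded set."""
    return [
        f for f in repo_files
        if not any(fnmatch.fnmatch(f, pattern) for pattern in DUPLICATE_PATTERNS)
    ]
```

For a listing like `["model-00001-of-00003.safetensors", "consolidated.safetensors", "config.json"]`, only the sharded weights and config survive the filter.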
As the `service.Spec` describes, we have `minReplicas` and `maxReplicas`; what we hope to do is adjust the replica count based on traffic, a.k.a. serverless. We can use Ray or KEDA/Knative...
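A minimal sketch of what such a spec could look like. Only `minReplicas`/`maxReplicas` come from the issue text; the kind, API group, and every other field name here are illustrative assumptions, not the actual llmaz API:

```yaml
# Hypothetical inference service with autoscaling bounds.
apiVersion: inference.llmaz.io/v1alpha1   # assumed group/version
kind: Service
metadata:
  name: example-llm
spec:
  minReplicas: 1    # floor when traffic is low (0 would mean scale-to-zero)
  maxReplicas: 10   # ceiling under peak traffic
```

An external autoscaler (e.g. KEDA scaling on request-queue depth, or Knative's concurrency-based autoscaler) would then adjust replicas within these bounds.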
Add support for inference services.
**What would you like to be added**: Once we launch a model, we can simply co-launch a web UI for interaction. We can support LLMOps tools like Dify, but maybe we...
We have several integration tests for webhooks; however, they're very simple ones. We need more, e.g. covering the update cases.
**What would you like to be added**: Right now, model management is a tricky problem in the cluster: models are big, so we need to cache them on the node just...
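One common way to get node-level caching, sketched with plain Kubernetes primitives (the paths, names, and image here are illustrative, not llmaz's design): mount a `hostPath` volume so that a pod restarted on the same node finds the weights already on disk and skips the download.

```yaml
# Hypothetical sketch: persist downloaded weights on the node itself.
apiVersion: v1
kind: Pod
metadata:
  name: llm-server
spec:
  containers:
  - name: server
    image: example/inference-server:latest   # illustrative image
    volumeMounts:
    - name: model-cache
      mountPath: /workspace/models           # server reads weights from here
  volumes:
  - name: model-cache
    hostPath:
      path: /var/lib/llmaz/models            # survives pod restarts on this node
      type: DirectoryOrCreate
```

The trade-off is that the cache is per-node, so a pod rescheduled elsewhere still re-downloads; a shared read-many volume or a distributed cache avoids that at the cost of more infrastructure.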
I came across the roadmap and am particularly interested in the Gateway API section. Will llmaz support advanced traffic-management features, such as shadow and canary deployments between different model...
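For context, canary-style splitting between two model deployments is already expressible with the upstream Gateway API via weighted `backendRefs`; the service names and port below are illustrative.

```yaml
# Standard Gateway API HTTPRoute: send 90% of traffic to the current
# model's Service and 10% to the canary. Backend names are hypothetical.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: model-canary
spec:
  parentRefs:
  - name: example-gateway
  rules:
  - backendRefs:
    - name: model-v1
      port: 8080
      weight: 90
    - name: model-v2   # canary
      port: 8080
      weight: 10
```

Shadowing (mirroring requests without serving their responses) is a separate Gateway API feature with narrower implementation support than weighted routing.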
This is the current unit test coverage:
```
✓ api/core/v1alpha1 (64ms)
✓ pkg (5ms)
✓ api/inference/v1alpha1 (43ms)
✓ pkg/controller (33ms)
✓ pkg/cert (78ms)
✓ pkg/controller/inference (28ms)
✓ pkg/webhook (23ms)
✓ ...
```
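For reference, per-package numbers like the ones above can be reproduced with the standard Go toolchain (run from the repo root; whether llmaz's Makefile wraps these commands is an assumption):

```shell
# Run all unit tests and record a coverage profile.
go test ./... -coverprofile=coverage.out

# Summarize coverage per function; the last line is the repo-wide total.
go tool cover -func=coverage.out
```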