llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
**What would you like to be added**: Different backends support a wide range of popular models, but not every model is supported by every backend, so we should have a failover policy for...
**What would you like to be added**: Similar to kserve https://kserve.github.io/website/latest/modelserving/v1beta1/custom/custom_model/#parallel-model-inference **Why is this needed**: **Completion requirements**: This enhancement requires the following artifacts: - [x] Design doc - [ ]...
**What would you like to be cleaned**: For example, people may want to deploy the model with different scheduling primitives: colocated or exclusive. **Why is this needed**: To express deployment primitives.
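A minimal sketch of what "exclusive" placement could mean with plain Kubernetes scheduling primitives today, using `podAntiAffinity` to keep one model instance per node (a colocated policy would use `podAffinity` instead). The label key `llmaz.io/model` and the backend image are assumptions for illustration, not part of the llmaz API.

```
apiVersion: v1
kind: Pod
metadata:
  name: llama2-70b-exclusive
  labels:
    llmaz.io/model: llama2-70b        # illustrative label, not a real llmaz field
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: llmaz.io/model       # repel other model-serving Pods on the same node
            operator: Exists
        topologyKey: kubernetes.io/hostname
  containers:
  - name: server
    image: vllm/vllm-openai:latest    # example backend image
    resources:
      limits:
        nvidia.com/gpu: 2
```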
**What would you like to be added**: Right now, we only have one model version for common deployment; however, if we take a higher-level view of the model lifecycle,...
**What would you like to be added**: Kueue is a great project which focuses on job queueing and resource management; it can also support inference services by managing Pods, it's...
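A small sketch of how an inference Pod could be queued through Kueue's plain-Pod integration (which must be enabled in the Kueue configuration): the Pod carries the `kueue.x-k8s.io/queue-name` label and is only admitted once its queue has quota. The queue name, image, and resource amounts are example values.

```
apiVersion: v1
kind: Pod
metadata:
  name: llama2-7b-server
  labels:
    kueue.x-k8s.io/queue-name: inference-queue   # Kueue gates the Pod until quota is available
spec:
  containers:
  - name: server
    image: vllm/vllm-openai:latest               # example backend image
    resources:
      requests:
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1
```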
**What would you like to be added**: This is how it should look:
```
volumeMounts:
- mountPath: /dev/shm
  name: dshm
```
But this memory size is unknown. **Why is...
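For context, a common way to back that mount today is a memory-backed `emptyDir` with an explicit `sizeLimit`. The `16Gi` value below is only a placeholder; choosing this size automatically is exactly the open question above, and the image name is an example.

```
spec:
  containers:
  - name: server
    image: vllm/vllm-openai:latest   # example backend image
    volumeMounts:
    - mountPath: /dev/shm
      name: dshm
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory                 # tmpfs-backed shared memory
      sizeLimit: 16Gi                # placeholder; the right size is model-dependent
```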
This requires the support of the lws community.
The first minor release should include all the items in https://github.com/InftyAI/llmaz/milestone/1
**What would you like to be added**: Models can be loaded with different accelerators; for example, llama2-70b can be loaded with 2 A100 80GB GPUs or 4 A100 40GB GPUs, we...
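A hypothetical sketch of how alternative accelerator flavors for one model might be declared, ordered by preference. The field names (`flavors`, `requests`) are illustrative only and do not describe the actual llmaz schema.

```
# Illustrative sketch, not the real llmaz API.
model: llama2-70b
flavors:
- name: a100-80gb
  requests:
    nvidia.com/gpu: 2        # 2 x A100 80GB
- name: a100-40gb
  requests:
    nvidia.com/gpu: 4        # 4 x A100 40GB
```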