
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

82 llmaz issues

**What would you like to be added**: Different backends support a wide range of popular models, but not every model is supported by every backend, so we should have a failover policy for...

important-longterm
needs-triage
needs-kind
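One way the requested failover could look is an ordered list of backends to try in preference order. The sketch below is purely hypothetical: the `backendFallbacks` field does not exist in llmaz today, and the API group/version is assumed.

```yaml
# Hypothetical sketch only: a Playground listing backends in preference
# order, falling back when the preferred one cannot serve the model.
apiVersion: inference.llmaz.io/v1alpha1   # assumed group/version
kind: Playground
metadata:
  name: model-with-fallback
spec:
  replicas: 1
  modelClaim:
    modelName: example-model    # placeholder model name
  backendFallbacks:             # hypothetical field
  - name: vllm                  # preferred backend
  - name: llamacpp              # tried if vllm does not support the model
```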

**What would you like to be added**: Similar to kserve's parallel model inference: https://kserve.github.io/website/latest/modelserving/v1beta1/custom/custom_model/#parallel-model-inference **Why is this needed**: **Completion requirements**: This enhancement requires the following artifacts: - [x] Design doc - [ ]...

important-longterm
needs-triage
needs-kind

**What would you like to be cleaned**: For example, people may want to deploy a model with different scheduling primitives: colocated or exclusive. **Why is this needed**: To express deployment primitives.

question
needs-priority
needs-triage

feature
needs-priority
needs-triage
api-change

**What would you like to be added**: Right now we only have one model version for common deployment; however, if we take a higher-level view of the model lifecycle,...

feature
important-longterm
needs-triage

**What would you like to be added**: Kueue is a great project that focuses on job queueing and resource management; it can also support inference services by managing Pods. It's...

feature
needs-priority
needs-triage

**What would you like to be added**: This is how it should look:

```yaml
volumeMounts:
- mountPath: /dev/shm
  name: dshm
```

But this memory size is unknown. **Why is...

feature
needs-priority
needs-triage
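For context, the standard Kubernetes pattern the snippet above refers to pairs the `/dev/shm` mount with a memory-backed `emptyDir` volume; the open question in the issue is what `sizeLimit` to set. A plain-Pod sketch (the image and the 2Gi limit are arbitrary examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-shm-example
spec:
  containers:
  - name: server
    image: example/inference-server   # placeholder image
    volumeMounts:
    - mountPath: /dev/shm
      name: dshm
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory    # tmpfs-backed shared memory
      sizeLimit: 2Gi    # arbitrary example; the right size is the open question
```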

This requires support from the lws community.

feature
needs-priority
needs-triage

The first minor release should include everything in https://github.com/InftyAI/llmaz/milestone/1

needs-priority
needs-triage
needs-kind

**What would you like to be added**: Models can be loaded with different accelerators; for example, llama2-70b can be served with 2 A100 80GB GPUs or 4 A100 40GB GPUs. We...

feature
needs-priority
needs-triage
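The accelerator-alternatives idea above could be expressed as a list of resource "flavors" on the model object. The field names below (`inferenceConfig`, `flavors`) and the API group/version are assumptions for illustration, not the settled llmaz API:

```yaml
# Hypothetical sketch: alternative accelerator flavors for llama2-70b.
apiVersion: llmaz.io/v1alpha1   # assumed group/version
kind: OpenModel
metadata:
  name: llama2-70b
spec:
  familyName: llama2
  inferenceConfig:              # assumed field
    flavors:                    # tried in order; scheduler picks what fits
    - name: a100-80gb
      limits:
        nvidia.com/gpu: 2      # 2x A100 80GB
    - name: a100-40gb
      limits:
        nvidia.com/gpu: 4      # 4x A100 40GB
```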