llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
**What would you like to be added**: Different backends support a wide range of popular models, but not every model is supported by every backend, so we should have a failover policy for...
**What would you like to be added**: Similar to kserve https://kserve.github.io/website/latest/modelserving/v1beta1/custom/custom_model/#parallel-model-inference **Why is this needed**: **Completion requirements**: This enhancement requires the following artifacts: - [x] Design doc - [ ]...
**What would you like to be cleaned**: For example, people may want to deploy the model with different scheduling primitives: colocated or exclusive. **Why is this needed**: To express deployment primitives.
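A minimal sketch of what "exclusive" placement could mean with plain Kubernetes scheduling primitives today, using `podAntiAffinity` to keep one model instance per node (a colocated policy would use `podAffinity` instead). The label key `llmaz.io/model` and the backend image are assumptions for illustration, not part of the llmaz API.

```
apiVersion: v1
kind: Pod
metadata:
  name: llama2-70b-exclusive
  labels:
    llmaz.io/model: llama2-70b        # illustrative label, not a real llmaz field
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: llmaz.io/model       # repel other model-serving Pods on the same node
            operator: Exists
        topologyKey: kubernetes.io/hostname
  containers:
  - name: server
    image: vllm/vllm-openai:latest    # example backend image
    resources:
      limits:
        nvidia.com/gpu: 2
```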
**What would you like to be added**: Right now, we only have one model version for common deployment; however, if we take a higher-level view of the model lifecycle,...
**What would you like to be added**: Kueue is a great project which focuses on job queueing and resource management; it can also support inference services by managing Pods, it's...
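A small sketch of how an inference Pod could be queued through Kueue's plain-Pod integration (which must be enabled in the Kueue configuration): the Pod carries the `kueue.x-k8s.io/queue-name` label and is only admitted once its queue has quota. The queue name, image, and resource amounts are example values.

```
apiVersion: v1
kind: Pod
metadata:
  name: llama2-7b-server
  labels:
    kueue.x-k8s.io/queue-name: inference-queue   # Kueue gates the Pod until quota is available
spec:
  containers:
  - name: server
    image: vllm/vllm-openai:latest               # example backend image
    resources:
      requests:
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1
```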
**What would you like to be added**: This is how it should look:
```
volumeMounts:
- mountPath: /dev/shm
  name: dshm
```
But this memory size is unknown. **Why is...
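For context, a common way to back that mount today is a memory-backed `emptyDir` with an explicit `sizeLimit`. The `16Gi` value below is only a placeholder; choosing this size automatically is exactly the open question above, and the image name is an example.

```
spec:
  containers:
  - name: server
    image: vllm/vllm-openai:latest   # example backend image
    volumeMounts:
    - mountPath: /dev/shm
      name: dshm
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory                 # tmpfs-backed shared memory
      sizeLimit: 16Gi                # placeholder; the right size is model-dependent
```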
This requires the support of the lws community.
The first minor release should include all the items in https://github.com/InftyAI/llmaz/milestone/1
**What would you like to be added**: Models can be loaded with different accelerators; for example, llama2-70b can be loaded with 2 A100 80GB GPUs or 4 A100 40GB GPUs, we...
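A hypothetical sketch of how alternative accelerator flavors for one model might be declared, ordered by preference. The field names (`flavors`, `requests`) are illustrative only and do not describe the actual llmaz schema.

```
# Illustrative sketch, not the real llmaz API.
model: llama2-70b
flavors:
- name: a100-80gb
  requests:
    nvidia.com/gpu: 2        # 2 x A100 80GB
- name: a100-40gb
  requests:
    nvidia.com/gpu: 4        # 4 x A100 40GB
```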