llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
**What would you like to be added**: Right now, we have at most two inferenceModes in backendRuntime: one is Default, the other is SpeculativeDecoding. What if people want to customize their...
**What would you like to be added**: It would be super great to support benchmarking LLM throughput and latency across different backends. **Why is this needed**: Provide proof for...
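A rough sketch of what such a benchmark could look like, assuming the backend exposes an OpenAI-compatible `/v1/completions` endpoint (as vLLM and several other inference servers do); the endpoint URL, model name, and prompt below are placeholders, not part of the issue:

```python
import statistics
import time

import requests  # third-party: pip install requests

# Hypothetical OpenAI-compatible endpoint; adjust for the backend under test.
ENDPOINT = "http://localhost:8080/v1/completions"
PAYLOAD = {"model": "my-model", "prompt": "Hello, world", "max_tokens": 64}

def bench(n_requests: int = 10) -> None:
    latencies = []
    completion_tokens = 0
    for _ in range(n_requests):
        start = time.perf_counter()
        resp = requests.post(ENDPOINT, json=PAYLOAD, timeout=60)
        resp.raise_for_status()
        latencies.append(time.perf_counter() - start)
        # OpenAI-compatible responses report token counts under "usage".
        completion_tokens += resp.json()["usage"]["completion_tokens"]
    print(f"p50 latency: {statistics.median(latencies):.3f} s")
    print(f"throughput:  {completion_tokens / sum(latencies):.1f} tokens/s")

if __name__ == "__main__":
    bench()
```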
**What would you like to be added**: Right now, llmaz is mostly designed for large language models; however, some users may need to serve traditional models as a singleton solution,...
**What would you like to be added**: See https://github.com/spotinst as an example; this means we should support multiple cloud providers. **Why is this needed**: Cost savings for users. **Completion requirements**:...
**What would you like to be added**: Generally,
- if users use object stores, they can use Fluid as a distributed caching system
- if users use OCI images, they can...
**What would you like to be added**: Support filesystems with the URI protocol `pvc://`; this is compatible with distributed cache systems like Fluid in the future. **Why is this...
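To make the `pvc://` idea concrete, here is a small Python sketch of how such a URI might be split into a PersistentVolumeClaim name and a path inside the volume; the `pvc://<claim-name>/<subpath>` layout is an assumption for illustration, not a format the issue specifies:

```python
from urllib.parse import urlparse

def parse_pvc_uri(uri: str) -> tuple[str, str]:
    """Split a hypothetical pvc://<claim-name>/<subpath> URI into the
    PersistentVolumeClaim name and the model path inside the volume."""
    parsed = urlparse(uri)
    if parsed.scheme != "pvc":
        raise ValueError(f"expected pvc:// scheme, got {parsed.scheme!r}")
    # netloc carries the claim name; path carries the location in the volume.
    return parsed.netloc, parsed.path.lstrip("/")

print(parse_pvc_uri("pvc://models-cache/llama-3/8b"))
# -> ('models-cache', 'llama-3/8b')
```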
At first glance, because the Models are published by admins, it may be OK since the data source is under supervision. Or is that a user need?
**What would you like to be added**: ollama provides an [SDK](https://github.com/ollama/ollama-python) for integrations, and we can easily integrate with it; one benefit I can think of is that ollama maintains a...
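For reference, calling ollama through its Python SDK takes only a few lines; this minimal sketch assumes a running local ollama server and uses a placeholder model name (the call shape follows the ollama-python README):

```python
import ollama  # pip install ollama

# Talks to a running ollama server (default: http://localhost:11434).
response = ollama.chat(
    model="llama3",  # placeholder; any locally pulled model works
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```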
**What would you like to be added**: For inference scenarios, prompt management is an important part of the workflow. **Why is this needed**: Ease of use for inference users. **Completion requirements**:...
xref: https://github.com/InftyAI/llmaz/issues/20