Support multi-node distributed inference
Is your feature request related to a problem? Please describe.
To serve larger language models with tens of billions of parameters, users want to deploy multi-node inference on their Kubernetes clusters using KAITO's out-of-box presets.
Describe the solution you'd like
Multi-node inference for HuggingFace models with minimal node-pool configuration steps, starting with support for widely used models of 70B parameters or fewer.
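As a rough illustration of the "limited configuration steps" goal, a multi-node deployment could follow the existing Workspace shape with a node `count` greater than one. This is a hypothetical sketch, not a confirmed API: the `apiVersion`, field layout, instance type, and preset name below are placeholders based on the current single-node Workspace convention.

```yaml
# Hypothetical sketch only: field names assume the existing KAITO
# Workspace CRD shape; instanceType and preset name are placeholders.
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-multinode-example
resource:
  # Desired behavior: count > 1 provisions multiple GPU nodes and the
  # preset shards the model across them.
  count: 2
  instanceType: "Standard_NC96ads_A100_v4"   # placeholder GPU SKU
  labelSelector:
    matchLabels:
      apps: multinode-example
inference:
  preset:
    name: "example-70b-preset"               # placeholder preset name
```

The intent is that users only change `count` and `instanceType`; the preset would handle the distributed-inference wiring across nodes.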
related: #873
Marking this as done. We will track https://github.com/kaito-project/kaito/issues/1145 separately.