Support multi-node distributed inference

Open sdesai345 opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe.

To serve larger language models with billions of parameters, users want to deploy multi-node inference on their Kubernetes clusters using the KAITO out-of-the-box presets.
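As a rough illustration of why a single node can fall short, the sketch below estimates the memory needed for model weights alone and the minimum number of nodes that implies. It assumes 2 bytes per parameter (fp16/bf16) and ignores KV cache, activations, and runtime overhead, so real requirements are higher. The function names and the GPU sizes in the usage lines are hypothetical and are not part of KAITO.

```python
import math


def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in GiB (assumes fp16/bf16 by default)."""
    return num_params * bytes_per_param / 2**30


def min_nodes_for_weights(
    num_params: float,
    gpu_mem_gib: float,
    gpus_per_node: int,
    bytes_per_param: int = 2,
) -> int:
    """Lower bound on node count: weights only, no KV cache or activation memory."""
    per_node_gib = gpu_mem_gib * gpus_per_node
    return math.ceil(weight_memory_gib(num_params, bytes_per_param) / per_node_gib)


if __name__ == "__main__":
    # Hypothetical hardware: one 80 GiB GPU per node.
    print(f"70B weights: {weight_memory_gib(70e9):.1f} GiB")
    print("nodes needed (lower bound):", min_nodes_for_weights(70e9, 80, 1))
```

This is only a lower bound for capacity planning; an actual deployment would also need headroom for the serving runtime and request batching.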

Describe the solution you'd like

Multi-node inference for Hugging Face models with minimal node pool configuration, starting with support for widely used models of 70B parameters or fewer.

sdesai345 avatar Mar 03 '25 18:03 sdesai345

related: #873

MartinForReal avatar Apr 24 '25 07:04 MartinForReal

Marking this as done. We will track https://github.com/kaito-project/kaito/issues/1145 separately.

chewong avatar Jun 19 '25 04:06 chewong