
Parallelise Model Loading

Open vovw opened this issue 1 year ago • 5 comments

Add parallel model preloading to improve inference startup time

Adds model preloading functionality to improve initial inference latency by allowing models to be loaded into memory before they're needed.

Key changes:

  • Added --preload-models CLI arg to specify models for preloading
  • Introduced preload_model method in inference engine interface
  • Implemented preloading in MLX engine using existing shard loading
  • Enhanced preemptive download to also preload models after download
  • Added concurrent model preloading support in StandardNode
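The concurrent preloading described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only the `preload_model` method name comes from the description, while `InferenceEngine` as written here, `FakeEngine`, and `preload_models` are hypothetical stand-ins, and the real signatures in exo may differ.

```python
import asyncio
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    """Hypothetical engine interface; only the preload_model name is from the PR."""

    @abstractmethod
    async def preload_model(self, model_id: str) -> None:
        ...


class FakeEngine(InferenceEngine):
    """Stand-in engine that simulates loading with a short sleep."""

    def __init__(self) -> None:
        self.loaded = set()

    async def preload_model(self, model_id: str) -> None:
        if model_id in self.loaded:  # already resident, nothing to do
            return
        await asyncio.sleep(0.01)  # placeholder for real weight loading
        self.loaded.add(model_id)


async def preload_models(engine: InferenceEngine, model_ids: list) -> dict:
    """Preload all models concurrently and report per-model success."""
    results = await asyncio.gather(
        *(engine.preload_model(m) for m in model_ids),
        return_exceptions=True,  # one failed load should not cancel the others
    )
    return {m: not isinstance(r, BaseException) for m, r in zip(model_ids, results)}


if __name__ == "__main__":
    engine = FakeEngine()
    print(asyncio.run(preload_models(engine, ["llama-3.2-1b", "llama-3.1-8b"])))
```

Collecting exceptions instead of raising them is one possible design choice here: a node can still start serving the models that did load and fall back to lazy loading for the rest.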

The primary motivation is reducing cold-start latency by loading models before the first request arrives, which is useful for deployments that require predictable latency.

Tested with the MLX engine and verified to work with preemptive downloads. Built on the existing shard infrastructure and maintains backward compatibility.

Test using:

exo --preload-models model1,model2
exo --preload-models llama-3.2-1b,llama-3.1-8b 
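
For reference, a comma-separated flag like this could be parsed along the following lines. This is a sketch under assumptions: the helper names `parse_model_list` and `build_parser` are hypothetical, and the parser in this PR may handle the flag differently.

```python
import argparse


def parse_model_list(value: str) -> list:
    """Split a comma-separated string into model names, dropping blanks."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser holding only the flag discussed in this PR.
    parser = argparse.ArgumentParser(prog="exo")
    parser.add_argument(
        "--preload-models",
        type=parse_model_list,
        default=[],
        help="comma-separated list of models to load at startup",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--preload-models", "llama-3.2-1b,llama-3.1-8b"])
    print(args.preload_models)
```

With a parser like this, the list has to reach the program as a single shell word. An unquoted space after the comma makes the shell pass the second model as a separate argument.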

Previous PR: #360

vovw avatar Nov 16 '24 15:11 vovw

@AlexCheema PTAL

Let me know if you need more changes.

vovw avatar Nov 16 '24 23:11 vovw

@AlexCheema PTAL

vovw avatar Nov 19 '24 18:11 vovw

@AlexCheema, should I run the formatter over the whole codebase, or just the files I edited?

vovw avatar Nov 20 '24 19:11 vovw

Please respond to my review @vovw

AlexCheema avatar Nov 28 '24 07:11 AlexCheema

Please respond to my review @vovw

Flooded with college work right now; will address these tomorrow.

vovw avatar Nov 28 '24 14:11 vovw

Thanks so much for your contribution and for taking the time to open this PR.

Since this repository has been fully rewritten and the license has changed, I’m closing all existing open PRs to avoid confusion and to align with the new codebase.

I really appreciate your interest in the project, and you’re very welcome to open a new PR against the updated version if you’d like. We look forward to reviewing it!

Evanev7 avatar Dec 18 '25 14:12 Evanev7