
Automatic loading and unloading of models.

Open abhinav-cashify opened this issue 3 years ago • 5 comments

🚀 The feature

TorchServe would automatically load and unload models based on request traffic. For example, if I have registered 3 models in TorchServe and one of them receives no requests for, say, 1 day, it would automatically be unloaded from memory. Once that model receives a request again, it would be loaded back into memory (like the behavior provided by AWS SageMaker multi-model endpoints).

Motivation, pitch

Currently, we have to use the management API to set the number of workers in order to make inferences on a model. If a model is not going to be used for some time, I have to manually set its number of workers to 0; otherwise it continuously consumes resources even when it is not in use. I would like to set all my models to 0 initial workers, and whenever I send an inference request to one, have it loaded with 1 worker.
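For reference, the manual workaround described above can be scripted against TorchServe's management API (default port 8081) by scaling a model's workers with a PUT request. This is a minimal sketch: `my_model` is a placeholder name, `localhost:8081` assumes a locally running server with the default management port, and the URL construction is separated out so it can be checked without a server:

```python
import urllib.request

MANAGEMENT_API = "http://localhost:8081"  # TorchServe's default management port

def scale_url(model_name: str, min_worker: int) -> str:
    # PUT target for scaling an already-registered model's workers.
    # synchronous=true makes the call block until scaling completes.
    return f"{MANAGEMENT_API}/models/{model_name}?min_worker={min_worker}&synchronous=true"

def scale_workers(model_name: str, min_worker: int) -> None:
    # min_worker=0 parks the model (no workers consuming resources);
    # min_worker=1 brings it back before the next inference request.
    req = urllib.request.Request(scale_url(model_name, min_worker), method="PUT")
    urllib.request.urlopen(req)
```

With something like this, a cron job could call `scale_workers("my_model", 0)` for idle models, which is essentially the manual version of the elastic behavior this issue requests.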

Alternatives

No response

Additional context

No response

abhinav-cashify avatar Jul 26 '22 18:07 abhinav-cashify

@msaroufim

amit-cashify avatar Jul 27 '22 08:07 amit-cashify

@amit-cashify @abhinav-cashify AWS SageMaker multi-model endpoints make calls to TorchServe to unload models based on memory usage. This elastic loading/unloading feature is provided by the SageMaker hosting service. Customers have to pay the inference latency spike caused by the cost of reloading a model.

On the TorchServe roadmap, we plan to address memory usage and elastic parallel processing by providing the following features:

  • model sharing (i.e. one model copy in memory can be shared by multiple workers)
  • #model workers scales elastically with inference traffic volume.

Note: here,

  • #model workers = 0 does not mean #model copies = 0.
  • #model copies = 0 only if an unload model request is received.
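To illustrate the distinction in the note above using TorchServe's documented management endpoints: registering a model with `initial_workers=0` still keeps a model copy loaded (zero workers, but not zero copies), while an explicit unregister (DELETE) request is what actually removes the copy. A minimal sketch, where `my_model` and `my_model.mar` are placeholder names and a locally running server on the default management port 8081 is assumed:

```python
import urllib.request

MANAGEMENT_API = "http://localhost:8081"  # TorchServe's default management port

def register_url(mar_url: str, initial_workers: int = 0) -> str:
    # POST target: registers the model with no workers pre-allocated.
    # A model copy exists after this, even though #workers = 0.
    return f"{MANAGEMENT_API}/models?url={mar_url}&initial_workers={initial_workers}"

def unregister_url(model_name: str) -> str:
    # DELETE target: only this request brings #model copies to 0.
    return f"{MANAGEMENT_API}/models/{model_name}"

def unregister(model_name: str) -> None:
    req = urllib.request.Request(unregister_url(model_name), method="DELETE")
    urllib.request.urlopen(req)
```

So under the roadmap described above, scaling workers to 0 frees compute but not the in-memory copy; freeing the copy remains an explicit unregister operation.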

Please let us know if you have any questions.

lxning avatar Jul 27 '22 16:07 lxning

Any update on this?

amit-cashify avatar Nov 02 '22 05:11 amit-cashify

Any update? I have pretty much the same problem.

otakbeku avatar Mar 27 '23 07:03 otakbeku