
Serve, optimize and scale PyTorch models in production

TorchServe

TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production.

Requires Python >= 3.8

# Example: send an inference request to a model named "bert" on a running TorchServe instance
curl http://127.0.0.1:8080/predictions/bert -T input.txt

πŸš€ Quick start with TorchServe

# Install dependencies
# CUDA is optional; omit --cuda for a CPU-only install
python ./ts_scripts/install_dependencies.py --cuda=cu102

# Latest release
pip install torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly
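
Installing provides three CLIs: torchserve (the server), torch-model-archiver, and torch-workflow-archiver. A minimal sketch of the serve workflow follows; the model and file names (densenet161, densenet161.pt) are placeholders for your own model:

# Package a serialized model into a .mar archive; densenet161.pt stands in for a
# TorchScripted checkpoint (eager-mode models also need --model-file)
torch-model-archiver --model-name densenet161 --version 1.0 \
    --serialized-file densenet161.pt --handler image_classifier

# Start TorchServe and load the archive from a local model store
mkdir -p model_store && mv densenet161.mar model_store/
torchserve --start --ncs --model-store model_store --models densenet161.mar

# Stop the server when done
torchserve --stop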

πŸš€ Quick start with TorchServe (conda)

# Install dependencies
# CUDA is optional; omit --cuda for a CPU-only install
python ./ts_scripts/install_dependencies.py --cuda=cu102

# Latest release
conda install -c pytorch torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver
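
Either install path puts the torchserve CLI on your PATH; as a quick sanity check:

torchserve --version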

Getting started guide

🐳 Quick Start with Docker

# Latest release
docker pull pytorch/torchserve

# Nightly build
docker pull pytorch/torchserve-nightly

Refer to torchserve docker for details.
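
As a sketch, you can then run the image and expose TorchServe's default inference (8080) and management (8081) ports:

# Run the latest image, mapping the default inference and management ports
docker run --rm -it -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest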

⚑ Why TorchServe

  • Model Management API: multi-model management with optimized worker-to-model allocation (see the curl sketch after this list)
  • Inference API: REST and gRPC support for batched inference
  • TorchServe Workflows: deploy complex DAGs with multiple interdependent models
  • Default way to serve PyTorch models in major ML platforms such as Kubeflow, MLflow, SageMaker, KServe, and Vertex AI
  • Export your model for optimized inference: TorchScript out of the box, plus ORT, IPEX, TensorRT, and FasterTransformer
  • Performance Guide: built-in support to optimize, benchmark, and profile PyTorch and TorchServe performance
  • Expressive handlers: an expressive handler architecture that makes it trivial to support inference for your use case, with many handlers provided out of the box
  • Metrics API: out-of-the-box support for system-level metrics with Prometheus exports, custom metrics, and PyTorch profiler support
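
A minimal sketch of the Management, Inference, and Metrics APIs over REST (densenet161 is a placeholder model name; the ports are TorchServe's defaults):

# Management API (port 8081): register a model and scale its workers
curl -X POST "http://localhost:8081/models?url=densenet161.mar&initial_workers=1"
curl -X PUT "http://localhost:8081/models/densenet161?min_worker=2"
curl "http://localhost:8081/models"

# Inference API (port 8080): run a prediction
curl "http://localhost:8080/predictions/densenet161" -T kitten.jpg

# Metrics API (port 8082): Prometheus-format metrics
curl "http://localhost:8082/metrics"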

πŸ€” How does TorchServe work?

  • Model Server for PyTorch Documentation: Full documentation
  • TorchServe internals: How TorchServe was built
  • Contributing guide: How to contribute to TorchServe

πŸ† Highlighted Examples

  • πŸ€— HuggingFace Transformers
  • Model parallel inference
  • MultiModal models with MMF, combining text, audio, and video
  • Dual Neural Machine Translation for a complex workflow DAG

For more examples, see the examples directory in this repository.

πŸ€“ Learn More

https://pytorch.org/serve

πŸ«‚ Contributing

We welcome all contributions!

To learn more about how to contribute, see the contributor guide here.

πŸ“° News

πŸ’– All Contributors

Made with contrib.rocks.

βš–οΈ Disclaimer

This repository is jointly operated and maintained by Amazon, Meta, and a number of individual contributors listed in the CONTRIBUTORS file. For questions directed at Meta, please send an email to [email protected]. For questions directed at Amazon, please send an email to [email protected]. For all other questions, please open an issue in this repository here.

TorchServe acknowledges the Multi Model Server (MMS) project, from which it was derived.