
Feat: multiple endpoints using a list of LitServer

Open bhimrazy opened this issue 1 year ago • 6 comments

Before submitting
  • [x] Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • [x] Did you read the contributor guideline, Pull Request section?
  • [x] Did you make sure to update the docs?
  • [x] Did you write any new necessary tests?

⚠️ How does this PR impact the user? ⚠️

As a user, I want to host multiple endpoints for different purposes, such as serving an embedding API, prediction API, etc., on the same server while maintaining LitServer features.

What does this PR do?

Fixes #271.

  • This PR introduces a feature that allows running multiple LitServer instances in a combined form, as discussed in issue #271.

Usage

# server.py
from litserve.server import LitServer, run_all
from litserve.test_examples import SimpleLitAPI


class SimpleLitAPI1(SimpleLitAPI):
    def setup(self, device):
        self.model = lambda x: x**1


class SimpleLitAPI2(SimpleLitAPI):
    def setup(self, device):
        self.model = lambda x: x**2


class SimpleLitAPI3(SimpleLitAPI):
    def setup(self, device):
        self.model = lambda x: x**3


class SimpleLitAPI4(SimpleLitAPI):
    def setup(self, device):
        self.model = lambda x: x**4


if __name__ == "__main__":
    server1 = LitServer(SimpleLitAPI1(), api_path="/predict-1")
    server2 = LitServer(SimpleLitAPI2(), api_path="/predict-2")
    server3 = LitServer(SimpleLitAPI3(), api_path="/predict-3")
    server4 = LitServer(SimpleLitAPI4(), api_path="/predict-4")
    run_all([server1, server2, server3, server4], port=8000)

# client.py
import requests

for i in range(1, 5):
    resp = requests.post(f"http://127.0.0.1:8000/predict-{i}", json={"input": 4.0})
    assert resp.status_code == 200, f"Expected response to be 200 but got {resp.status_code}"
    assert resp.json() == {"output": 4.0**i}, f"Expected response to be {4.0**i} but got {resp.json()}"
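To illustrate the idea behind run_all, here is a minimal, hypothetical sketch (not the actual LitServe internals): each server contributes routes keyed by its api_path, and a combined dispatcher serves them all from a single port. The make_routes/combine names are illustrative only.

```python
# Hypothetical sketch of the idea behind run_all (not LitServe internals):
# each "server" contributes routes keyed by its api_path, and the merged
# route table is what a single combined app would serve on one port.

def make_routes(api_path, exponent):
    """Stand-in for one LitServer: maps its api_path to a predict handler."""
    def predict(payload):
        return {"output": payload["input"] ** exponent}
    return {api_path: predict}

def combine(route_maps):
    """Merge route tables, rejecting duplicate api_paths."""
    combined = {}
    for routes in route_maps:
        clashes = combined.keys() & routes.keys()
        if clashes:
            raise ValueError(f"duplicate api_path(s): {sorted(clashes)}")
        combined.update(routes)
    return combined

routes = combine(make_routes(f"/predict-{i}", i) for i in range(1, 5))
print(routes["/predict-3"]({"input": 4.0}))  # {'output': 64.0}
```

The duplicate-path check mirrors why each LitServer in the example above needs a distinct api_path.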

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

bhimrazy avatar Sep 12 '24 08:09 bhimrazy

Codecov Report

Attention: Patch coverage is 98.43750% with 1 line in your changes missing coverage. Please review.

Project coverage is 95%. Comparing base (f475369) to head (e5db967).

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #276   +/-   ##
===================================
  Coverage    95%    95%           
===================================
  Files        14     14           
  Lines      1082   1143   +61     
===================================
+ Hits       1025   1085   +60     
- Misses       57     58    +1     

codecov[bot] avatar Sep 12 '24 08:09 codecov[bot]

@bhimrazy wow, love the api. nice job!

side question, shouldn’t the port also be tied to each server? not in the run_all function?

ie: i want a server on 8000 another on 8001? cc @lantiga

williamFalcon avatar Sep 17 '24 01:09 williamFalcon

Thanks, @williamFalcon! 🙌

Regarding the question: yes, if the goal is to host each LitServer on a separate port.

In the current PR (#276), the routes from each LitServer are combined into a single app, which is then hosted, so all the routes are accessible from the same port.

(Maybe we should consider renaming the run_all function to better reflect this behavior.)

However, to host each LitServer on its own port, the default method still works; the servers just have to be run separately:

# process 1
server1 = LitServer(SimpleLitAPI1(), api_path="/predict-1")
server1.run(port=8000)

# process 2
server2 = LitServer(SimpleLitAPI2(), api_path="/predict-2")
server2.run(port=8001)

Hope this clarifies things!
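For reference, the "one server per port" alternative can be sketched in a self-contained way with stdlib stand-ins (the ports and handler names here are hypothetical). With LitServe, each LitServer(...).run(port=...) call blocks, so in practice the two servers would live in separate scripts or processes; threads are used here only to keep the demo in one file.

```python
# Self-contained sketch of "one server per port" using stdlib stand-ins
# for LitServer (hypothetical ports 8123/8124, chosen for the demo).
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def make_handler(exponent):
    """Stand-in for a LitAPI whose model is x ** exponent."""
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers["Content-Length"])
            payload = json.loads(self.rfile.read(length))
            body = json.dumps({"output": payload["input"] ** exponent}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # silence per-request logging in the demo

    return Handler

# One server per port, each with its own "model".
servers = []
for port, exponent in [(8123, 1), (8124, 2)]:
    srv = HTTPServer(("127.0.0.1", port), make_handler(exponent))
    servers.append(srv)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
```

The trade-off versus run_all is isolation (each port fails independently) at the cost of clients needing to know which port serves which model.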

bhimrazy avatar Sep 17 '24 05:09 bhimrazy

still can't use this function

aceliuchanghong avatar Sep 25 '24 14:09 aceliuchanghong

In the meantime, until this gets merged, is there any other way to run multiple endpoints in one main server?

VikramxD avatar Oct 02 '24 14:10 VikramxD

this is paused and not scheduled to be merged until we have a very clear use case.

so, the best way to unblock this is to share code showing what you are trying to do, and explain why you wouldn't just run two separate servers on the same machine.

@VikramxD

williamFalcon avatar Oct 02 '24 15:10 williamFalcon

hi! do you know when this PR will be merged? in my case, multiple machine learning engineers add their embedding or prediction models in one codebase, then this main litserve model is run in K8S

thanks

raulcarlomagno avatar Jan 09 '25 13:01 raulcarlomagno

Hi @raulcarlomagno,
Not yet, unfortunately.

If this feature is crucial for your use case, you can consider using this snippet from this PR. It should help you achieve the desired functionality for now.

bhimrazy avatar Jan 12 '25 11:01 bhimrazy

thumbs up for this piece of art 👍

raulcarlomagno avatar Mar 12 '25 10:03 raulcarlomagno

Closing this PR for now. Issue #271 will be addressed soon in a separate PR. There is an ongoing discussion around it, and since this PR is quite old and the implementation might take a different approach, it's better to start fresh with a new PR.

bhimrazy avatar Apr 24 '25 03:04 bhimrazy