[Feat] Socket scale using ip-hash algo
This PR enables the llm-server to scale horizontally by running multiple replicas. Fixes #625
Approach: for socket connections only, Nginx hashes the client IP address and routes every request from that IP to the same server.
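The routing idea can be sketched in a few lines of Python. This is only an illustration, not Nginx's actual implementation: the server names and the `pick_server` helper are hypothetical, and SHA-256 stands in for whatever hash Nginx uses internally. Keying on the first three octets of an IPv4 address follows what the Nginx documentation describes for `ip_hash`.

```python
import hashlib

SERVERS = ["server1", "server2", "server3"]  # hypothetical replica names


def pick_server(client_ip: str, servers=SERVERS) -> str:
    """Deterministically map a client IPv4 address to one server."""
    # Assumption: key on the first three octets, as Nginx documents for ip_hash.
    key = ".".join(client_ip.split(".")[:3])
    # Assumption: SHA-256 is a stand-in; Nginx uses its own internal hash.
    digest = hashlib.sha256(key.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


# The same client always lands on the same server.
print(pick_server("192.168.1.20") == pick_server("192.168.1.20"))
```

Because the mapping depends only on the client address, no per-connection state has to be shared between replicas.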
Using Nginx
- REST API requests are distributed evenly among the available servers.
- Socket connections stay persistent for the ongoing conversation.
Warning: If you test this locally, you will notice that one server (e.g. server1) handles multiple chatbots. This is because Nginx keys on your IP address and routes all socket requests from it to server1. So even if you chat with 100 chatbots at the same time from the same IP address, all of those requests go to that one server.
Solution: This should balance out in production, because requests arrive from different users' IP addresses and are therefore spread across multiple server instances.
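A small simulation shows the difference between the local and production cases. It is a sketch under assumptions: the `route` helper, the server names, and the sample IP addresses are all made up, and a SHA-256 hash of the full address stands in for Nginx's real ip-hash logic.

```python
import hashlib
from collections import Counter

SERVERS = ["server1", "server2", "server3"]  # hypothetical replica names


def route(client_ip: str) -> str:
    # Stand-in for ip-hash: a stable hash of the client IP picks the server.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]


# Local test: 100 chatbot sockets opened from a single IP address.
local = Counter(route("127.0.0.1") for _ in range(100))

# Production-like traffic: 100 sockets, each from a different (made-up) IP.
production = Counter(route(f"203.0.{i}.10") for i in range(100))

print("local:", dict(local))
print("production:", dict(production))
```

The local counter holds a single server, while the production-like counter is spread over several. Users who share one public IP address (for example behind the same NAT) would still land on the same server.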