modelmesh
Distributed Model Serving Framework
#### Motivation When modelmesh is not able to connect to the KV store to update its instance record or sync with the other instances, it cannot reliably serve...
I'm looking for an option to configure request timeouts for inference requests. Either a global or a per-request timeout would be nice. Currently we are experiencing many "stuck" inference...
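Until a server-side setting exists, a per-request timeout can be approximated on the client with a standard gRPC deadline, since modelmesh is reached over gRPC. Below is a minimal Java sketch assuming a stub generated from KServe's `grpc_predict_v2.proto`; the port (8033), the model name, and the generated package/class names are illustrative assumptions, not modelmesh defaults.

```java
import inference.GRPCInferenceServiceGrpc;          // assumption: package depends on your protoc options
import inference.GrpcPredictV2.ModelInferRequest;
import inference.GrpcPredictV2.ModelInferResponse;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.StatusRuntimeException;
import java.util.concurrent.TimeUnit;

public class DeadlineClient {
    public static void main(String[] args) {
        // Assumption: modelmesh inference gRPC endpoint reachable on localhost:8033.
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("localhost", 8033)
                .usePlaintext()
                .build();
        GRPCInferenceServiceGrpc.GRPCInferenceServiceBlockingStub stub =
                GRPCInferenceServiceGrpc.newBlockingStub(channel);

        // Request building (inputs, tensors) elided; only the deadline mechanism is shown.
        ModelInferRequest request = ModelInferRequest.newBuilder()
                .setModelName("example-model")   // assumption: placeholder model name
                .build();
        try {
            // Per-request client-side timeout: gRPC cancels the call after 10 seconds
            // instead of letting it hang indefinitely.
            ModelInferResponse response = stub
                    .withDeadlineAfter(10, TimeUnit.SECONDS)
                    .modelInfer(request);
            System.out.println("outputs: " + response.getOutputsCount());
        } catch (StatusRuntimeException e) {
            // A stuck request surfaces here as DEADLINE_EXCEEDED.
            System.err.println("inference failed: " + e.getStatus());
        } finally {
            channel.shutdownNow();
        }
    }
}
```

This only bounds how long the client waits; a server-side or global timeout would still need support in modelmesh itself.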
I'm currently trying to set up streaming responses for LLM generation from vLLM, but I receive a `Streaming not yet supported` error from modelmesh. I think this is coming from this...
It would also be useful to have a unit test for this, but the tests included here don't exercise the actual bug. Ideally we'd have a test that actually runs...
It should be possible to use `https` for `RemotePayloadProcessor` to communicate with consumers of MM `Payloads`.
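For context, a minimal sketch of what a TLS-terminating payload consumer endpoint might look like on the receiving side, using only the JDK's built-in `HttpsServer`. The port, keystore file, password, request path, and the idea that payloads arrive as POST bodies are all assumptions for illustration, not part of the modelmesh API.

```java
import com.sun.net.httpserver.HttpsConfigurator;
import com.sun.net.httpserver.HttpsServer;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.security.KeyStore;

public class HttpsPayloadConsumer {
    public static void main(String[] args) throws Exception {
        char[] password = "changeit".toCharArray();                   // assumption: demo keystore password
        KeyStore ks = KeyStore.getInstance("PKCS12");
        try (InputStream in = new FileInputStream("consumer.p12")) {  // assumption: local demo keystore
            ks.load(in, password);
        }
        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(ks, password);
        SSLContext ssl = SSLContext.getInstance("TLS");
        ssl.init(kmf.getKeyManagers(), null, null);

        // HTTPS listener that simply accepts POSTed payload bodies and logs their size.
        HttpsServer server = HttpsServer.create(new InetSocketAddress(8443), 0);
        server.setHttpsConfigurator(new HttpsConfigurator(ssl));
        server.createContext("/consumer", exchange -> {              // assumption: illustrative path
            byte[] body = exchange.getRequestBody().readAllBytes();
            System.out.println("received payload: " + body.length + " bytes");
            byte[] resp = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, resp.length);
            exchange.getResponseBody().write(resp);
            exchange.close();
        });
        server.start();
        System.out.println("HTTPS payload consumer listening on :8443");
    }
}
```

The open question in the issue is only whether `RemotePayloadProcessor` itself can be configured to use an `https://` URL when posting to such an endpoint.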
Currently a model is loaded on only one instance and is lazily loaded on other pods when a request arrives. Can we modify internal modelmesh parameters so that the model is loaded by default on all ServingRuntime...
Could you please describe the steps for running modelmesh locally with a runtime adapter, etcd, and a serving runtime? This is needed for local debugging and for clarifying some of the internal logic.
ServingRuntime: `torchserve` ### Current behavior * send requests with client timeouts (to load our modelmesh) * after some time, the client starts to receive ``` ERROR: Code: Internal Message: org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions: INTERNAL: Model...
I am new to modelmesh but very interested in this project. Could we deploy modelmesh using Docker only, without a Kubernetes cluster? Thanks