modelmesh
Distributed Model Serving Framework
#### Motivation When modelmesh is not able to connect to the KV store to update its instance record or sync with the other instances, it cannot reliably serve...
I'm looking for an option to configure request timeouts for inference requests. Either a global or a per-request timeout would be nice. Currently we are experiencing many "stuck" inference...
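Until a server-side setting exists, a per-request timeout can be approximated on the client with a standard gRPC deadline, since modelmesh is reached over gRPC. Below is a minimal Java sketch assuming a stub generated from KServe's `grpc_predict_v2.proto`; the port (8033), the model name, and the generated package/class names are illustrative assumptions, not modelmesh defaults.

```java
import inference.GRPCInferenceServiceGrpc;          // assumption: package depends on your protoc options
import inference.GrpcPredictV2.ModelInferRequest;
import inference.GrpcPredictV2.ModelInferResponse;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.StatusRuntimeException;
import java.util.concurrent.TimeUnit;

public class DeadlineClient {
    public static void main(String[] args) {
        // Assumption: modelmesh inference gRPC endpoint reachable on localhost:8033.
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("localhost", 8033)
                .usePlaintext()
                .build();
        GRPCInferenceServiceGrpc.GRPCInferenceServiceBlockingStub stub =
                GRPCInferenceServiceGrpc.newBlockingStub(channel);

        // Request building (inputs, tensors) elided; only the deadline mechanism is shown.
        ModelInferRequest request = ModelInferRequest.newBuilder()
                .setModelName("example-model")   // assumption: placeholder model name
                .build();
        try {
            // Per-request client-side timeout: gRPC cancels the call after 10 seconds
            // instead of letting it hang indefinitely.
            ModelInferResponse response = stub
                    .withDeadlineAfter(10, TimeUnit.SECONDS)
                    .modelInfer(request);
            System.out.println("outputs: " + response.getOutputsCount());
        } catch (StatusRuntimeException e) {
            // A stuck request surfaces here as DEADLINE_EXCEEDED.
            System.err.println("inference failed: " + e.getStatus());
        } finally {
            channel.shutdownNow();
        }
    }
}
```

This only bounds how long the client waits; a server-side or global timeout would still need support in modelmesh itself.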
I'm currently trying to set up streaming responses for LLM generation from vLLM, but I receive a `Streaming not yet supported` error from modelmesh. I think this is coming from this...
It would also be useful to have a unit test for this, but the tests included here don't exercise the actual bug. Ideally we'd have a test that actually runs...
It should be possible to use `https` for `RemotePayloadProcessor` to communicate with consumers of MM `Payloads`.
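For context, a minimal sketch of what a TLS-terminating payload consumer endpoint might look like on the receiving side, using only the JDK's built-in `HttpsServer`. The port, keystore file, password, request path, and the idea that payloads arrive as POST bodies are all assumptions for illustration, not part of the modelmesh API.

```java
import com.sun.net.httpserver.HttpsConfigurator;
import com.sun.net.httpserver.HttpsServer;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.security.KeyStore;

public class HttpsPayloadConsumer {
    public static void main(String[] args) throws Exception {
        char[] password = "changeit".toCharArray();                   // assumption: demo keystore password
        KeyStore ks = KeyStore.getInstance("PKCS12");
        try (InputStream in = new FileInputStream("consumer.p12")) {  // assumption: local demo keystore
            ks.load(in, password);
        }
        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(ks, password);
        SSLContext ssl = SSLContext.getInstance("TLS");
        ssl.init(kmf.getKeyManagers(), null, null);

        // HTTPS listener that simply accepts POSTed payload bodies and logs their size.
        HttpsServer server = HttpsServer.create(new InetSocketAddress(8443), 0);
        server.setHttpsConfigurator(new HttpsConfigurator(ssl));
        server.createContext("/consumer", exchange -> {              // assumption: illustrative path
            byte[] body = exchange.getRequestBody().readAllBytes();
            System.out.println("received payload: " + body.length + " bytes");
            byte[] resp = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, resp.length);
            exchange.getResponseBody().write(resp);
            exchange.close();
        });
        server.start();
        System.out.println("HTTPS payload consumer listening on :8443");
    }
}
```

The open question in the issue is only whether `RemotePayloadProcessor` itself can be configured to use an `https://` URL when posting to such an endpoint.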
Currently a model is loaded on only one instance and is lazily loaded on other pods when a request arrives. Can we modify internal modelmesh parameters so that the model is loaded by default on all ServingRuntime...
Could you please describe the steps for running modelmesh locally with a runtime adapter, etcd, and a serving runtime? This is needed for local debugging and for clarifying some of the internal logic.
ServingRuntime: `torchserve` ### Current behavior * send requests with client timeouts (to load our modelmesh) * after some time, the client starts to receive ``` ERROR: Code: Internal Message: org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions: INTERNAL: Model...
I am new to modelmesh but very interested in this project. Could we deploy modelmesh using Docker only, without a Kubernetes cluster? Thanks