Xin Fu comments

Results: 6 comments of Xin Fu

[WIP] Adding support for V2 GRPC

@yuzisun One thing I just realized is that we didn't put the `fastapi` dependency in the `requirements.txt` file. Right now the CI can pass because `ray[serve]` uses `fastapi`; I think we...
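To illustrate the concern, here is a minimal sketch of a check that flags packages that are imported directly but not declared in `requirements.txt`. The helper names (`declared_packages`, `missing_direct_deps`) and the sample file contents are hypothetical, and the sketch assumes the import name matches the distribution name, which is not true for every package.

```python
import re


def declared_packages(requirements_text):
    """Return the set of distribution names listed in a requirements file."""
    names = set()
    for line in requirements_text.splitlines():
        # Drop comments and skip blank lines and pip options such as "-r other.txt".
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # Keep only the name: cut at extras, version specifiers, or markers.
        name = re.split(r"[\[<>=!~; ]", line, maxsplit=1)[0]
        names.add(name.lower().replace("_", "-"))
    return names


def missing_direct_deps(imported, requirements_text):
    """List directly imported packages that are not declared explicitly."""
    declared = declared_packages(requirements_text)
    return sorted(
        pkg for pkg in imported
        if pkg.lower().replace("_", "-") not in declared
    )


# Hypothetical file contents: fastapi is only present transitively via ray[serve].
sample_requirements = "ray[serve]==2.0.0\nuvicorn\n"
missing = missing_direct_deps(["fastapi", "ray", "uvicorn"], sample_requirements)
```

With this sample input, `missing` contains only `fastapi`, which is the situation described above: the install works today, but would break if `ray[serve]` stopped depending on it.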

Deprecate the global `spec.high-availability` field

We use `high-availability` to set `minReplicas` on the `HorizontalPodAutoscaler` for knative-serving (https://github.com/knative/serving/blob/release-1.6/config/core/deployments/activator-hpa.yaml#L25, https://github.com/knative/operator/blob/8738a21c9c4f5b9d7b27fef0cc9230f264e8e2d0/pkg/reconciler/common/ha.go#L68), and I don’t think we can do the same with just `deployments.replicas` (https://github.com/knative/operator/blob/8738a21c9c4f5b9d7b27fef0cc9230f264e8e2d0/pkg/reconciler/common/deployments_override.go#L89).

Explore supporting SSE (Server-Sent Events) streaming for LLM

For reference, the OpenAI API stream response looks like this:

```
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"!"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" How"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" may"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" I"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" assist"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"...
```
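A client consuming a stream shaped like this could reassemble the message roughly as follows. This is only a sketch: the function names are hypothetical, the sample chunks are trimmed to the fields the code reads, and the handling of a `data: [DONE]` end-of-stream line is an assumption that the quoted excerpt does not show.

```python
import json


def iter_sse_data(lines):
    """Yield the payload of each `data:` line; other SSE fields are ignored."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield payload


def collect_content(lines):
    """Concatenate the `delta.content` pieces from chat completion chunks."""
    parts = []
    for payload in iter_sse_data(lines):
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk carries only the role, so content may be absent.
            parts.append(choice.get("delta", {}).get("content", ""))
    return "".join(parts)


# Trimmed, hypothetical chunks following the shape of the response above.
sample_stream = [
    'data: {"choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}',
    'data: {"choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}',
    'data: {"choices":[{"delta":{"content":"!"},"index":0,"finish_reason":null}]}',
    'data: {"choices":[{"delta":{"content":" How"},"index":0,"finish_reason":null}]}',
    "data: [DONE]",
]
text = collect_content(sample_stream)
```

Each chunk carries only a small delta, so the server can flush tokens as they are generated and the client appends them in arrival order.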

Allow metrics to be collected from kpack

Compilation error for 0.8.1 with CUDA 11.2