Xin Fu

Results 6 comments of Xin Fu

@yuzisun one thing I just realized is that we didn't add the `fastapi` dependency to the `requirements.txt` file. Right now CI can still pass because `ray[serve]` uses `fastapi`, I think we...

We use `high-availability` to set `minReplicas` on the `HorizontalPodAutoscaler` for knative-serving (see https://github.com/knative/serving/blob/release-1.6/config/core/deployments/activator-hpa.yaml#L25 and https://github.com/knative/operator/blob/8738a21c9c4f5b9d7b27fef0cc9230f264e8e2d0/pkg/reconciler/common/ha.go#L68), and I don't think we can do the same with just `deployments.replicas` (see https://github.com/knative/operator/blob/8738a21c9c4f5b9d7b27fef0cc9230f264e8e2d0/pkg/reconciler/common/deployments_override.go#L89).

For reference, OpenAI API stream response looks like this:

```
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"!"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" How"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" may"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" I"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" assist"},"index":0,"finish_reason":null}]}
data: {"id":"chatcmpl-71zdOYCb0s0P6rNQOxNLacZML4KoE","object":"chat.completion.chunk","created":1680709606,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"...
```
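To illustrate how a client might consume chunks of this shape, here is a minimal Python sketch that joins the `delta.content` fragments into one string. The helper name `collect_stream_text` and the shortened sample payloads are hypothetical, and the `data: [DONE]` end-of-stream sentinel is an assumption that does not appear in the excerpt above.

```python
import json


def collect_stream_text(lines):
    """Join the `delta.content` fragments found in an iterable of stream lines.

    Hypothetical helper for illustration; not part of any library.
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk carries only the role, so `content` may be absent.
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Shortened sample chunks with the same structure as the response above.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}',
    'data: {"choices":[{"delta":{"content":"Hello"},"index":0,"finish_reason":null}]}',
    'data: {"choices":[{"delta":{"content":"!"},"index":0,"finish_reason":null}]}',
    "data: [DONE]",
]

print(collect_stream_text(sample))
```

The sketch treats each `data:` line as one JSON document, so it would need buffering if a transport splits a chunk across reads.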

Related PR: https://github.com/kserve/kserve/pull/3334

Specifically, we would like to see the following metrics:

- number of images per namespace
- number of builds per namespace
- number of builds per image
- time spent...

I also ran into the same issue compiling `0.8.1` with CUDA `11.7`. It doesn't seem to occur when compiling version `0.8.0`.

```
10 errors detected in the compilation of "csrc/quantization/dequantize.cu".
error:...
```