Varun Gupta

Results 87 comments of Varun Gupta

The request trace is logged for streaming as well. In the streaming scenario, the total token count is reported in the second-to-last stream chunk. When HandleResponseBody gets the chunk with total tokens set, it will...
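As a rough illustration of the point above, here is a minimal Python sketch of how a response handler could spot the chunk that carries token usage before the stream ends. It assumes an OpenAI-style server-sent-events stream in which the usage object arrives in the last data chunk before `data: [DONE]`. The function name `find_usage_chunk` and the sample payloads are hypothetical and are not taken from the AIBrix code.

```python
import json


def find_usage_chunk(sse_lines):
    """Return (index, total_tokens) for the first chunk that carries usage, else None.

    Assumes OpenAI-style SSE lines of the form 'data: {...}' ending with 'data: [DONE]'.
    """
    for i, line in enumerate(sse_lines):
        if not line.startswith("data:"):
            continue  # skip comments, blank keep-alive lines, etc.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker carries no usage
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed chunks in this sketch
        if not isinstance(chunk, dict):
            continue
        usage = chunk.get("usage")
        if usage and usage.get("total_tokens") is not None:
            return i, usage["total_tokens"]
    return None


# Hypothetical three-chunk stream: content, usage, then the end marker.
stream = [
    'data: {"choices": [{"delta": {"content": "Hi"}}], "usage": null}',
    'data: {"choices": [], "usage": {"prompt_tokens": 3, "completion_tokens": 1, "total_tokens": 4}}',
    "data: [DONE]",
]
result = find_usage_chunk(stream)
```

Under these assumptions, the usage is found at index 1, one chunk before the end marker, which is where a trace would need to be recorded if it depends on the token count.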

@zhangjyr Can you check the response and, if no action is required, close the issue?

Right now the request trace is added on EndOfStream, but for streaming it needs to be added on the (n-1)th stream chunk. cc https://github.com/vllm-project/aibrix/issues/790 - we can add support once we have...

- I am assuming you are using the 0.2.1 release, which does not have the /v1/model implementation. It will be part of 0.3.0 (planned for this week).
- For rate limiting,...

To update the envoy-proxy image, please update here. Since you are on the 0.2.1 release, please delete the default part of the installation and re-create it.

For AIBrix, our goal is not to restrict users to a specific component or module, and to be as extensible as possible. We should schedule a meeting to discuss more on this...

> @varungup90 - will this cause issues? We don't want to use another gateway and cause problems on our system when using this!

The current gateway will continue to work as-is,...

To summarize the issue, there are two aspects of prefix-cache routing that can be generalized: 1) matching and 2) load balancing. 1) For matching, there are two implementations: hash and tree...
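To make the split between matching and load balancing concrete, here is a minimal Python sketch of the hash-based variant. It is an illustration under stated assumptions, not the AIBrix implementation: the block size, the chained SHA-256 hashing, the least-loaded tie-break, and all names (`prefix_block_hashes`, `matched_blocks`, `pick_pod`) are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per block; an arbitrary value chosen for illustration


def prefix_block_hashes(tokens, block_size=BLOCK_SIZE):
    """Chain-hash full blocks so each hash identifies the whole prefix up to that block."""
    hashes, prev = [], b""
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256(prev + repr(block).encode()).digest()
        hashes.append(digest)
        prev = digest
    return hashes


def matched_blocks(request_hashes, pod_cache):
    """Matching step: count leading blocks of the request already cached on a pod."""
    count = 0
    for h in request_hashes:
        if h not in pod_cache:
            break
        count += 1
    return count


def pick_pod(tokens, pod_caches, pod_loads):
    """Load-balancing step: among pods with the longest match, pick the least loaded."""
    if not pod_caches:
        raise ValueError("no pods available")
    hashes = prefix_block_hashes(tokens)
    scores = {pod: matched_blocks(hashes, cache) for pod, cache in pod_caches.items()}
    best = max(scores.values())
    candidates = [pod for pod, score in scores.items() if score == best]
    # Break ties by load, then by name so the result is deterministic.
    return min(candidates, key=lambda pod: (pod_loads.get(pod, 0), pod))
```

Keeping the two steps in separate functions means a tree-based matcher could replace `matched_blocks` without touching the load-balancing policy, which is the generalization the comment describes.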