Varun Gupta

87 comments by Varun Gupta

Thank you for the PR! A few comments: 1. In PR https://github.com/vllm-project/aibrix/pull/1145, the openai-golang version was bumped and unmarshal support for the request message was added (previously unmarshal for request body would...

Check this PR: https://github.com/vllm-project/aibrix/pull/1160

> Check this PR: #1160

This PR is merged, which should address your requirement. Can you rebase on master for the other changes, or better, close this PR and start other...

If the user has enabled RPM/TPM validation, then include_usage is required. Making include_usage optional will need a check on whether the user has enabled the RPM/TPM limit check.
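A minimal sketch of that check, in Python for illustration only (it is not the gateway's actual code). `UserLimits` and `validate_stream_options` are hypothetical names, and treating a configured TPM limit as the trigger is an assumption; `stream` and `stream_options.include_usage` are the OpenAI-style request fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserLimits:
    """Hypothetical per-user rate limits; 0 means the limit is not configured."""
    rpm: int = 0
    tpm: int = 0


def validate_stream_options(body: dict, limits: Optional[UserLimits]) -> Optional[str]:
    """Return an error message if the request must be rejected, else None."""
    # Non-streaming responses already carry usage in the response body.
    if not body.get("stream", False):
        return None

    # Assumption: only a configured TPM limit needs per-request token usage.
    tpm_enabled = limits is not None and limits.tpm > 0
    if not tpm_enabled:
        return None  # include_usage stays optional for this user

    stream_options = body.get("stream_options") or {}
    if not stream_options.get("include_usage", False):
        return "include_usage must be true when a TPM limit is configured"
    return None
```

With this shape, a user without limits can omit `stream_options` entirely, while a user with a TPM limit gets an explicit error instead of silently unmetered streaming.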

1. I want to understand what the blocker is if we mandate including stream usage. For clients, if they do not want to consume the usage report, then it is...

I have started a [PR](https://github.com/vllm-project/aibrix/pull/788) to make include_usage an optional param by default. If the user's TPM limit is configured, then include_usage is required. The heterogeneous use case is not supported with...

This error occurs when the Envoy proxy tries to forward a request and no pod is in the ready state. When this error occurs, the Envoy proxy does not receive a response (headers or body),...

Only per-request information should be returned in response headers. The information listed in the issue is captured in the metrics, which are reflected in the dashboard and are queryable...

Request and response headers must be lightweight. You can dump the state in logs on a per-request basis.
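One way to picture this split, as a rough Python sketch: headers carry only a couple of per-request fields, and anything heavier goes to a structured log line keyed by the request ID. The header names, the logger name, and both function names are hypothetical, chosen only for illustration.

```python
import json
import logging

logger = logging.getLogger("gateway")  # hypothetical logger name


def build_response_headers(request_id: str, target_pod: str) -> dict:
    """Keep headers small: only values that describe this one request."""
    # Header names are illustrative assumptions, not an existing convention.
    return {"x-request-id": request_id, "x-target-pod": target_pod}


def log_request_state(request_id: str, state: dict) -> str:
    """Write heavier per-request state to the logs instead of the headers."""
    line = json.dumps({"request_id": request_id, **state}, sort_keys=True)
    logger.info(line)
    return line
```

The request ID in the header is enough to look up the matching log line later, so the rest of the state never has to travel with the response.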