Varun Gupta comments

Results: 87 comments of Varun Gupta

Support content array in completions

Thank you for the PR! A few comments: 1. In PR https://github.com/vllm-project/aibrix/pull/1145, the openai-golang version was bumped and unmarshal support for the request message was added (previously unmarshal for request body would...

Support content array in completions

Check this PR: https://github.com/vllm-project/aibrix/pull/1160

Support content array in completions

> Check this PR: #1160

This PR is merged, which should address your requirement. Can you rebase on master for other changes, or better, close this PR and start other...

[Bug] Remove compulsory `include_usage` when `stream=true` in gateway

If the user has enabled RPM/TPM validation, then we need include_usage. Making include_usage optional will require a check on whether the user has enabled the RPM/TPM limit check.

[Bug] Remove compulsory `include_usage` when `stream=true` in gateway

1. I want to understand where the blocker is if we mandate including stream usage. For the client, if they do not want to consume the usage report then it is...

[Bug] Remove compulsory `include_usage` when `stream=true` in gateway

I have started a [PR](https://github.com/vllm-project/aibrix/pull/788) to make include_usage an optional param by default. If the user's TPM limit is configured, then include_usage is required. The heterogeneous use case is not supported with...
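
The rule described in these comments (include_usage optional by default, required once a TPM limit is configured) can be sketched roughly as below. This is an illustration only, not the AIBrix gateway code: the function name `validate_stream_options` and the `tpm_limit` parameter are hypothetical, and the request body is assumed to follow the OpenAI-style `stream` / `stream_options.include_usage` fields.

```python
from typing import Optional


def validate_stream_options(body: dict, tpm_limit: Optional[int]) -> Optional[str]:
    """Return an error message if the request should be rejected, else None.

    Hypothetical helper: `tpm_limit` stands in for whatever per-user
    TPM configuration the gateway looks up; None means no limit is set.
    """
    # The rule only concerns streaming requests.
    if not body.get("stream", False):
        return None

    # No TPM limit configured: include_usage stays optional.
    if tpm_limit is None:
        return None

    # TPM limit configured: token usage must be reported in the stream.
    stream_options = body.get("stream_options") or {}
    if stream_options.get("include_usage") is True:
        return None

    return "stream_options.include_usage must be true when a TPM limit is configured"
```

With no limit configured, a plain `{"stream": True}` request passes; with a limit, the same request is rejected unless it sets `stream_options.include_usage` to true.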

Is the design of AIBrix decoupled from accelerator(NV/Ascend 910/and others)?

Yes

upstream connect error or disconnect/reset before headers

This error occurs when the Envoy proxy tries to forward a request and no pod is in the ready state. When this error occurs, the Envoy proxy does not receive a response (headers or body),...

Piggybacking more information in response header

Only per-request information should be returned in response headers. The information listed in the issue is captured in the metrics, which are reflected in the dashboard and are queryable...

Piggybacking more information in response header

Request and response headers must be lightweight. You can dump the state in logs on a per-request basis.