Yingge He

Results: 16 issues by Yingge He

Currently, headers are lowercased before they are sent to the server, which is inconvenient. According to the HTTP specification, header names are case-insensitive. Therefore, the forward header pattern (regex) should match...
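As a rough illustration of the idea, the sketch below matches header names against a forward pattern without regard to case while leaving the original casing intact. The function name `select_forward_headers`, the sample headers, and the use of a partial (search-style) match are assumptions made for this example; they are not taken from the Triton code base.

```python
import re


def select_forward_headers(headers, pattern):
    """Return the headers whose names match `pattern`, ignoring case.

    Hypothetical helper: header names keep their original casing, and
    only the comparison is case-insensitive.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return {name: value for name, value in headers.items() if regex.search(name)}


if __name__ == "__main__":
    incoming = {"X-Request-ID": "abc", "content-type": "application/json"}
    print(select_forward_headers(incoming, "x-request-.*"))
```

With this approach the pattern author does not need to know how the client capitalized a header, and the server still sees the name as it was sent.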

Add a `--model-config-name` option for starting Triton server, allowing users to select a custom configuration other than the default config.pbtxt. server: https://github.com/triton-inference-server/server/pull/7185

Add a `--model-config-name` option for starting Triton server, allowing users to select a custom configuration other than the default config.pbtxt.

#### What does the PR do? Copies the executable to the qa/L0_input_validation directory and the models it depends on to qa/L0_input_validation/models. #### Checklist - [x] PR title reflects the change and is of format `:...

PR: test

#### What does the PR do? Add client input size check to make sure input shape byte size matches input data byte size. #### Checklist - [x] PR title reflects...

enhancement
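A minimal sketch of what such a size check could look like, assuming fixed-size element types. The names `DTYPE_SIZES`, `expected_byte_size`, and `check_input_size` are hypothetical, and the datatype table is only an illustrative subset; none of this is the PR's actual implementation.

```python
from math import prod

# Illustrative subset of fixed-size datatypes (bytes per element).
DTYPE_SIZES = {"UINT8": 1, "INT32": 4, "FP32": 4, "INT64": 8}


def expected_byte_size(shape, datatype):
    """Byte size implied by the shape: element count times element size."""
    return prod(shape) * DTYPE_SIZES[datatype]


def check_input_size(name, shape, datatype, data):
    """Raise ValueError if the data length disagrees with the shape."""
    expected = expected_byte_size(shape, datatype)
    actual = len(data)
    if expected != actual:
        raise ValueError(
            f"input '{name}' has shape {list(shape)} and datatype {datatype} "
            f"({expected} bytes expected) but carries {actual} bytes of data"
        )


if __name__ == "__main__":
    check_input_size("INPUT0", [2, 3], "FP32", bytes(24))  # sizes agree
    try:
        check_input_size("INPUT0", [2, 3], "FP32", bytes(20))
    except ValueError as err:
        print(err)
```

Running the check before a request is sent catches a mismatched buffer early instead of leaving the server to read past or short of the declared shape.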

#### What does the PR do? Add client input size check to make sure input shape byte size matches input data byte size. #### Checklist - [x] PR title reflects...

PR: test

List metrics by their `vllm:*` names instead of by variable name.

documentation

#### What does the PR do? The PR adds tests for histogram metrics and the new `nv_inference_first_response_histogram_ms` metric. 1. Verify this metric is only created in decoupled models. 2. Tests `--metrics-config histogram_latencies=`....

PR: docs
PR: test

#### What does the PR do? Add client input size check to make sure input shape byte size matches input data byte size. #### Checklist - [x] PR title reflects...

enhancement

#### What does the PR do? Add a new API "EnqueueIfCapacityAvailable" to ThreadPool for efficient task enqueue. #### Checklist - [x] PR title reflects the change and is of format...

enhancement
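The PR text does not say how "capacity" is defined, so the following is only a sketch under one assumption: a task is accepted when the number of queued or running tasks is below the number of workers, and rejected otherwise without blocking. The Python class and the method name `enqueue_if_capacity_available` are illustrative stand-ins, not the actual ThreadPool API.

```python
import queue
import threading


class ThreadPool:
    """Toy pool whose enqueue call refuses work when all workers are busy."""

    def __init__(self, num_workers):
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._num_workers = num_workers
        self._pending = 0  # tasks that are queued or currently running
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            task = self._tasks.get()
            try:
                task()
            except Exception:
                pass  # a real pool would report the failure
            finally:
                with self._lock:
                    self._pending -= 1
                self._tasks.task_done()

    def enqueue_if_capacity_available(self, task):
        """Enqueue `task` and return True, or return False if the pool is full."""
        with self._lock:
            if self._pending >= self._num_workers:
                return False
            self._pending += 1
        self._tasks.put(task)
        return True

    def wait(self):
        """Block until every accepted task has finished."""
        self._tasks.join()


if __name__ == "__main__":
    pool = ThreadPool(num_workers=1)
    gate = threading.Event()
    print(pool.enqueue_if_capacity_available(gate.wait))     # worker is free
    print(pool.enqueue_if_capacity_available(lambda: None))  # worker is busy
    gate.set()
    pool.wait()
```

Returning a boolean instead of blocking lets the caller fall back to running the task inline or deferring it when the pool is saturated.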