Cameron Quilici issues

Results 19 issues of


                                            Cameron Quilici

[Usage]: Qwen2-VL-2B-Instruct Issue when passing a video URL to /chat/completions

### Your current environment Image: [v0.7.3](https://hub.docker.com/layers/vllm/vllm-openai/v0.7.3/images/sha256-4f4037303e8c7b69439db1077bb849a0823517c0f785b894dc8e96d58ef3a0c2) Run Command: `--model Qwen/Qwen2.5-VL-7B-Instruct --port 8080` GPU: NVIDIA H100 PCIe I am referencing [this](https://github.com/vllm-project/vllm/issues/9842#:~:text=chat_response%20%3D%20client.chat.completions.create(%0A%20%20%20%20model%3D%22llava%22%2C%0A%20%20%20%20messages%3D%5B%7B%0A%20%20%20%20%20%20%20%20%22role%22%3A%20%22user%22%2C%0A%20%20%20%20%20%20%20%20%22content%22%3A%20%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20%7B%22type%22%3A%20%22text%22%2C%20%22text%22%3A%20%22What%E2%80%99s%20in%20this%20video%3F%22%7D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%7B%22type%22%3A%20%22video_url%22%2C%20%22video_url%22%3A%20%7B%22url%22%3A%20video_url%7D%7D%2C%0A%20%20%20%20%20%20%20%20%5D%2C%0A%20%20%20%20%7D%5D%2C%0A)) and previosuly also [this](https://docs.vllm.ai/en/latest/getting_started/examples/openai_chat_completion_client_for_multimodal.html) (althought it appears this is dated)....

usage

[WIP]: Diff only runs

feat: multinode first class integration

# Design Doc: Multi-Node First Class Integration > **Disclaimer:** No changes in this PR affect any performance for either AMD or NVIDIA. Test runs: - runner model sweep: https://github.com/InferenceMAX/InferenceMAX/actions/runs/19829647492 -...

Adding evals after throughput benchmarks

# Add Eval Runs After Throughput Benchmarks ## TL;DR - Adds **optional eval runs** (e.g. GSM8K) that run **right after throughput benchmarks**, reusing the same inference server. - Evals are...

AMD needs to use upstream SGLang images instead of fork

Fix issues in https://github.com/InferenceMAX/InferenceMAX/pull/247 Test Have inference engineer verify performance

p00

AMD

AMD needs to use upstream vLLM images instead of fork

For instance

p00

AMD

Cameron Quilici

[Usage]: Qwen2-VL-2B-Instruct Issue when passing a video URL to /chat/completions

[WIP]: Diff only runs

feat: multinode first class integration

Adding evals after throughput benchmarks

AMD needs to use upstream SGLang images instead of fork

AMD needs to use upstream vLLM images instead of fork

add env var validation in benchmarks/* scripts

multinode label validation

standardize way in which mutlinode result files are created

upstream and standardize Dynamo code to InferenceMAX