Cameron Quilici
### Your current environment

Image: [v0.7.3](https://hub.docker.com/layers/vllm/vllm-openai/v0.7.3/images/sha256-4f4037303e8c7b69439db1077bb849a0823517c0f785b894dc8e96d58ef3a0c2)
Run Command: `--model Qwen/Qwen2.5-VL-7B-Instruct --port 8080`
GPU: NVIDIA H100 PCIe

I am referencing [this](https://github.com/vllm-project/vllm/issues/9842#:~:text=chat_response%20%3D%20client.chat.completions.create(%0A%20%20%20%20model%3D%22llava%22%2C%0A%20%20%20%20messages%3D%5B%7B%0A%20%20%20%20%20%20%20%20%22role%22%3A%20%22user%22%2C%0A%20%20%20%20%20%20%20%20%22content%22%3A%20%5B%0A%20%20%20%20%20%20%20%20%20%20%20%20%7B%22type%22%3A%20%22text%22%2C%20%22text%22%3A%20%22What%E2%80%99s%20in%20this%20video%3F%22%7D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%7B%22type%22%3A%20%22video_url%22%2C%20%22video_url%22%3A%20%7B%22url%22%3A%20video_url%7D%7D%2C%0A%20%20%20%20%20%20%20%20%5D%2C%0A%20%20%20%20%7D%5D%2C%0A)) and previously also [this](https://docs.vllm.ai/en/latest/getting_started/examples/openai_chat_completion_client_for_multimodal.html) (although it appears this is dated)....
# Design Doc: Multi-Node First Class Integration

> **Disclaimer:** No changes in this PR affect any performance for either AMD or NVIDIA.

Test runs:

- runner model sweep: https://github.com/InferenceMAX/InferenceMAX/actions/runs/19829647492
-...
# Add Eval Runs After Throughput Benchmarks

## TL;DR

- Adds **optional eval runs** (e.g. GSM8K) that run **right after throughput benchmarks**, reusing the same inference server.
- Evals are...

Fix issues in https://github.com/InferenceMAX/InferenceMAX/pull/247

Test: Have an inference engineer verify performance.
PR #251 introduces the `benchmarks/benchmark_lib.sh:check_env_vars` function, which checks whether every environment variable in a given list is set. To be comprehensive, we should use this function in all of the...
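
For illustration, a check of this kind can be sketched in bash as below. This is not the implementation from PR #251, which is not shown here. It assumes that an empty value counts as unset and that the function reports the missing names on stderr and returns a non-zero status; both are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the actual benchmarks/benchmark_lib.sh code.
# Assumption: a variable that is unset OR empty is treated as missing.
# Returns 0 if every named variable has a value, 1 otherwise.
check_env_vars() {
    local missing=()
    local name
    for name in "$@"; do
        # ${!name:-} is bash indirect expansion: the value of the
        # variable whose name is stored in $name (empty if unset).
        if [[ -z "${!name:-}" ]]; then
            missing+=("$name")
        fi
    done
    if (( ${#missing[@]} > 0 )); then
        # Report all missing names at once rather than failing on the first.
        echo "Missing required environment variables: ${missing[*]}" >&2
        return 1
    fi
    return 0
}
```

A script could then begin with a guard such as `check_env_vars MODEL PORT || exit 1`, where `MODEL` and `PORT` are hypothetical variable names. Collecting every missing name before failing means one run surfaces the complete list.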
As a follow-up to #251, integrate label validation for multinode (right now it covers only single-node).
How multinode result files are created depends on the framework and, quite frankly, on the system it is running on. This needs to be standardized.
The Dynamo code for launching multinode jobs is convoluted and hard to follow. We should standardize it and bring the upstream code into the local repo.