Olga Andreeva
To avoid unnecessary copying during localization, I added a `fetch_subdir` parameter to `LocalizePath`. If specified, only the provided subdirectory is downloaded; otherwise all subdirectories are downloaded. Does not affect files...
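The `fetch_subdir` behavior described above can be illustrated with a minimal sketch. It is not the actual `LocalizePath` implementation: `localize_path` is a hypothetical helper, a local directory copy stands in for the real download, and the sketch assumes that files sitting directly in the source root are always copied regardless of `fetch_subdir`.

```python
import shutil
from pathlib import Path
from typing import Optional


def localize_path(src: Path, dst: Path, fetch_subdir: Optional[str] = None) -> Path:
    """Copy `src` into `dst`, optionally fetching only one subdirectory.

    Hypothetical stand-in for `LocalizePath`; a local copy replaces the download.
    """
    dst.mkdir(parents=True, exist_ok=True)
    for entry in src.iterdir():
        if entry.is_file():
            # Assumption: top-level files are copied no matter what fetch_subdir is.
            shutil.copy2(entry, dst / entry.name)
        elif fetch_subdir is None or entry.name == fetch_subdir:
            # Copy every subdirectory, or only the one that was requested.
            shutil.copytree(entry, dst / entry.name, dirs_exist_ok=True)
    return dst
```

Calling `localize_path(src, dst, fetch_subdir="a")` under these assumptions copies the top-level files plus the `a` subdirectory and skips every other subdirectory.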
#### What does the PR do?
This PR introduces an improvement to our frontend OpenTelemetry tracing implementation. The main goal of this refactoring is to enhance the reliability and flexibility...
### Your current environment ```text PyTorch version: 2.3.1+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.4 LTS (x86_64) GCC...
TODO:
- [x] Fix non-graceful shutdown
- [ ] Re-implement `build_async_engine_client_from_engine_args` for our use case
- [ ] Implement `ProxyStatLogger(VllmStatLoggerBase)`, which will be attached to a `MQLLMEngine` process and pass metrics updates via...
#### What does the PR do?
This PR adds support for using multiple tokenizers in the OpenAI-compatible frontend, allowing different models to use their own specific tokenizers. This is crucial...
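One way to picture per-model tokenizers is a lookup keyed by model name. The sketch below is only an illustration, not the PR's code: `TokenizerRegistry`, its methods, and the fallback to a default tokenizer for unregistered models are all assumptions, and plain callables stand in for real tokenizer objects.

```python
from typing import Callable, Dict, List

# A tokenizer is modeled as any callable from text to a list of tokens.
Tokenizer = Callable[[str], List[str]]


class TokenizerRegistry:
    """Hypothetical mapping from model name to that model's tokenizer."""

    def __init__(self, default: Tokenizer) -> None:
        self._default = default
        self._by_model: Dict[str, Tokenizer] = {}

    def register(self, model_name: str, tokenizer: Tokenizer) -> None:
        # Associate a model with its own tokenizer.
        self._by_model[model_name] = tokenizer

    def get(self, model_name: str) -> Tokenizer:
        # Assumption: unregistered models fall back to the default tokenizer.
        return self._by_model.get(model_name, self._default)


registry = TokenizerRegistry(default=str.split)
registry.register("char-model", list)  # splits text into single characters
```

With this setup, a request handler would call `registry.get(model_name)` once per request and tokenize the prompt with whatever that returns.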