FastChat
Add `--max-model-len` param and description to `serve.vllm_worker`?
Currently `--max-model-len` can be passed through to vLLM as a kwarg, but could it be added as a default parameter, like `gpu-utilization-limit`? It is often needed when serving models that accept long contexts.
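For illustration, a minimal sketch of what the requested change could look like, assuming the worker builds its CLI with `argparse` before handing the values to vLLM's engine args (the parser shown here is illustrative, not FastChat's actual code):

```python
import argparse

# Sketch: expose --max-model-len as a first-class flag, alongside the
# existing GPU-utilization-style option. Defaults and help text are
# assumptions for illustration only.
parser = argparse.ArgumentParser(description="vllm_worker (sketch)")
parser.add_argument(
    "--max-model-len",
    type=int,
    default=None,  # None would fall back to the model's own context length
    help="Maximum context length the engine should support; "
         "useful for long-context models.",
)
args = parser.parse_args(["--max-model-len", "32768"])
print(args.max_model_len)
```

With a flag declared this way, the parsed value could then be forwarded to the vLLM engine configuration instead of requiring users to pass it as a raw kwarg.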