FastChat
Add `--max-model-len` param and description to `serve.vllm_worker`?
Currently `--max-model-len` can be passed through to vLLM as a kwarg, but could it be added as a default parameter, like `gpu-utilization-limit`? It is often needed when serving models that accept long contexts.
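For illustration, a minimal sketch of what the requested change could look like, assuming the worker builds its CLI with `argparse` before handing the values to vLLM's engine args (the parser shown here is illustrative, not FastChat's actual code):

```python
import argparse

# Sketch: expose --max-model-len as a first-class flag, alongside the
# existing GPU-utilization-style option. Defaults and help text are
# assumptions for illustration only.
parser = argparse.ArgumentParser(description="vllm_worker (sketch)")
parser.add_argument(
    "--max-model-len",
    type=int,
    default=None,  # None would fall back to the model's own context length
    help="Maximum context length the engine should support; "
         "useful for long-context models.",
)
args = parser.parse_args(["--max-model-len", "32768"])
print(args.max_model_len)
```

With a flag declared this way, the parsed value could then be forwarded to the vLLM engine configuration instead of requiring users to pass it as a raw kwarg.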