lmdeploy
[Feature] Can we support parameter n in OpenAI compatible API?
Motivation
In the /v1/completions and /v1/chat/completions endpoints, can we support the parameter n?
That would let us sample multiple outputs for the same input in a single request.
Currently, we can only call the endpoint multiple times, which is not efficient.
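For illustration, here is what a request would look like if `n` were honored, following the standard OpenAI Chat Completions schema. The model name and sampling values below are placeholders, not lmdeploy defaults:

```python
import json

# Hypothetical request body for /v1/chat/completions. The "n" field is the
# standard OpenAI parameter asking the server to return n sampled
# completions for one prompt in a single call -- the behavior this issue
# requests lmdeploy to support.
payload = {
    "model": "internlm2-chat-7b",  # placeholder model name
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "n": 4,              # desired number of completions per request
    "temperature": 0.8,  # sampling must be stochastic for the n outputs to differ
}

body = json.dumps(payload)
# This body would be POSTed to the server, e.g.
# requests.post(f"{base_url}/v1/chat/completions", data=body),
# and the response's "choices" list would contain n entries.
```

Supporting `n` server-side allows the engine to batch the n samples over a single shared prompt prefix, which is the efficiency gain over issuing n separate requests.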
Related resources
No response
Additional context
No response