Support for stream: false via extra-openai-models.yaml
It seems that `register_model()` in `openai_models.py` doesn't currently expect a `stream` key in the .yaml file, so `can_stream` is set to `True` by default.
Many organizations use an internal OpenAI-compatible API proxy or gateway to access OpenAI and to control the keys. For this, `extra-openai-models.yaml` is the easy way to make proxies just work. However, there currently doesn't seem to be a mechanism to pass the equivalent of `--no-stream` via this .yaml: every OpenAI-compatible gateway is assumed to be capable of streaming.
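For illustration, here's roughly what I'd like to be able to write. The `model_id`, `model_name` and `api_base` keys already exist; the `can_stream` key is the proposed addition (hypothetical, not currently supported), and the URL is a made-up placeholder:

```yaml
- model_id: my-custom-gpt-4o
  model_name: gpt-4o
  api_base: https://gateway.example.internal/v1
  # Proposed addition, not currently supported:
  can_stream: false
```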
This makes `--no-stream` an obligatory option at CLI runtime (e.g. `llm -m my-custom-gpt-4o --no-stream "Hello"`) with quick PoC API gateways that, for example, implement key sharing or team-level usage limits for the org.
I've tried writing a streaming API proxy myself, and it turns out streaming is not that trivial to implement. Non-streaming proxies are easy, so I suspect many teams in a hurry start with one of those. There are some streaming-capable proxy projects on GitHub, but I think it would be logical if `stream: false` or `can_stream: false` in `extra-openai-models.yaml` were passed downstream to `execute()`.
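Conceptually the change looks small. Here is a minimal sketch of what I mean; the names (`extra_model_definitions`, `Chat`, `register`) are guesses for illustration, not the actual code in `openai_models.py`:

```python
# Hypothetical sketch, not llm's real internals: honour an optional
# can_stream key from each extra-openai-models.yaml entry.
for extra_model in extra_model_definitions:  # one dict per .yaml entry
    model = Chat(
        model_id=extra_model["model_id"],
        model_name=extra_model["model_name"],
        api_base=extra_model.get("api_base"),
    )
    # Proposed: let the .yaml disable streaming, so the CLI never
    # calls execute() with stream=True for this model.
    model.can_stream = extra_model.get("can_stream", True)
    register(model)
```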
Or is there a better, already-supported way to do this, such as per-model default options saved by the user, a bit like aliases are? I do admit that "how to call a model" and "what its default options are" aren't quite the same level of question to define in the .yaml, design-wise.
While waiting for this configuration option to be implemented, I've been using this wrapper script as a workaround:
```python
#!/usr/bin/env python3
"""A wrapper script for the ``llm`` command line interface.

The script makes sure that streaming is disabled for the custom model,
then runs the ``llm`` command line interface as usual.

You can copy and rename this script, or even alias it as ``llm`` in your
shell. Make sure your custom model is defined in
``extra-openai-models.yaml``. You can look up the correct location for
the file with::

    llm logs path
"""
import llm.cli

MY_MODEL_ID = "my-custom-gpt-4o"


def main():
    """Run the ``llm`` command line interface, enforcing no streaming."""
    models = llm.get_model_aliases()
    models[MY_MODEL_ID].can_stream = False
    llm.cli.get_model_aliases = lambda: models
    llm.cli.cli()


if __name__ == "__main__":
    main()
```
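As far as I can tell this works because cli.py resolves model names through `get_model_aliases()` and only streams when the resolved model's `can_stream` attribute is true. Forcing the attribute to `False` therefore makes the CLI behave as if `--no-stream` were always passed for that one model, while every other model keeps streaming as usual.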