Patrick Devine
Unfortunately OpenAI's API doesn't have a way to do this, so we can't modify the `num_ctx` parameter directly through their API. I did write up a [doc](https://github.com/ollama/ollama/blob/main/docs/openai.md#setting-the-context-size) which explains how...
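In short, the workaround in that doc is to bake a larger context window into a derived model and then call that model through the OpenAI-compatible endpoint. A minimal sketch (the base model `llama3.1`, the name `llama3.1-8k`, and the `8192` value are just examples):

```
# Modelfile
FROM llama3.1
PARAMETER num_ctx 8192
```

Then `ollama create llama3.1-8k -f Modelfile` and point your OpenAI client at `llama3.1-8k`.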
@deep1305 were you able to get it to work? We will enable it by default soon; still trying to get people to try it out and report bugs.
@mayunqing1230 unfortunately those cards are really old now and are probably not going to be very performant. I'm going to go ahead and close the issue.
I'm fairly certain this is a packaging bug in Arch as we've seen a few of those issues recently. Can you verify if installing directly works with the instructions on...
@SteavenGamerYT see the message above. Unfortunately it looks like the Arch Linux package is broken (but we don't package it). You can install from the official binaries.
@A are you hitting this when you've run through the context, or some other case?
@A To use a larger default context you can run Ollama with:

```
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
```

Here's a [link to the FAQ](https://docs.ollama.com/faq#how-can-i-specify-the-context-window-size)
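If you only need a bigger window for one interactive session, the FAQ also shows setting it from inside `ollama run` (sketch; the model name is just an example):

```
ollama run llama3.1
>>> /set parameter num_ctx 8192
```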
Make sure you're on the latest version of Ollama and you `ollama pull glm4` to get the latest version of the model. I just tested it with both Linux and macOS and it's...
`max_tokens` and `num_ctx` are definitely not the same thing. I just saw the `extra_body` API change which would allow you to set `num_ctx` which seems like a better route.
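For what that looks like on the client side: the OpenAI Python SDK merges `extra_body` into the JSON request body, so a non-standard key like `num_ctx` can ride along with an otherwise standard request. A minimal sketch of that merge (the exact key and shape Ollama accepts is an assumption; check the API change for the real details):

```python
# The SDK serializes the standard fields, then merges extra_body into the
# same JSON object before sending the request. Sketch of that merge:
body = {
    "model": "qwen2.5:7b-instruct",  # example model name
    "messages": [{"role": "user", "content": "hello"}],
}
extra_body = {"num_ctx": 8192}  # assumption: top-level key; verify against the change
request_body = {**body, **extra_body}
print(request_body["num_ctx"])  # 8192
```

In real client code that would be `client.chat.completions.create(..., extra_body={"num_ctx": 8192})`.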
@werruww that's a GGUF model, not a safetensors model. That model is already available in Ollama if you run `ollama run qwen2.5:7b-instruct-fp16`