File context provider throws an error on a small file
Before submitting your bug report
- [x] I believe this is a bug. I'll try to join the Continue Discord for questions
- [x] I'm not able to find an open issue that reports the same bug
- [x] I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS:
- Continue version: 1.0.6
- IDE version: 1.96.2
- Model:
- config:
OR link to assistant in Continue hub:
Description
When I use @file in Continue to reference a file, if the code exceeds 12K or 400 lines, it immediately gives me the following error: "'demo.py' is 170.1 KB which exceeds the allowed context length and cannot be processed by the model." This is clearly incorrect, as my model can handle a 128K-token context.
To reproduce
No response
Log output
Seeing this with VS Code on the recent 1.0.6 release as well. Maybe it's due to changes to the context size defaults in v1.0.6?
I was able to get it working again by tweaking my local setup and adding the contextLength option in the config YAML like this:
models:
  - name: Gemma 3 27B
    provider: ollama
    model: gemma3:27b
    defaultCompletionOptions:
      contextLength: 131072
    roles:
      - chat
      - edit
      - apply
Probably related changes: #4602 and #4929 - maybe @Jazzcort and/or @RomneyDa can chime in?
@AndrewTsao Can you just check if contextLength is correctly set to 128k in your config.yaml, like what @GrimmiMeloni did? For code context, 400 lines might be around 4000 tokens, so if your contextLength is set to 128k this warning won't pop up.
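As a rough back-of-the-envelope illustration (the tokens-per-line and characters-per-token ratios below are heuristics I'm assuming, not Continue's actual tokenizer):

// Back-of-the-envelope token estimates (heuristic ratios only, not Continue's tokenizer).
const tokensPerLine = 10;                    // rough average for source code
console.log(400 * tokensPerLine);            // ~4,000 tokens for a 400-line file

const charsPerToken = 4;                     // common rule of thumb
const fileSizeBytes = 170.1 * 1024;          // the demo.py from the report
console.log(Math.round(fileSizeBytes / charsPerToken)); // ~43,500 tokens

Roughly 43K tokens would comfortably fit a 128k context but far exceed a much smaller default budget.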
Just for clarification: for me at least, the unchanged setup (without the defaultCompletionOptions) worked until last week.
Then, after a break over the weekend, today I found Continue giving the aforementioned error regarding context length.
So maybe this is more of an expected breaking change, in the sense that from 1.0.6 onward the config.yaml must include these settings?
@GrimmiMeloni Forbidding file mentions for files whose token count exceeds the context length is a new feature in 1.0.6. You can see the announcement in the 1.0.6 release. If you don't set contextLength in config.yaml, Continue will first ask your backend provider for this value; if the context length is not set there either, it falls back to a default of 8192.
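In other words, the resolution order looks roughly like this (an illustrative TypeScript sketch with made-up helper names, not Continue's actual source):

// Illustrative sketch of the contextLength resolution order described above
// (hypothetical helper, not Continue's actual code).
const DEFAULT_CONTEXT_LENGTH = 8192;

function resolveContextLength(
  configuredContextLength?: number, // from config.yaml (defaultCompletionOptions.contextLength)
  providerContextLength?: number,   // value reported by the backend provider, if any
): number {
  return (
    configuredContextLength ??      // 1) explicit config.yaml value wins
    providerContextLength ??        // 2) otherwise ask the backend provider
    DEFAULT_CONTEXT_LENGTH          // 3) otherwise fall back to 8192
  );
}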
Understood. That would mean that in my case Ollama is not providing the context length via its API, although it knows the model's metadata.
$ ollama show gemma3:27b
  Model
    architecture        gemma3
    parameters          27.4B
    context length      131072
    embedding length    5376
    quantization        Q4_K_M
Is there any way to drill deeper, e.g. see the request that Continue makes to Ollama to introspect the model?
@GrimmiMeloni You can check out the Rich LLM logging implemented by @owtaylor, which is also a new feature in the 1.0.6 release. It lets you see the LLM options sent from Continue to Ollama, which might be what you're interested in. To use this feature you need to enable it in the Continue settings.
I tried that, but I don't see Continue sending any context size information when I remove the entry from the configuration.
Based on the output from ollama serve, it seems that ctx-size is getting defaulted to 8k instead of the supported size according to the model's metadata.
time=2025-04-23T23:22:28.926+02:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --ollama-engine --model /Users/mhess/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 --ctx-size 8192 --batch-size 512 --n-gpu-layers 63 --threads 8 --parallel 1 --port 59864"
This is my config.yaml:
name: Local Assistant
version: 1.0.0
schema: v1
models:
  - model: Qwen2.5-Coder-32B-Instruct
    name: Qwen2.5-Coder-32B-Instruct
    provider: openai
    apiKey: sxxxx
    apiBase: http://hpc-xxxx:2002/v1
    # "You are a professional software development assistant; please answer questions in Chinese as clearly and concisely as possible."
    systemMessage: 你是一个专业的软件开发助手,请用中文尽量清洁明了地回答问题。
    defaultCompletionOptions:
      temperature: 0.6
    roles:
      - chat
      - edit
      - apply
      - autocomplete
Then it throws the error message box.
@AndrewTsao Can you just check if contextLength is correctly set to 128k in your config.yaml, like what @GrimmiMeloni did? For code context, 400 lines might be around 4000 tokens, so if your contextLength is set to 128k this warning won't pop up.
As per your suggestion, I adjusted the contextLength parameter to 1000000, and the error dialog no longer appears.
@GrimmiMeloni If the contextLength is correctly set, you will see it appear in the options like this.
As per your suggestion, I adjusted the contextLength parameter to 1000000, and the error dialog no longer appears.
@AndrewTsao I'm glad that you solved the issue. By the way, I don't think you need to set contextLength that large to make this work.
@GrimmiMeloni If the contextLength is correctly set, you will see it appear in the options like this.
Yes, I can confirm this.
However, you stated above:
If you don't set contextLength in config.yaml, Continue will first ask your backend provider for this value; if the context length is not set there either, it falls back to a default of 8192.
To my understanding, the expected behavior regarding context length is:
1. Take the explicitly configured contextLength, otherwise
2. Introspect the model metadata via the LLM server to pick the context, otherwise
3. Fall back to the 8k default
What I am seeing is both 1) and 3) above happening, but I don't see 2) (for the Ollama provider).
@GrimmiMeloni You can check the constructor in core/llm/llms/Ollama.ts, where it fires an api/show request to get num_ctx from Ollama for the specific model you use. If num_ctx exists and we don't set contextLength in config.yaml, contextLength will get updated to num_ctx. This value is set under the hood and can't be checked from the Continue console, but it affects the behavior when Continue truncates the message and also the file-mention forbidding feature.
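Heavily simplified, the idea is something like the sketch below (the response fields and parsing are my assumptions about Ollama's api/show output; the real logic lives in core/llm/llms/Ollama.ts):

// Simplified sketch of fetching num_ctx from Ollama's api/show endpoint.
// Response shape and parsing are assumptions; see core/llm/llms/Ollama.ts for the real code.
async function fetchOllamaNumCtx(
  apiBase: string,
  model: string,
): Promise<number | undefined> {
  const res = await fetch(`${apiBase}/api/show`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  });
  const data = await res.json();
  // "parameters" only lists num_ctx if it was explicitly set on the model
  // (e.g. via `/set parameter num_ctx ...` and `/save`).
  const match = (data.parameters ?? "").match(/num_ctx\s+(\d+)/);
  return match ? parseInt(match[1], 10) : undefined;
}

// If this returns a value and contextLength is not set in config.yaml,
// contextLength is updated to num_ctx; otherwise the 8192 default applies.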
You can use this command to check if num_ctx exists in your model. However, if you just use the base model, num_ctx should not exist in the response.
curl http://localhost:11434/api/show -d '{
  "model": "custom-granite:latest"
}' | jq | grep "\"parameters\": \"num_ctx"
Is it because my model is running with vLLM instead of ollama that this issue is being triggered?
@AndrewTsao I don't think so. If you set the contextLength to 20,000, does the warning still show up?
@Jazzcort thanks for the curl line to emulate introspecting. We are getting closer.
For my local gemma3:27b there is no num_ctx in the reply.
However, what I do get back is "gemma3.context_length": 131072.
Is this potentially an incompatibility, with Ollama presenting the model data differently (num_ctx vs. <model>.context_length)?
Hi @GrimmiMeloni - there are two separate things here:
- The maximum context length that the model can handle; this can be found in the model's documentation, and might also be present in some custom model parameter.
- The Ollama parameter for the maximum context length it will use when handling a request; this is num_ctx, which determines how much space Ollama will allocate for the "KV cache".
Especially now that models have become able to handle longer and longer contexts, the model maximum wouldn't make a good default value for num_ctx: it might require more VRAM than is available, and the "prefill" process for such a long context would be very slow.
2. Introspect the model metadata via the LLM server to pick the context
This refers to the num_ctx parameter. Typically it will not be set for a model that you download off the internet, and defaults will be used (2048 for ollama run, 8192 for Continue), but you can set it, e.g.:
ollama run llama3.1:8b
/set parameter num_ctx 131072
/save llama3.1:8b-128k
@owtaylor I appreciate you taking the time to explain the difference. Today I learned. 👍
With that clear, I would like to close the loop, coming back to a previous statement from @Jazzcort:
Forbidding file mentions for files whose token count exceeds the context length is a new feature in 1.0.6. You can see the announcement in the 1.0.6 release.
Can you point me to said announcement? I would like to understand where I can check for future breaking changes to get myself unblocked quicker. (I checked the GitHub changelog on the 1.0.6 release and also the blog on continue.dev, but could not find any mention of this change.)
If you don't set contextLength in config.yaml, Continue will first ask your backend provider for this value; if the context length is not set there either, it falls back to a default of 8192.
As I initially posted above, up to 1.0.5 (i.e. without the 8k limit) things "just worked" out of the box for me. Was there a larger default previously? If not, was the "it just worked" maybe an illusion, in the sense that while there was no explicit error message the 8k limit was always in place? And if so, just for my understanding, what actually happened in versions prior to 1.0.6 if the context "overran"?
Can you point me to said announcement? I would like to understand where I can check for future breaking changes to get myself unblocked quicker. (I checked the GitHub changelog on the 1.0.6 release and also the blog on continue.dev, but could not find any mention of this change.)
Is this the one you checked? The feature that forbids large file mentions is mentioned here: https://github.com/continuedev/continue/releases/tag/v1.0.6-vscode
As I initially posted above, up to 1.0.5 (i.e. without the 8k limit) things "just worked" out of the box for me. Was there a larger default previously? If not, was the "it just worked" maybe an illusion, in the sense that while there was no explicit error message the 8k limit was always in place? And if so, just for my understanding, what actually happened in versions prior to 1.0.6 if the context "overran"?
Before this feature, a file that exceeded the context length would be silently truncated from the top.
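Conceptually, that pre-1.0.6 behavior amounted to something like this (a simplified sketch of top-truncation using a crude token estimate, not the actual Continue implementation):

// Illustrative only: drop lines from the top until the remainder fits the token budget.
function truncateFromTop(fileContents: string, maxTokens: number): string {
  const estimateTokens = (s: string) => Math.ceil(s.length / 4); // crude chars/4 estimate
  const lines = fileContents.split("\n");
  while (lines.length > 1 && estimateTokens(lines.join("\n")) > maxTokens) {
    lines.shift(); // the topmost lines are silently lost
  }
  return lines.join("\n");
}

Since 1.0.6, the same condition produces the warning dialog instead of a silent truncation.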
Is this the one you checked? The feature that forbids large file mentions is mentioned here: https://github.com/continuedev/continue/releases/tag/v1.0.6-vscode
I did, but did not make the connection. I guess I was looking for something explicit like [BC] or [breaking] as an indicator. Thanks for pointing me in the right direction.
Before this feature, a file that exceeded the context length would be silently truncated from the top.
OK, I guess that explains it. I do not want to speak for you, @AndrewTsao, but for me it looks like this could be closed as "works as expected". WDYT?
@AndrewTsao I don't think so. If you set the contextLength to 20,000, does the warning still show up?
If I set this parameter, it won't report an error.
If I comment out this parameter: