File context provider throws an error on a small file
Before submitting your bug report
- [x] I believe this is a bug. I'll try to join the Continue Discord for questions
- [x] I'm not able to find an open issue that reports the same bug
- [x] I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS:
- Continue version: 1.0.6
- IDE version: 1.96.2
- Model:
- config:
OR link to assistant in Continue hub:
Description
When I use @file in Continue to reference a file, if the code exceeds 12K or 400 lines, it immediately gives me the following error: "'demo.py' is 170.1 KB which exceeds the allowed context length and cannot be processed by the model." This is clearly incorrect, as my model can handle a 128K-token context.
To reproduce
No response
Log output
Seeing this with VS Code on the recent 1.0.6 release as well. Maybe it's due to changes to the context size defaults in v1.0.6?
I was able to get it working again by tweaking my local setup and adding the contextLength option in the config YAML like this:
models:
  - name: Gemma 3 27B
    provider: ollama
    model: gemma3:27b
    defaultCompletionOptions:
      contextLength: 131072
    roles:
      - chat
      - edit
      - apply
Probably related changes: #4602 and #4929 - maybe @Jazzcort and/or @RomneyDa can chime in?
@AndrewTsao Can you just check if contextLength is correctly set to 128k in your config.yaml, like what @GrimmiMeloni did? For code context, 400 lines might be around 4000 tokens, so if your contextLength is set to 128k this warning won't pop up.
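As a rough back-of-the-envelope illustration (the tokens-per-line and characters-per-token ratios below are heuristics I'm assuming, not Continue's actual tokenizer):

// Back-of-the-envelope token estimates (heuristic ratios only, not Continue's tokenizer).
const tokensPerLine = 10;                    // rough average for source code
console.log(400 * tokensPerLine);            // ~4,000 tokens for a 400-line file

const charsPerToken = 4;                     // common rule of thumb
const fileSizeBytes = 170.1 * 1024;          // the demo.py from the report
console.log(Math.round(fileSizeBytes / charsPerToken)); // ~43,500 tokens

Roughly 43K tokens would comfortably fit a 128k context but far exceed a much smaller default budget.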
Just for clarification: for me at least, the unchanged setup (without the defaultCompletionOptions) worked until last week.
Then, after a break over the weekend, today I found Continue giving the aforementioned error regarding context length.
So maybe this is more of an expected breaking change, in the sense that from 1.0.6 onward the config.yaml must include these settings?
@GrimmiMeloni Forbidding file mentions for files whose token count exceeds the context length is a new feature in 1.0.6. You can see the announcement in the 1.0.6 release. If you don't set contextLength in config.yaml, Continue will first ask your backend provider for this value; if the context length is not set there either, it falls back to a default of 8192.
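In other words, the resolution order looks roughly like this (an illustrative TypeScript sketch with made-up helper names, not Continue's actual source):

// Illustrative sketch of the contextLength resolution order described above
// (hypothetical helper, not Continue's actual code).
const DEFAULT_CONTEXT_LENGTH = 8192;

function resolveContextLength(
  configuredContextLength?: number, // from config.yaml (defaultCompletionOptions.contextLength)
  providerContextLength?: number,   // value reported by the backend provider, if any
): number {
  return (
    configuredContextLength ??      // 1) explicit config.yaml value wins
    providerContextLength ??        // 2) otherwise ask the backend provider
    DEFAULT_CONTEXT_LENGTH          // 3) otherwise fall back to 8192
  );
}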
Understood. That would mean that in my case Ollama is not providing the context length via its API, although it knows the model's metadata.
$ ollama show gemma3:27b
  Model
    architecture        gemma3
    parameters          27.4B
    context length      131072
    embedding length    5376
    quantization        Q4_K_M
Is there any way to drill deeper, e.g. see the request that Continue makes to Ollama to introspect the model?
@GrimmiMeloni You can check out the Rich LLM logging implemented by @owtaylor, which is also a new feature in the 1.0.6 release. It lets you see the LLM options sent from Continue to Ollama, which might be what you're interested in. To use this feature you need to enable it in the Continue settings.
I tried that, but I don't see Continue sending any context size information when I remove the entry from the configuration.
Based on the output from ollama serve, it seems that ctx-size is getting defaulted to 8k instead of the supported size according to the model's metadata.
time=2025-04-23T23:22:28.926+02:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --ollama-engine --model /Users/mhess/.ollama/models/blobs/sha256-e796792eba26c4d3b04b0ac5adb01a453dd9ec2dfd83b6c59cbf6fe5f30b0f68 --ctx-size 8192 --batch-size 512 --n-gpu-layers 63 --threads 8 --parallel 1 --port 59864"
This is my config.yaml:
name: Local Assistant
version: 1.0.0
schema: v1
models:
  - model: Qwen2.5-Coder-32B-Instruct
    name: Qwen2.5-Coder-32B-Instruct
    provider: openai
    apiKey: sxxxx
    apiBase: http://hpc-xxxx:2002/v1
    # "You are a professional software development assistant; please answer questions in Chinese as clearly and concisely as possible."
    systemMessage: 你是一个专业的软件开发助手,请用中文尽量清洁明了地回答问题。
    defaultCompletionOptions:
      temperature: 0.6
    roles:
      - chat
      - edit
      - apply
      - autocomplete
Then it throws the error message box.
@AndrewTsao Can you just check if contextLength is correctly set to 128k in your config.yaml, like what @GrimmiMeloni did? For code context, 400 lines might be around 4000 tokens, so if your contextLength is set to 128k this warning won't pop up.
As per your suggestion, I adjusted the contextLength parameter to 1000000, and the error dialog no longer appears.
@GrimmiMeloni If the contextLength is correctly set, you will see it appear in the options like this.
As per your suggestion, I adjusted the contextLength parameter to 1000000, and the error dialog no longer appears.
@AndrewTsao I'm glad that you solved the issue. By the way, I don't think you need to set contextLength that large to make this work.
@GrimmiMeloni If the contextLength is correctly set, you will see it appear in the options like this.
Yes, I can confirm this.
However, you stated above:
If you don't set contextLength in config.yaml, Continue will first ask your backend provider for this value; if the context length is not set there either, it falls back to a default of 8192.
To my understanding, the expected behavior regarding context length is:
1. Take the explicitly configured contextLength, otherwise
2. Introspect the model metadata via the LLM server to pick the context, otherwise
3. Fall back to the 8k default
What I am seeing is both 1) and 3) above happening, but I don't see 2) (for the Ollama provider).
@GrimmiMeloni You can check the constructor in core/llm/llms/Ollama.ts, where it fires an api/show request to get num_ctx from Ollama for the specific model you use. If num_ctx exists and we don't set contextLength in config.yaml, contextLength will get updated to num_ctx. This value is set under the hood and can't be checked from the Continue console, but it affects the behavior when Continue truncates the message and also the file-mention forbidding feature.
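Heavily simplified, the idea is something like the sketch below (the response fields and parsing are my assumptions about Ollama's api/show output; the real logic lives in core/llm/llms/Ollama.ts):

// Simplified sketch of fetching num_ctx from Ollama's api/show endpoint.
// Response shape and parsing are assumptions; see core/llm/llms/Ollama.ts for the real code.
async function fetchOllamaNumCtx(
  apiBase: string,
  model: string,
): Promise<number | undefined> {
  const res = await fetch(`${apiBase}/api/show`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  });
  const data = await res.json();
  // "parameters" only lists num_ctx if it was explicitly set on the model
  // (e.g. via `/set parameter num_ctx ...` and `/save`).
  const match = (data.parameters ?? "").match(/num_ctx\s+(\d+)/);
  return match ? parseInt(match[1], 10) : undefined;
}

// If this returns a value and contextLength is not set in config.yaml,
// contextLength is updated to num_ctx; otherwise the 8192 default applies.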
You can use this command to check if num_ctx exists in your model. However, if you just use the base model, num_ctx should not exist in the response.
curl http://localhost:11434/api/show -d '{
  "model": "custom-granite:latest"
}' | jq | grep "\"parameters\": \"num_ctx"
Is it because my model is running with vLLM instead of ollama that this issue is being triggered?
@AndrewTsao I don't think so. If you set the contextLength to 20,000, does the warning still show up?
@Jazzcort thanks for the curl line to emulate introspecting. We are getting closer.
For my local gemma3:27b there is no num_ctx in the reply.
However, what I do get back is "gemma3.context_length": 131072.
Is this potentially an incompatibility, with Ollama presenting the model data differently (num_ctx vs. <model>.context_length)?
Hi @GrimmiMeloni - there are two separate things here:
- The maximum context length that the model can handle; this can be found in the model's documentation, and might also be present in some custom model parameter.
- The Ollama parameter for the maximum context length it will use when handling a request; this is num_ctx, which determines how much space Ollama will allocate for the "KV cache".
Especially now that models have become able to handle longer and longer contexts, the model maximum wouldn't make a good default value for num_ctx: it might require more VRAM than is available, and the "prefill" process for such a long context would be very slow.
2. Introspect the model metadata via the LLM server to pick the context
This refers to the num_ctx parameter. Typically it will not be set for a model that you download off the internet, and defaults will be used (2048 for ollama run, 8192 for Continue), but you can set it, e.g.:
ollama run llama3.1:8b
/set parameter num_ctx 131072
/save llama3.1:8b-128k
@owtaylor I appreciate you taking the time to explain the difference. Today I learned. 👍
With that clear, I would like to close the loop, coming back to a previous statement from @Jazzcort:
Forbidding file mentions for files whose token count exceeds the context length is a new feature in 1.0.6. You can see the announcement in the 1.0.6 release.
Can you point me to said announcement? I would like to understand where I can check for future breaking changes to get myself unblocked quicker. (I checked the GitHub changelog on the 1.0.6 release and also the blog on continue.dev, but could not find any mention of this change.)
If you don't set contextLength in config.yaml, Continue will first ask your backend provider for this value; if the context length is not set there either, it falls back to a default of 8192.
As I initially posted above, up to 1.0.5 (i.e. without the 8k limit) things "just worked" out of the box for me. Was there a larger default previously? If not, was the "it just worked" maybe an illusion, in the sense that while there was no explicit error message the 8k limit was always in place? And if so, just for my understanding, what actually happened in versions prior to 1.0.6 if the context "overran"?
Can you point me to said announcement? I would like to understand where I can check for future breaking changes to get myself unblocked quicker. (I checked the GitHub changelog on the 1.0.6 release and also the blog on continue.dev, but could not find any mention of this change.)
Is this the one you checked? The feature that forbids large file mentions is mentioned here: https://github.com/continuedev/continue/releases/tag/v1.0.6-vscode
As I initially posted above, up to 1.0.5 (i.e. without the 8k limit) things "just worked" out of the box for me. Was there a larger default previously? If not, was the "it just worked" maybe an illusion, in the sense that while there was no explicit error message the 8k limit was always in place? And if so, just for my understanding, what actually happened in versions prior to 1.0.6 if the context "overran"?
Before this feature, a file that exceeded the context length would be silently truncated from the top.
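Conceptually, that pre-1.0.6 behavior amounted to something like this (a simplified sketch of top-truncation using a crude token estimate, not the actual Continue implementation):

// Illustrative only: drop lines from the top until the remainder fits the token budget.
function truncateFromTop(fileContents: string, maxTokens: number): string {
  const estimateTokens = (s: string) => Math.ceil(s.length / 4); // crude chars/4 estimate
  const lines = fileContents.split("\n");
  while (lines.length > 1 && estimateTokens(lines.join("\n")) > maxTokens) {
    lines.shift(); // the topmost lines are silently lost
  }
  return lines.join("\n");
}

Since 1.0.6, the same condition produces the warning dialog instead of a silent truncation.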
Is this the one you checked? The feature that forbids large file mentions is mentioned here: https://github.com/continuedev/continue/releases/tag/v1.0.6-vscode
I did, but did not make the connection. I guess I was looking for something explicit like [BC] or [breaking] as an indicator. Thanks for pointing me in the right direction.
Before this feature, a file that exceeded the context length would be silently truncated from the top.
OK, I guess that explains it. I do not want to speak for you, @AndrewTsao, but for me it looks like this could be closed as "works as expected". WDYT?
@AndrewTsao I don't think so. If you set the contextLength to 20,000, does the warning still show up?
If I set this parameter, it won't report an error.
If I comment out this parameter: