Show warning message when last user input gets pruned
Description
If the user's last input is pruned due to context overflow, a warning message will be displayed in the chat section, alerting them that some details may have been lost. As a result, the response they receive might be incomplete or inaccurate due to the truncated input. https://github.com/Granite-Code/granite-code/issues/22
Screenshots
Testing instructions
Set the model’s context length to a small value (e.g., 512) and ask a question that exceeds contextLength - maxTokens tokenwise. A warning message will appear at the bottom of the chat section, indicating that some input may have been truncated. Deleting previous messages will remove the warning.
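As a rough illustration of the overflow condition described in the testing instructions (hypothetical names and a toy word-based token counter; the real logic lives in Continue's core), the check could look like:

```typescript
// Toy stand-in for the real tokenizer: one token per word.
// Only for illustration; real token counts depend on the model's tokenizer.
function countTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Returns true when the prompt exceeds the space left for input,
// i.e. contextLength - maxTokens (maxTokens is reserved for the reply).
function wouldTruncateInput(
  prompt: string,
  contextLength: number,
  maxTokens: number
): boolean {
  return countTokens(prompt) > contextLength - maxTokens;
}
```

With a context length of 512 and maxTokens of 128, any prompt over 384 (toy) tokens would trip the warning.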
@RomneyDa I'll try to find another way to send the warning message back to the webview.
@Jazzcort I'd be interested in having this sort of "stream warning" idea as well, so feel free to bounce approaches/ideas here before spending too much time on them! I think there could be several different approaches where the warnings aren't persisted to chat history; for example, passing them with the stream but under a "warning:" field that is captured in the redux streamUpdate and temporarily added to the UI, or something similar. Let me know your thoughts!
I've already implemented this second approach in the following branch: Jazzcort/warn-when-truncate-last-msg-v2.
Instead of sending the warning through the stream, I used the messenger system to deliver the warning message. With this approach, I make the pruning behavior occur before calling streamChat so I can reach the messenger reference. The advantage of this approach is that users receive the warning message before the streamed response rather than after it has finished.
Regarding the "warning:" field, are you suggesting adding it to AssistantChatMessage? I think that could work as well! I'm open to either approach—whichever aligns better with the project's design. We can also discuss the UI implementation afterward.
Another idea would be to extend @Jazzcort's last approach and have two separate calls from webview => core.
- llm/pruneChat => {prunedMessages, warning?: string}
- llm/streamChat
or something like that. (llm.streamChat could still prune itself when it's not being invoked from the chatview.)
This would avoid having to worry about the interaction between warnings and streaming. It could also be potentially useful for some other things:
- prefix-caching sensitive pruning (project writeup: https://github.com/Granite-Code/granite-code/issues/96)
- having some way to reveal the pruned messages in the UI. I'm not at all sure that this is a good idea - the user can look at logs - but I do feel that it can be deceptive to have a rich chat history with no indication that only a tiny fraction of it might actually be sent to the model.
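The two-call split proposed above could be sketched roughly as follows. This is a hypothetical shape, assuming a toy word-based token counter; the names (`ChatMessage`, `pruneChat`, `PruneChatResult`) are illustrative, not the actual Continue API:

```typescript
// Hypothetical message shapes for the proposed webview => core split.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PruneChatResult {
  prunedMessages: ChatMessage[];
  warning?: string;
}

// Toy token counter, for illustration only (one token per word).
const countTokens = (m: ChatMessage) =>
  m.content.split(/\s+/).filter(Boolean).length;

// "llm/pruneChat": drop the oldest non-system messages until the history
// fits the budget, surfacing a warning describing what was removed.
function pruneChat(
  messages: ChatMessage[],
  tokenBudget: number
): PruneChatResult {
  const pruned = [...messages];
  let dropped = 0;
  while (
    pruned.reduce((sum, m) => sum + countTokens(m), 0) > tokenBudget &&
    pruned.length > 1
  ) {
    // Keep the system message (assumed to sit at index 0) intact.
    const idx = pruned[0].role === "system" ? 1 : 0;
    pruned.splice(idx, 1);
    dropped++;
  }
  return {
    prunedMessages: pruned,
    warning:
      dropped > 0
        ? `${dropped} message(s) were pruned to fit the context window`
        : undefined,
  };
}
```

The webview would call this first, render `warning` if present, and then pass `prunedMessages` straight to `llm/streamChat`.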
I agree with @owtaylor's suggestion. Calling llm/pruneChat before llm/streamChat not only helps manage context length but also gives us control over whether llm/streamChat is called at all. And when users see the warning message up front, they're less likely to put too much weight on a response built from truncated input.
Additionally, leveraging Context stability - taking advantage of prefix caching is a great strategy. It can enhance the user experience by reducing response time when the context limit is reached. @RomneyDa If this all sounds good, I'll start working on it.
Planning to look into this tomorrow midday!
@Jazzcort @owtaylor I agree that two separate calls is a good approach; having the warning up front would be great, and it won't affect all the other uses of streamChat in core, etc. llm/pruneChat could work. I'd be interested in approaches that don't do the pruning twice but still use the same streamChat function, perhaps some kind of alreadyPruned boolean passed to llm.streamChat. Counting tokens can be a bit expensive, but not bad.
@sestinj tagging since touches core streaming
I'm planning to implement llm/compileChat, which calls compileChatHistory to prune the chat messages and returns the compiled chat messages along with a boolean that indicates whether we should warn users. I'll also add a boolean parameter to llm/streamChat so we won't do the pruning twice. What do you guys think? @owtaylor @RomneyDa @sestinj
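A minimal sketch of that plan, assuming hypothetical names (`compileChat`, `didPrune`, the toy token counting) rather than the real compileChatHistory implementation:

```typescript
// Hypothetical return shape for the proposed "llm/compileChat" call.
interface CompileChatResult {
  compiledMessages: string[];
  // true when the caller should warn the user that pruning occurred
  didPrune: boolean;
}

// Toy stand-in for compileChatHistory: keep the newest messages that fit
// in the budget, counting one token per word (for illustration only).
function compileChat(
  messages: string[],
  tokenBudget: number
): CompileChatResult {
  const kept: string[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = messages[i].split(/\s+/).filter(Boolean).length;
    if (used + tokens > tokenBudget) break;
    kept.unshift(messages[i]);
    used += tokens;
  }
  return {
    compiledMessages: kept,
    didPrune: kept.length < messages.length,
  };
}

// "llm/streamChat" would then accept an alreadyPruned flag so it does
// not count tokens and prune a second time.
function selectMessagesToSend(
  messages: string[],
  alreadyPruned: boolean,
  tokenBudget: number
): string[] {
  return alreadyPruned
    ? messages
    : compileChat(messages, tokenBudget).compiledMessages;
}
```

The webview calls `compileChat`, shows the warning when `didPrune` is true, then invokes streaming with `alreadyPruned: true`; other callers of streamChat keep the old single-call behavior.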
Not to make this a design by committee, but my only feedback is that I think the warning should be yellow rather than red. Red feels a bit too scary for the severity of this warning imo.
Great contribution though, thanks @Jazzcort ! 👌
@Patrick-Erichsen I’ve updated the warning message color to yellow—thanks for the feedback! @RomneyDa Let me know if you'd like to tweak the UI further. We can either make those changes now or go ahead and merge this PR and revisit the design later.
@Jazzcort we've made several changes to pruning and message compilation recently that conflict with this. I will take a look and see if I can pull your branch and resolve this.
Acknowledging a long delay on this one, it's such a sensitive part of the code that I've put it off till I have time to properly dig in
@RomneyDa No problem! Let me know if you find anything unclear about this feature or need a hand resolving the conflicts! If the conflicts are too complex to resolve, I can try to re-implement this based on the newest commit. 😊
@Jazzcort I took a look at this and planned to fix the conflicts, but I think it's going to be much safer for either you to re-implement on top, or for Dallin to handle the merge, and he is out until start of next week. Both directions would be okay, just a trade-off of speed vs. amount of work
I'll try to reimplement this on top this week! Will ping you when I'm done. 😄
sounds great! thank you 🙏
I found that with the newly implemented compileChatMessages function, the last user input never gets deleted or truncated. In the case where the last user input cannot fit within the context length, a hard error is raised instead.
@allanday What do you think about this?
Assuming I am understanding the issue correctly, I'd suggest showing this error in the chat history similar to your previous version, so when the user hits return they get a response like: "Sorry, unable to answer - context has reached its limit". If needed you could provide some details in that message, or have a button to view logs and so on.
Showing the message inline will be less disruptive than using a popup. Once it's agreed what should be in the error I can sketch what it could look like, if that would be helpful.
Going one step better would be to provide feedback that the query won't be answered prior to it being submitted: once the draft content has exceeded the limit you could make return insensitive and pop up a warning, so users can adjust their input prior to submission. I get that that could be more technically challenging though.
@RomneyDa Do you think we can take lastMessagesTokens out of the calculation of inputTokenAvailable in core/llm/countTokens.ts, so we can handle the situation where the user's last input gets removed differently?
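A sketch of the budget arithmetic being suggested: compute the input budget without the last message's tokens, so an oversized last message can be detected and handled as its own case. The names here are hypothetical, not the actual code in core/llm/countTokens.ts:

```typescript
// Hypothetical budget check: lastMessageTokens is excluded from the
// inputTokensAvailable calculation so the "last input doesn't fit"
// situation can be surfaced separately instead of as a hard error.
interface BudgetCheck {
  inputTokensAvailable: number;
  lastMessageFits: boolean;
}

function checkLastMessage(
  contextLength: number,
  maxTokens: number, // tokens reserved for the model's reply
  nonNegotiableTokens: number, // e.g. system message + tool definitions
  lastMessageTokens: number
): BudgetCheck {
  const inputTokensAvailable =
    contextLength - maxTokens - nonNegotiableTokens;
  return {
    inputTokensAvailable,
    lastMessageFits: lastMessageTokens <= inputTokensAvailable,
  };
}
```

When `lastMessageFits` is false, the UI could show the dedicated "can't submit" treatment instead of silently pruning or raising.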
@RomneyDa @sestinj If we can take the last user input out of the calculation of inputTokenAvailable, we can have a different display for it. Here is the demo video! Let me know what you guys think about this!
https://github.com/user-attachments/assets/60799c80-9c7a-47fe-a847-e92d95ed208d
I also tried placing this display right above the input box, but since we are using relative positioning, there is a gap between the message history and the input box, which looks a bit buggy. I also tried changing the input box to absolute positioning, but that causes another issue: the input box can block the delete button of some messages, which I don't think is acceptable.
https://github.com/user-attachments/assets/2e5bfd0f-acb0-4989-8e7e-55dd72480fbc
P.S. This implementation is based on a fairly recent commit of Continue, so no complicated conflicts I think.
@RomneyDa @sestinj We ended up thinking that the way we notify users when the last user input is deleted in the previous proposal isn't strong enough, and we probably need to provide more guidance so users don't end up with an inaccurate answer. I think we can change the color of the delete button for the last user input and maybe add a little animation to it, so it serves as a clearer suggestion for users to delete that message.
https://github.com/user-attachments/assets/7cd111ac-f1f6-47e8-becf-3829eeabf272
@Jazzcort, sorry about the further delay on this (one of the oldest PRs at this point!!) I've gone back and forth on this, I think that it's important that we add some indication that pruning is occurring, I think the additional message compilation message is the right approach. I think that this is such a common occurrence that I'm tentative about yellow warnings, but I do think your most recent iteration with the little yellow dot is pretty nice.
The message shown to the user should make three points clear
- inform: context exceeds window, can't include all
- hint at temporary solution: that it only prunes the back side of the conversation
- call to action: recommend they start a new chat, maybe even give them a new session button
e.g. maybe "Chat history exceeds model's context length (XXX tokens). Old messages will not be included. Start a new session [clickable]"
Or even more concise, "Long chat, old messages are being excluded. New Session [clickable]" or similar
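The two message variants discussed in this thread could be assembled by a small helper like the following. This is a hypothetical sketch (names and severity split are assumptions, mirroring the yellow/red dot idea, not the actual UI code):

```typescript
// Hypothetical helper assembling the warning text proposed above.
// "warning" = history pruned from the top (yellow);
// "error" = the last message itself cannot fit (red, can't submit).
type PruneSeverity = "warning" | "error";

function buildPruneMessage(
  severity: PruneSeverity,
  contextLength: number
): string {
  if (severity === "error") {
    return "Can't submit: your message exceeds the model's context length";
  }
  return (
    `Chat history exceeds model's context length (${contextLength} tokens). ` +
    "Old messages will not be included. Start a new session"
  );
}
```

The "Start a new session" tail would be rendered as the clickable call-to-action suggested above.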
I like the more subtle warning. Let's keep brainstorming on ways to handle the error after system message, last message, and tools were all made non-negotiable. What that change actually did was just remove the ability for system message lines to be pruned, and account for tools tokens. The user message text content behavior was the same before, and it should probably present some kind of error but I do agree that a popup could be annoying. But it should be a bit of an edge case like huge files etc. Personally I don't think highlighting the trash icon is ideal, but I like the direction. Maybe we should just pop up the same little yellow warning with modified text "can't submit"
Thoughts?
Hi Dallin - thanks for the reply here! If I understand correctly, what you are saying is:
- You think that the expandable-yellow-dot approach is pretty good for when the top of the conversation is pruned
- But you think that when the last message is pruned there should be something that is more clearly an error message
- To avoid an annoying popup, that error message could maybe share formatting with the expanded yellow dot.
Is that a correct interpretation? If so, would you see it looking more error-like (red instead of yellow, etc.)?
I'm also not a big fan of highlighting the trash can, because I think what the user will almost always want to do is edit the provided context, rather than starting over with a new message.
@owtaylor I think a red dot that auto shows in case of non-negotiable tokens exceeding context length and a yellow dot that only shows when clicked would be great. Any other solutions/thoughts in mind? Appreciate the iterations!
@RomneyDa Red dot for the non-negotiable token exceeding case and yellow dot for the case when we start to prune the chat message from the top sound good to me. And we can display different error messages and instructions for users to deal with the situations.
@owtaylor If this also sounds great to you, I'll go ahead and implement this feature!
All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.
@RomneyDa Sorry for the delay. I've been waiting for the CLA thing to be cleared; I think @sestinj will manually check that for us. For this PR, I did tweak it a little bit. We think the expandable dot might not be that intuitive for users to click on, so now the warning messages are displayed directly at the bottom of the last user input box. I added the recommended action for each situation (open the config file for the non-negotiable case, start a new session for the pruning-from-the-top case). We can add more actions in the future if needed. Let me know what you think!
https://github.com/user-attachments/assets/b68b0731-5a02-458a-a6bc-4b2c71013832
https://github.com/user-attachments/assets/ba5ef5df-1ee6-466c-bbe7-448694ee29ff
@RomneyDa Sorry for the frequent changes. This should be the final tweak, I guess. 😄 Directly showing warning messages when Continue starts to prune the chat messages from the top might be a little intrusive, so we decided to add the expandable dot back for just this situation. For the non-negotiable case, we show the error message directly, without the expandable feature.