
Reset context instead of quitting in interactive mode


It's really annoying that I have to restart the program every time it quits on [end of text] or on exceeding the context limit, since reloading the model is slow and inefficient. Is there any way to add an option that, instead of quitting, just resets to the initial prompt?
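For reference, here is roughly what I have in mind, sketched against main.cpp (not working code: end_of_text_id and prompt_inp are names I made up for illustration; n_past, embd, embd_inp and input_consumed are the existing variables):

// on end-of-text, rewind instead of exiting: re-queue the original prompt
// tokens and reset the evaluation position, so only the prompt has to be
// re-evaluated instead of reloading the whole model.
if (embd.back() == end_of_text_id) {   // hypothetical constant; the id is 2 in main.cpp
    n_past = 0;                        // restart the evaluation position
    embd.clear();                      // drop the pending generated tokens
    embd_inp = prompt_inp;             // prompt_inp: a saved copy of the initial prompt tokens
    input_consumed = 0;                // feed the prompt again on the next loop pass
}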

avada-z avatar Mar 14 '23 21:03 avada-z

@ggerganov Please, thanks!

zzhkikyou avatar Mar 15 '23 11:03 zzhkikyou

The generation length is currently governed by remaining_tokens (the -n parameter), and the main work happens around llama_tokenize; I'm not sure where a context reset would go.
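To illustrate, the generation loop is shaped roughly like this (paraphrased, not the exact code; eval_tokens and sample_next_token are placeholders for the real calls):

int remaining_tokens = params.n_predict;       // set from the -n flag
while (remaining_tokens > 0) {
    eval_tokens(model, embd, n_past);          // feed pending tokens to the model
    gpt_vocab::id id = sample_next_token(vocab, logits);
    embd.push_back(id);
    --remaining_tokens;
    if (embd.back() == 2) break;               // token id 2 is end-of-text
}

So a reset would presumably mean rewinding n_past and re-feeding the prompt, not just topping up remaining_tokens.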

aratic avatar Mar 15 '23 16:03 aratic

I think this relates to #23, but it's not exactly the same.

aratic avatar Mar 15 '23 16:03 aratic

I'm attempting a fix for the [end of text] issue:

diff --git a/main.cpp b/main.cpp
index ca0fca8..126e53f 100644
--- a/main.cpp
+++ b/main.cpp
@@ -991,6 +991,14 @@ int main(int argc, char ** argv) {
             fflush(stdout);
         }

+        // check for [end of text]
+        if (params.interactive && embd.back() == 2) { // token id 2 is end-of-text
+            fprintf(stderr, " [end of text]\n");
+            // insert the antiprompt to continue the conversation.
+            // however, after this it seems like everything was lost.
+            embd_inp.insert(embd_inp.end(), antiprompt_inp.begin(), antiprompt_inp.end());
+        }
+
         // in interactive mode, and not currently processing queued inputs;
         // check if we should prompt the user for more
         if (params.interactive && embd_inp.size() <= input_consumed) {
@@ -1037,7 +1045,7 @@ int main(int argc, char ** argv) {
         }

         // end of text token
-        if (embd.back() == 2) {
+        if (!params.interactive && embd.back() == 2) {
             fprintf(stderr, " [end of text]\n");
             break;
         }

However, it looks like the context is broken when resuming the conversation after the [end of text]: it's as if a fresh interaction is starting. So I guess being able to restore the context will be necessary to complete this.
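One direction that might avoid the broken context (an untested sketch; is_interacting is the flag main.cpp already uses for interactive mode, and tokenizing "\n" rather than hard-coding a token id is my assumption):

if (params.interactive && embd.back() == 2) {
    fprintf(stderr, " [end of text]\n");
    // don't let the end-of-text token enter the context: swap it for a
    // newline so the conversation can continue coherently
    std::vector<gpt_vocab::id> nl = ::llama_tokenize(vocab, "\n", false);
    embd.back() = nl.front();
    is_interacting = true;  // hand control back to the user for more input
}

The idea being that if the end-of-text token itself is what poisons the state, it should never reach the evaluation call.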

j3k0 avatar Mar 16 '23 07:03 j3k0

Thank you for using llama.cpp and thank you for sharing your feature request. While you've provided valuable feedback on UX improvements, it overlaps substantially with what's being discussed in #23, and right now my top priority is to address this by fixing the underlying technical problem described in #91. Please use those other issues for further discussion; they will reach a broader audience of the folks following them.

jart avatar Mar 16 '23 12:03 jart