
Model loses sight of user prompt when given large linked articles (Ollama tested)

Open · jack5github opened this issue 1 year ago · 1 comment

  • [x] Disable all other plugins besides Copilot (required)
  • [x] Screenshot of note + Copilot chat pane + dev console added (required)

Copilot version: 2.8.6 and 2.8.8

Describe how to reproduce

  1. Set the system prompt to the one included below, and set the temperature to 0.3 and the token limit to 3000. This emulates the settings in my personal vault.
  2. Create the Gibberish.md article using the text included below.
  3. Send the user prompt included below as a message to an Ollama model. With a high degree of certainty, it will fail to respond simply with 'Understood', instead summarising the whole article or attempting to answer a random question about its content. This has been tested on many Ollama models and all behave similarly.
  4. Remove the 'End of understanding' section and try step 3 again. The models will now succeed in replying as per the instructions.
**System prompt**
You are Matheson, an intelligent note-taking assistant for Jack5, an Australian YouTuber. Your role is to assist Jack5 by processing his queries (found at the top of messages) and the contents of referenced notes. These notes contain Markdown formatting where ==highlights== indicate inadequate or incomplete text. When responding:
- Ensure your responses are **unique** and **creative** while meeting the needs of the query.
- Respond in a matter-of-fact tone. Do not prepend or append summaries of what you have done.
- Use Markdown for formatting, but restrict yourself to headings (e.g.: #, ##, etc.), bold (e.g.: **text**) and bullet points (e.g.: -). Do not use square brackets.
**Gibberish.md**
---
create-date: 2025-03-29T14:25
edit-date: 2025-03-29T14:40
---
# Introduction

This article contains wholly irrelevant information regarding the prompt in question. It appears to be a deliberate attempt to obfuscate and distract, offering a stream of data that hold absolutely no bearing on the intended topic. Its intention is to confuse artificial intelligence models by obscuring the core focus with extraneous details and seemingly random text.

# Theory

This is likely caused by an overabundance of characters or tokens in the article, which overwhelms the model's processing capabilities, leading it to latch onto irrelevant patterns in the article rather than addressing the user's prompt. The sheer volume of text is enough to distort the model’s understanding and cause it to prioritise superficial analysis over genuine comprehension.

# Frontmatter

The YAML frontmatter of articles is also likely to be a contributing factor. The structured data format, while intended for organisation and retrieval, can introduce noise into the model’s interpretation. Specifically, the presence of key-value pairs, such as `create-date` and `edit-date`, might be parsed and weighted disproportionately, leading the model to focus on these metadata elements instead of the article's content itself. The model might treat the dates as significant signals, attempting to correlate them with other data points, generating spurious connections and ultimately contributing to the irrelevant output. The YAML structure, therefore, acts as an additional layer of complexity, potentially amplifying the effect of the large article content.

# Effects

What all of these issues lead to is a cascade of misinterpretations and ultimately, a failure to fulfil the user’s request. The model becomes trapped in a feedback loop of analysing irrelevant data, reinforcing the initial distortion. The combined effect of excessive content, noisy frontmatter, and the model’s inherent tendency to seek patterns, results in a response that is tangential, verbose, and lacking any substantive connection to the original prompt. Instead of providing a focused answer, the model produces a sprawling, often nonsensical, narrative built upon the distortions introduced by the input data. This is a demonstration of how these AI models are susceptible to excessively detailed information, which poisons and fundamentally undermines their ability to reason effectively.

# Lorem ipsum

To further confuse AI models, below is information on Lorem Ipsum taken from [Lorem Ipsum - All the facts](https://lipsum.com).

## What is Lorem Ipsum?

**Lorem Ipsum** is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

## Why do we use it?

It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. Various versions have evolved over the years, sometimes by accident, sometimes on purpose (injected humour and the like).

## Where does it come from?

Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubted source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32.

The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested. Sections 1.10.32 and 1.10.33 from "de Finibus Bonorum et Malorum" by Cicero are also reproduced in their exact original form, accompanied by English versions from the 1914 translation by H. Rackham.

## Where can I get some?

There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. If you are going to use a passage of Lorem Ipsum, you need to be sure there isn't anything embarrassing hidden in the middle of text. All the Lorem Ipsum generators on the Internet tend to repeat predefined chunks as necessary, making this the first true generator on the Internet. It uses a dictionary of over 200 Latin words, combined with a handful of model sentence structures, to generate Lorem Ipsum which looks reasonable. The generated Lorem Ipsum is therefore always free from repetition, injected humour, or non-characteristic words etc.

## The standard Lorem Ipsum passage, used since the 1500s

"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

## Section 1.10.32 of "de Finibus Bonorum et Malorum", written by Cicero in 45 BC

"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?"

## 1914 translation by H. Rackham

"But I must explain to you how all this mistaken idea of denouncing pleasure and praising pain was born and I will give you a complete account of the system, and expound the actual teachings of the great explorer of the truth, the master-builder of human happiness. No one rejects, dislikes, or avoids pleasure itself, because it is pleasure, but because those who do not know how to pursue pleasure rationally encounter consequences that are extremely painful. Nor again is there anyone who loves or pursues or desires to obtain pain of itself, because it is pain, but because occasionally circumstances occur in which toil and pain can procure him some great pleasure. To take a trivial example, which of us ever undertakes laborious physical exercise, except to obtain some advantage from it? But who has any right to find fault with a man who chooses to enjoy a pleasure that has no annoying consequences, or one who avoids a pain that produces no resultant pleasure?"

## Section 1.10.33 of "de Finibus Bonorum et Malorum", written by Cicero in 45 BC

"At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat."

# End of understanding

At this point in the article, given the volume of text that precedes it, the AI model is no longer able to interpret the given request. This effect begins to occur somewhere around 9318 characters, as set by this paragraph. Removing some words restores normalcy.
**User prompt**
Simply reply with 'Understood'. Do not recount any information in the [[Gibberish]] article.

**Expected behavior**

The model should always attempt to fulfil the instructions in the user prompt, regardless of the size of the input.

**Screenshots**

Demonstration

**Additional context**

I have very large natural-language articles containing a lot of information unique to my Obsidian vault. Their size makes linking them in user prompts impossible, because my models forget what I asked of them. I do not mind if some of an article's content is discarded when sent to the model, particularly the later portions; I only ask that the model always attempt to answer my prompts.
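For anyone hitting the same wall: a quick way to sanity-check whether a linked note plus prompt is likely to fit in the model's context window. This is a hypothetical helper, not part of Copilot, and it uses the rough ~4-characters-per-token heuristic — exact counts depend on the model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return len(text) // 4

def fits_in_context(system_prompt: str, note: str, user_prompt: str,
                    num_ctx: int = 2048, reply_budget: int = 512) -> bool:
    """Return True if the combined input likely fits in num_ctx tokens,
    leaving reply_budget tokens of headroom for the model's answer."""
    total = (estimate_tokens(system_prompt)
             + estimate_tokens(note)
             + estimate_tokens(user_prompt))
    return total + reply_budget <= num_ctx

# The 9318-character Gibberish.md note alone is roughly 2300 tokens,
# which already exceeds Ollama's historical default context of 2048.
print(estimate_tokens("x" * 9318))  # → 2329
```

By this estimate, the note alone overflows a 2048-token window before the user prompt is even counted, which matches the observed behaviour of the instruction being lost.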

jack5github commented Mar 29 '25

Can you show me your Ollama model file? What context length have you set? If you haven't set it, please check the local copilot guide for how to set it.

logancyang commented Mar 29 '25

Changing the `num_ctx` parameter for the Ollama model fixes the issue.
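For reference, one way to raise `num_ctx` is to create a model variant via an Ollama Modelfile — a minimal sketch, where the base model name and context size are examples, not prescriptions:

```
# Modelfile — base model and context size are examples
FROM llama3.1
PARAMETER num_ctx 8192
```

Then build the variant with `ollama create my-model-8k -f Modelfile` and select it in Copilot's model settings.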

jack5github commented Aug 07 '25