.Net: Bug: System/User role parsing fails when XML tags are included in `skprompt.txt` (even when using CDATA)
Describe the bug
The XML prompt encoding in `skprompt.txt` does not work as expected when the prompt includes additional XML tags or CDATA sections. Specifically:
- Using raw XML tags (e.g. `<reasoning>`) in the system prompt breaks the parsing: the entire prompt is treated as a single user message.
- Using CDATA (e.g. `<![CDATA[<answer></answer>]]>`) causes the CDATA wrapper to be sent to the model verbatim, so it appears in the output, which is not desired.
- Pre-encoding the XML tags (e.g. with Unicode escapes such as `\u003c`) works, but is cumbersome.
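A possible alternative workaround (an assumption on my part, not a behaviour I have confirmed in Semantic Kernel) would be to escape literal tags with standard XML entities before they reach the template, e.g. via the built-in `System.Security.SecurityElement.Escape`:

```csharp
using System;
using System.Security;

class EscapeDemo
{
    static void Main()
    {
        // Literal tags we want the model to see as plain text,
        // not as prompt-template markup.
        string literalTags = "<answer></answer>";

        // SecurityElement.Escape replaces <, >, &, " and ' with
        // their XML entity equivalents.
        string escaped = SecurityElement.Escape(literalTags);

        Console.WriteLine(escaped); // &lt;answer&gt;&lt;/answer&gt;
    }
}
```

Whether the template engine decodes `&lt;`/`&gt;` back to literal angle brackets before the prompt is sent to the model is exactly the kind of behaviour that would benefit from documentation.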
To Reproduce
- Create a `skprompt.txt` file with the following content:

```xml
<message role="system">
Think about the user's question before replying like an alien who is trying to understand the foreign planet you have landed on.
Wrap all your reasoning in <reasoning> tags.
</message>

<message role="user">
Would you like to watch a movie?
</message>
```
- Observe in the logs that the prompt is rendered as a user message instead of preserving the system message role.
- Replace the system prompt with:

```xml
<message role="system">
Do what the user says. Wrap your answer in <![CDATA[<answer></answer>]]> tags.
</message>

<message role="user">
Tell me about the Solar System.
</message>
```
- Notice that the LLM output includes the CDATA tags.
- Finally, try pre-encoding XML tags as in:

```xml
<message role="system">
Do what the user says. Wrap your answer in \u003canswer\u003e\u003c\u002fanswer\u003e tags.
</message>

<message role="user">
In 5 words describe the sky.
</message>
```
- Confirm that this approach produces the desired output (the escape sequences do not leak into the model's response), but authoring prompts this way is cumbersome.
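For completeness, the steps above were driven by a minimal harness along these lines (a sketch: the plugin directory name `Plugin` and function name `Test` are inferred from the `Plugin-Test` entries in the logs below, and the model/key wiring is illustrative):

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;

var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(
    modelId: "gpt-4o-mini",
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!,
    serviceId: "openai");
builder.Services.AddLogging(b => b.SetMinimumLevel(LogLevel.Trace).AddConsole());

var kernel = builder.Build();

// Loads each subdirectory containing an skprompt.txt as a prompt function.
var plugin = kernel.CreatePluginFromPromptDirectory("Plugin");

// "Test" is the subdirectory holding the skprompt.txt under test.
var result = await kernel.InvokeAsync(plugin["Test"]);
Console.WriteLine(result);
```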
Expected behavior
The prompt file should allow literal XML tags or CDATA sections without breaking the intended system/user roles or the formatting of the LLM output. Either the parser should handle these cases gracefully, or the documentation should describe the supported way to include them.
Most LLMs have been trained to recognise XML tags specifically. This has been included in multiple prompt engineering guides such as those by OpenAI and Anthropic as a way to create more sophisticated prompts.
Screenshots
N/A
Platform
- Language: C#
- Source: nuget v1.44.0
- AI model: OpenAI:GPT-4o-mini
- IDE: VS Code
- OS: Mac
Additional context
The current encoding approach either breaks the prompt structure (raw XML tags are parsed as part of the message markup) or leaves CDATA wrappers in the model output. Improving how Semantic Kernel parses `skprompt.txt` so that special characters and CDATA blocks are handled correctly, or updating the documentation with the supported practices, would be highly beneficial.
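Until then, the cumbersome pre-encoding from the last reproduction step could be automated with a small helper (hypothetical; the name `SkPromptEscaper` is mine) that rewrites angle brackets as the literal escape sequences used above:

```csharp
using System;

static class SkPromptEscaper
{
    // Replaces angle brackets (and the slash in closing tags) with
    // literal \uXXXX sequences. Note these are plain backslash
    // sequences in the prompt text; it is the model, not the
    // template engine, that ends up interpreting them.
    public static string Escape(string text) =>
        text.Replace("</", "\\u003c\\u002f")
            .Replace("<", "\\u003c")
            .Replace(">", "\\u003e");
}

class Demo
{
    static void Main()
    {
        Console.WriteLine(SkPromptEscaper.Escape("<answer></answer>"));
        // \u003canswer\u003e\u003c\u002fanswer\u003e
    }
}
```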
- Personal logs from step 2
```log
trce: Microsoft.SemanticKernel.KernelFunctionFactory[0]
Rendered prompt: <message role="system">
Think about the user's question before replying like an alien who is trying to understand the foreign planet you have landed on. Wrap all your reasoning in <reasoning> tags.
</message>
<message role="user">
Would you like to watch a movie?
</message>
trce: Microsoft.SemanticKernel.Connectors.OpenAI.OpenAIChatCompletionService[0]
ChatHistory: [{"Role":{"Label":"user"},"Items":[{"$type":"TextContent","Text":"\u003Cmessage role=\u0022system\u0022\u003E\nThink about the user\u0027s question before replying like an alien who is trying to understand the foreign planet you have landed on. Wrap all your reasoning in \u003Creasoning\u003E tags.\n\u003C/message\u003E\n\n\u003Cmessage role=\u0022user\u0022\u003E\nWould you like to watch a movie?\n\u003C/message\u003E"}]}], Settings: {"service_id":"openai","model_id":null,"function_choice_behavior":null,"temperature":1.0,"max_tokens":4096}
info: Microsoft.SemanticKernel.Connectors.OpenAI.OpenAIChatCompletionService[0]
Prompt tokens: 67. Completion tokens: 156. Total tokens: 223.
info: Microsoft.SemanticKernel.KernelFunction[0]
Function Plugin-Test succeeded.
trce: Microsoft.SemanticKernel.KernelFunction[0]
Function Plugin-Test result: <reasoning>As an alien, my understanding of "watching a movie" is limited, yet it appears to be an activity involving visual and auditory stimulation, possibly for entertainment, storytelling, or cultural reflection. In my exploration of this foreign planet, I have observed that humans often congregate to engage in such activities, indicating it may hold social significance. I recognize that participating in this human custom could provide insight into their emotions, narratives, and shared experiences. However, I must also consider the mechanisms humans use for this activity and whether it aligns with my own sensory perception. Thus, I find the proposition intriguing, though I am uncertain of the experience’s translation to my understanding.</reasoning>
I find the concept fascinating! What type of movie do humans prefer?
info: Microsoft.SemanticKernel.KernelFunction[0]
Function Plugin-Test completed. Duration: 5.1441888s
```
- Personal logs for step 4
```log
trce: Microsoft.SemanticKernel.KernelFunctionFactory[0]
Rendered prompt: <message role="system">
Do what the user says. Wrap your answer in <![CDATA[<answer></answer>]]> tags.
</message>
<message role="user">
Tell me about the Solar System.
</message>
trce: Microsoft.SemanticKernel.Connectors.OpenAI.OpenAIChatCompletionService[0]
ChatHistory: [{"Role":{"Label":"system"},"Items":[{"$type":"TextContent","Text":"Do what the user says. Wrap your answer in \u003C![CDATA[\u003Canswer\u003E\u003C/answer\u003E]]\u003E tags."}]},{"Role":{"Label":"user"},"Items":[{"$type":"TextContent","Text":"Tell me about the Solar System."}]}], Settings: {"service_id":"openai","model_id":null,"function_choice_behavior":null,"temperature":1.0,"max_tokens":4096}
info: Microsoft.SemanticKernel.Connectors.OpenAI.OpenAIChatCompletionService[0]
Prompt tokens: 40. Completion tokens: 592. Total tokens: 632.
info: Microsoft.SemanticKernel.KernelFunction[0]
Function Plugin-Test succeeded.
trce: Microsoft.SemanticKernel.KernelFunction[0]
Function Plugin-Test result: <![CDATA[<answer>
The Solar System is a gravitationally bound system comprising the Sun and the objects that orbit it. This includes eight major planets, their moons, dwarf planets, asteroids, comets, and other celestial bodies. It formed about 4.6 billion years ago from the gravitational collapse of a region within a large molecular cloud. Its vast complexities and dynamics continue to be subjects of extensive study and exploration.
</answer>]]>
info: Microsoft.SemanticKernel.KernelFunction[0]
Function Plugin-Test completed. Duration: 9.9881755s
```
- Personal logs for step 6
```log
trce: Microsoft.SemanticKernel.KernelFunctionFactory[0]
Rendered prompt: <message role="system">
Do what the user says. Wrap your answer in \u003canswer\u003e\u003c\u002fanswer\u003e tags.
</message>
<message role="user">
In 5 words describe the sky.
</message>
trce: Microsoft.SemanticKernel.Connectors.OpenAI.OpenAIChatCompletionService[0]
ChatHistory: [{"Role":{"Label":"system"},"Items":[{"$type":"TextContent","Text":"Do what the user says. Wrap your answer in \\u003canswer\\u003e\\u003c\\u002fanswer\\u003e tags."}]},{"Role":{"Label":"user"},"Items":[{"$type":"TextContent","Text":"In 5 words describe the sky."}]}], Settings: {"service_id":"openai","model_id":null,"function_choice_behavior":null,"temperature":1.0,"max_tokens":4096}
info: Microsoft.SemanticKernel.Connectors.OpenAI.OpenAIChatCompletionService[0]
Prompt tokens: 49. Completion tokens: 18. Total tokens: 67.
info: Microsoft.SemanticKernel.KernelFunction[0]
Function Plugin-Test succeeded.
trce: Microsoft.SemanticKernel.KernelFunction[0]
Function Plugin-Test result: <answer>Vast, blue, shifting, cloud-filled, beautiful.</answer>
info: Microsoft.SemanticKernel.KernelFunction[0]
Function Plugin-Test completed. Duration: 1.7699015s
```