guidance: Is AzureOpenAI, Chat Completion, and Tools broken?

The bug

I get the following error when running the code below. Is it no longer possible to use AOAI (Azure OpenAI) with chat completion and tools? Thanks.

The model attempted to generate b"Thought 1: I should type the letter 'd'...." after the prompt b"...: 'dog'<|im_end|>\n<|im_start|>assistant\n", but that does not match the given grammar constraints! Since your model is a remote API that does not support full guidance integration we cannot force the model to follow the grammar, only flag an error when it fails to match. You can try to address this by improving the prompt, making your grammar more flexible, rerunning with a non-zero temperature, or using a model that supports full guidance grammar constraints.
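
As the message says, output from a remote API can only be checked after it has been produced, not steered while it is being generated. A minimal sketch of that check-after-generation idea, in plain Python rather than guidance's grammar machinery. The pattern `EXPECTED` and the function name `check_remote_output` are hypothetical and exist only for illustration; they are not the grammar that guidance builds for `gen(..., tools=...)`.

```python
import re

# Hypothetical stand-in for a grammar: exactly one call such as type_letter('d').
# This is NOT the grammar guidance builds; it only illustrates the idea.
EXPECTED = re.compile(r"type_letter\('[A-Za-z]'\)")


def check_remote_output(text: str) -> str:
    """Return text if it matches EXPECTED in full, otherwise raise ValueError.

    A remote model has already produced `text` by the time we see it, so the
    only options are to accept it or to flag the mismatch.
    """
    if EXPECTED.fullmatch(text) is None:
        raise ValueError(f"output {text!r} does not match the expected pattern")
    return text
```

Under this toy pattern, free-form text such as a "Thought 1: ..." line is rejected even though it follows the prompt's format, because the check looks only at the pattern and not at the prompt.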

To Reproduce

""" Type letter evaluation test. """

import os
import time
from guidance import assistant, gen, guidance, models, system, user


def env_or_fail(var_name: str) -> str:
    """Get an environment variable or fail with a message."""
    env_value = os.getenv(var_name, None)
    assert env_value is not None, f"Env '{var_name}' not found"
    return env_value


azure_ai_endpoint = env_or_fail("AZURE_AI_CHAT_ENDPOINT")
azure_ai_key = env_or_fail("AZURE_AI_CHAT_KEY")
model = env_or_fail("AZURE_AI_CHAT_MODEL")

lm = models.AzureOpenAI(
    model=model,
    azure_endpoint=azure_ai_endpoint,
    api_key=azure_ai_key,
    echo=False,
    max_streaming_tokens=2000,
)
assert isinstance(lm, models.AzureOpenAI)

@guidance()
def type_letter(lm, letter):
    """Type a letter."""
    lm += letter
    return lm


tools = {"type_letter": "Returns a letter after it is typed."}
tool_map = {"type_letter": type_letter}

PROMPT = """Type the user message one letter at a time. You have access only to the following tools:

{tools}

Use the following format:

Message: the user message you must type
Thought 1: you should always think about what to do
Action 1: the action to take, has to be one of {tool_names}
Observation 1: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought N: I have typed each letter of the message.
Final Thought: the list of letters that were typed.
Done.

Example:
Message: cat
Thought 1: I should type the letter 'c'.
Action 1: type_letter('c')
Observation 1: 'c'
Thought 2: I should type the letter 'a'.
Action 2: type_letter('a')
Observation 2: 'a'
Thought 3: I should type the letter 't'.
Action 3: type_letter('t')
Observation 3: 't'
Thought 4: I have typed each letter of the message.
Final Thought: I typed the letters: 'c','a','t'
Done.
"""

I = 0
MAX = 1

while I < MAX:

    lm.engine.reset_metrics()

    with system():
        tool_names = list(tools.keys())
        lm += PROMPT.format(tools=tools, tool_names=tool_names)

    with user():
        # lm += "'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'"
        lm += "Message: 'dog'"

    with assistant():
        lm += gen("trace", max_tokens=2000, tools=[type_letter])

    print(str(lm))
    # print(lm["trace"])
    FINAL_THOUGHT = None
    for line in lm["trace"].split("\n"):
        if line.startswith("Final Thought"):
            FINAL_THOUGHT = line
            break
    print(FINAL_THOUGHT)
    # print(f"{lm.engine.metrics=}")

    I += 1
    time.sleep(60)

System info:

OS: Windows
Guidance Version (guidance.__version__): 0.1.15

Jun 19 '24 05:06 arthurgreef

Hmmm.... this is an issue which shows up in our builds periodically. Does this fail every time for you?

Jun 19 '24 13:06 riedgar-ms

Yes it fails all the time.

Jun 20 '24 22:06 arthurgreef

I wonder if this is related to the break which has just popped up in the CI Tests. Investigating..... can confirm the repro at least.

Jun 21 '24 12:06 riedgar-ms

I have been prodding this a little more, and I don't think it's related to the other issues we're seeing.

Unfortunately, tool calling is a rather under-tested part of the code base. As in, we have exactly one test which makes use of tools, and that is not a 'chat' example.

I have found that if you change lm += letter to just print(letter) in the tool, it behaves exactly as expected. The tool dispatch mechanism itself is obviously working fine, but the update to the 'LLM state' is not.
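
To make the distinction between "the tool gets dispatched" and "the state gets updated" concrete, here is a toy sketch in plain Python. It is not guidance's implementation: the state is modelled as a plain string, and `dispatch`, `type_letter_side_effect`, `type_letter_update`, and the call syntax in `CALL` are all hypothetical names assumed for illustration.

```python
import re

# Hypothetical call syntax, e.g. type_letter('d'); one single-character argument.
CALL = re.compile(r"(\w+)\('(.)'\)")

seen = []  # records side effects so they can be inspected


def type_letter_side_effect(state: str, letter: str) -> str:
    """Like the print(letter) variant: runs, but leaves the state untouched."""
    seen.append(letter)
    return state


def type_letter_update(state: str, letter: str) -> str:
    """Like the lm += letter variant: returns a new, extended state."""
    return state + letter


def dispatch(state: str, action: str, tool_map: dict) -> str:
    """Parse an action string, look up the tool, and return the resulting state."""
    match = CALL.fullmatch(action.strip())
    if match is None:
        raise ValueError(f"not a tool call: {action!r}")
    name, arg = match.groups()
    return tool_map[name](state, arg)
```

Both variants are reached through the same `dispatch` path, so a working side-effect tool shows that lookup and invocation are fine. Only the second variant depends on the returned state being carried forward, which corresponds to the step that appears to fail in the report above.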

@arthurgreef your report implies that this worked with a previous version of Guidance.... is that correct? If so, can you let me know which version?

Jun 21 '24 14:06 riedgar-ms

I apologize if my report indicated that this worked on a previous version. I have not used previous versions of guidance, so I cannot confirm whether it did or did not work.

Jun 23 '24 15:06 arthurgreef

No worries. I have some thoughts about what might be going wrong, but I'm not sure how to fix it yet.

Jun 23 '24 19:06 riedgar-ms