
.Net: Bug: Custom PromptExecutionSettings with "guided_regex" property for vLLM OpenAI Api Server is not working

Open dprokhorov17 opened this issue 9 months ago • 1 comment

Describe the bug

It seems that a custom property is not being serialized into the final HTTP request sent to the vLLM OpenAI API server.

Here is my code (snippet) so far:


using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Regex the model output must match (vLLM guided decoding).
string expectedResultRegex =
    """```json\s*\{\s*"expectation_met":\s*(true|false),\s*"bbox_2d":\s*\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\s*\}\s*```""";

byte[] imageBytes = File.ReadAllBytes(imagePath);

// HttpClient pointing at the local vLLM OpenAI-compatible server.
var qwenLocalHttpClient = new HttpClient();
qwenLocalHttpClient.BaseAddress = new Uri("http://localhost:5011/v1");

IChatCompletionService chatCompletionService = new OpenAIChatCompletionService(
    modelId: "Qwen/Qwen2.5-VL-32B-Instruct",
    apiKey: "EMPTY",
    httpClient: qwenLocalHttpClient
);

var prompt = """
    Determine if:
    1. The UI state matches the user's expectation
    2. Identify the relevant UI element

    Return in JSON format:
    ```json
    {
        "expectation_met": true/false,
        "bbox_2d": [x1, y1, x2, y2]
    }
    ```

    If element not visible or expectation can't be evaluated, indicate clearly with bbox_2d as [0, 0, 0, 0]. Output only the required JSON format!
    """;

// User message carrying both the text prompt and the screenshot.
var chatHistory = new ChatHistory();
chatHistory.AddUserMessage([
    new TextContent(prompt),
    new ImageContent(imageBytes, "image/png")
]);

// Pass the custom execution settings carrying guided_regex.
var reply = await chatCompletionService.GetChatMessageContentAsync(chatHistory,
    new VllmCustomExecutionSettings
    {
        Temperature = 0,
        GuidedRegex = expectedResultRegex,
    });

Console.WriteLine(reply);

And the custom class for the prompt execution settings:

using System.Text.Json.Serialization;

public sealed class VllmCustomExecutionSettings : OpenAIPromptExecutionSettings
{
    [JsonPropertyName("guided_regex")]
    public string GuidedRegex { get; set; }
}
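
For reference, a quick standalone check (my own sketch, plain System.Text.Json outside of Semantic Kernel) suggests the attribute on the class works as intended, so the value appears to get lost only when the connector builds its request:

using System.Text.Json;

// Serialize the custom settings directly, outside of Semantic Kernel.
var settings = new VllmCustomExecutionSettings
{
    Temperature = 0,
    GuidedRegex = ".*",
};

// The output contains "guided_regex":".*" next to the inherited OpenAI
// settings, so System.Text.Json itself honors the [JsonPropertyName] attribute.
Console.WriteLine(JsonSerializer.Serialize(settings));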

In my Docker container log I don't see the guided_regex decoding being applied (note guided_decoding=None):

Received request chatcmpl-e211e747fb98442b81a2e056a6b5911a: prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nDetermine if:\r\n1. The UI state matches the user\'s expectation\r\n2. Identify the relevant UI element\r\n \r\nReturn in JSON format:\r\n```json\r\n{\r\n "expectation_met": true/false,\r\n "bbox_2d": [x1, y1, x2, y2]\r\n}\r\n```\r\n \r\nIf element not visible or expectation can\'t be evaluated, indicate clearly with bbox_2d as [0, 0, 0, 0]. Output only the required JSON format!<|vision_start|><|image_pad|><|vision_end|><|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=9874, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.

This is how it should look (using OpenAI's Python package):

Received request chatcmpl-77f0db8eb863499f95b755d8e034b75c: prompt: "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n\nDetermine if, he UI state matches the user's expectation\nReturn in JSON format with bbox_2d key.\n<|vision_start|><|image_pad|><|vision_end|><|im_end|>\n<|im_start|>assistant\n", params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.05, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=9954, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json=None, regex='```json\\s*\\{\\s*\\s*"expectation_met":\\s*(true|false),\\s*"bbox_2d":\\s*\\[(\\d+),\\s*(\\d+),\\s*(\\d+),\\s*(\\d+)\\]\\s*\\}\\s*```', choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None), extra_args=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None. INFO 05-08 02:51:12 [async_llm.py:228] Added request chatcmpl-77f0db8eb863499f95b755d8e034b75c.
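
As a stopgap I can force the field in at the HTTP layer instead of through the execution settings. This is only a sketch, assuming vLLM accepts guided_regex as a top-level field in the chat completions body (which is what the Python client sends via extra_body); the GuidedRexHandler below is my own helper, not Semantic Kernel API, and expectedResultRegex comes from the snippet above:

using System.Net.Http;
using System.Text;
using System.Text.Json.Nodes;
using System.Threading;
using System.Threading.Tasks;

// Wire the handler (defined below) into the HttpClient passed to OpenAIChatCompletionService.
var guidedHandler = new GuidedRegexHandler(expectedResultRegex, new HttpClientHandler());
var qwenLocalHttpClient = new HttpClient(guidedHandler)
{
    BaseAddress = new Uri("http://localhost:5011/v1")
};

// Hypothetical workaround (my own helper, not part of Semantic Kernel):
// rewrite the outgoing chat completions body and add the vLLM-specific
// "guided_regex" field that is dropped from the execution settings.
public sealed class GuidedRegexHandler : DelegatingHandler
{
    private readonly string _guidedRegex;

    public GuidedRegexHandler(string guidedRegex, HttpMessageHandler inner)
        : base(inner) => _guidedRegex = guidedRegex;

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (request.Content is not null)
        {
            // Read the JSON body the connector produced, add the extra field,
            // and replace the content before the request goes to vLLM.
            string body = await request.Content.ReadAsStringAsync(cancellationToken);
            var json = JsonNode.Parse(body)!.AsObject();
            json["guided_regex"] = _guidedRegex;
            request.Content = new StringContent(json.ToJsonString(), Encoding.UTF8, "application/json");
        }

        return await base.SendAsync(request, cancellationToken);
    }
}

Wrapping the HttpClient this way should make the request carry guided_regex and produce a guided_decoding entry like the Python log above, but the underlying issue of custom execution-settings properties being dropped remains.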

Platform

  • Language: C#

dprokhorov17 · May 08 '25 09:05

This also doesn't work with the kernel builder:

var kernel = Kernel
    .CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "Qwen/Qwen2.5-VL-32B-Instruct",
        apiKey: "EMPTY",
        httpClient: qwenLocalHttpClient)
    .Build();

var semanticFunction = kernel.CreateFunctionFromPrompt(prompt);

// Same custom settings as above, passed via KernelArguments this time.
var executionSettings = new VllmCustomExecutionSettings
{
    Temperature = 0,
    GuidedRegex = expectedResultRegex,
};

var kernelArguments = new KernelArguments(executionSettings)
{
    ["input"] = prompt
};

FunctionResult result = await kernel.InvokeAsync(semanticFunction, kernelArguments);

Console.WriteLine(result);
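
For what it's worth, the HTTP-layer workaround sketched in the issue description (the hypothetical GuidedRegexHandler) should plug into the builder path the same way, by handing the wrapped HttpClient to AddOpenAIChatCompletion:

// Assumes the hypothetical GuidedRegexHandler and expectedResultRegex from above.
var guidedHandler = new GuidedRegexHandler(expectedResultRegex, new HttpClientHandler());
var qwenLocalHttpClient = new HttpClient(guidedHandler)
{
    BaseAddress = new Uri("http://localhost:5011/v1")
};

var kernel = Kernel
    .CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "Qwen/Qwen2.5-VL-32B-Instruct",
        apiKey: "EMPTY",
        httpClient: qwenLocalHttpClient)
    .Build();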

dprokhorov17 · May 08 '25 11:05