openai_api_compatible API request failed with status code 422
Self Checks
- [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [x] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.
Dify version
1.1.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
I have been using Dify since 0.15.3 with third-party services compatible with the OpenAI API standard, and everything worked fine at first. After upgrading through v1.0.1, v1.1.0, and v1.1.1, the interface started returning 422 errors. I have tested that openai_api_compatible still works on 0.15.3 and 0.15.4. I enabled DEBUG logging, and my logs are below. Through the logs I found that under 0.15.4 the third-party service was called directly, while under 1.1.1 the request is sent through plugin_daemon for an extra layer of forwarding; after forwarding, the content received was empty, and an error was reported as soon as the request was initiated again.
v0.15.4
v1.1.1
[on_llm_before_invoke]
Model: DeepSeek-R1
Parameters:
Stream: True
User: 289e1211-f8d6-496a-8de2-f98c80da38a2
Prompt messages:
role: user
content: 你好
role: assistant
content:
role: user
content: 你叫什么名字
[on_llm_new_chunk]
2025-03-25 06:36:06.770 DEBUG [Thread-456 (_generate_worker)] [connectionpool.py:241] - Starting new HTTP connection (1): plugin_daemon:5002
2025-03-25 06:36:06.880 DEBUG [Thread-456 (_generate_worker)] [connectionpool.py:544] - http://plugin_daemon:5002 "POST /plugin/8b94a720-0a7f-4ad7-983b-4507c7775920/dispatch/llm/invoke HTTP/1.1" 200 None
2025-03-25 06:36:06.882 ERROR [Thread-456 (_generate_worker)] [app_generator.py:243] - Error when generating
Traceback (most recent call last):
File "/app/api/core/model_runtime/model_providers/__base/large_language_model.py", line 237, in _invoke_result_generator
for chunk in result:
^^^^^^
File "/app/api/core/plugin/manager/model.py", line 189, in invoke_llm
yield from response
File "/app/api/core/plugin/manager/base.py", line 189, in _request_with_plugin_daemon_response_stream
self._handle_plugin_daemon_error(error.error_type, error.message)
File "/app/api/core/plugin/manager/base.py", line 221, in _handle_plugin_daemon_error
raise PluginInvokeError(description=message)
core.plugin.manager.exc.PluginInvokeError: PluginInvokeError: {"args":{"description":"[models] Error: API request failed with status code 422: {\"errno\":422,\"error\":\"model inference failure\",\"data\":null}"},"error_type":"InvokeError","message":"[models] Error: API request failed with status code 422: {\"errno\":422,\"error\":\"model inference failure\",\"data\":null}"}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/api/core/app/apps/chat/app_generator.py", line 226, in _generate_worker
runner.run(
File "/app/api/core/app/apps/chat/app_runner.py", line 231, in run
self._handle_invoke_result(
File "/app/api/core/app/apps/base_app_runner.py", line 278, in _handle_invoke_result
self._handle_invoke_result_stream(invoke_result=invoke_result, queue_manager=queue_manager, agent=agent)
File "/app/api/core/app/apps/base_app_runner.py", line 313, in _handle_invoke_result_stream
for result in invoke_result:
^^^^^^^^^^^^^
File "/app/api/core/model_runtime/model_providers/__base/large_language_model.py", line 261, in _invoke_result_generator
raise self._transform_invoke_error(e)
core.model_runtime.errors.invoke.InvokeError: [openai_api_compatible] Error: PluginInvokeError: {"args":{"description":"[models] Error: API request failed with status code 422: {\"errno\":422,\"error\":\"model inference failure\",\"data\":null}"},"error_type":"InvokeError","message":"[models] Error: API request failed with status code 422: {\"errno\":422,\"error\":\"model inference failure\",\"data\":null}"}
2025-03-25 06:36:06.886 DEBUG [Thread-455 (process_request_thread)] [based_generate_task_pipeline.py:45] - error: [openai_api_compatible] Error: PluginInvokeError: {"args":{"description":"[models] Error: API request failed with status code 422: {\"errno\":422,\"error\":\"model inference failure\",\"data\":null}"},"error_type":"InvokeError","message":"[models] Error: API request failed with status code 422: {\"errno\":422,\"error\":\"model inference failure\",\"data\":null}"}
2025-03-25 06:36:06.895 INFO [Thread-455 (process_request_thread)] [_internal.py:97] - 172.31.0.10 - - [25/Mar/2025 06:36:06] "POST /console/api/apps/0b1df6de-b5c5-4a6a-8c1b-29fb438111bb/chat-messages HTTP/1.1" 200 -
✔️ Expected Behavior
What should I do next to solve this problem? I have tested the native interface directly from the Linux command line and it works normally.
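For anyone else debugging this, one way to make the command-line check reproduce exactly what Dify sends is to replay the same messages from the log against the upstream service directly. A minimal sketch (the base URL and API key are placeholders, and it assumes a standard OpenAI-compatible `/v1/chat/completions` endpoint):

```python
# Minimal sketch: replay the chat request from the log directly against the
# upstream OpenAI-compatible service, bypassing Dify/plugin_daemon entirely.
# BASE_URL and API_KEY are placeholders (assumptions), the model name and
# messages are taken from the DEBUG log above.
import requests

BASE_URL = "http://your-openai-compatible-service/v1"  # placeholder
API_KEY = "sk-..."  # placeholder

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "DeepSeek-R1",
        "stream": True,
        "messages": [
            {"role": "user", "content": "你好"},
            {"role": "assistant", "content": ""},
            {"role": "user", "content": "你叫什么名字"},
        ],
    },
    stream=True,
    timeout=60,
)
print(resp.status_code)  # a 422 here would point at the upstream, not plugin_daemon
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))
```

Note the empty assistant turn from the log above; it may be worth checking whether the backend rejects messages with empty content, since that is one difference the forwarding path could surface.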
❌ Actual Behavior
No response
The issue you're experiencing with the 422 status code in Dify version 1.1.1 seems to be related to the plugin_daemon component, which is now involved in forwarding requests. In version 0.15.4, requests were directly sent to third-party services, but in version 1.1.1, they are routed through the plugin_daemon, which might be causing the problem.
The 422 status code indicates a "model inference failure," and the logs show that the response from the plugin_daemon is empty, leading to a PluginInvokeError. This could be due to several reasons, such as:
- **Configuration Issues**: Ensure that the `plugin_daemon` is correctly configured and that all necessary environment variables and settings are properly set.
- **Compatibility**: Verify that the third-party service you are using is fully compatible with the changes introduced in the newer Dify versions.
- **Error Handling**: The error handling mechanism in the `plugin_daemon` might be different in the newer version, which could affect how errors are propagated and handled.
To troubleshoot further, you can:
- Check the configuration and logs of the `plugin_daemon` for any additional error messages or warnings that might provide more context.
- Compare the request and response payloads between the working version (0.15.4) and the failing version (1.1.1) to identify any discrepancies.
- Review any changes in the Dify codebase related to the `plugin_daemon` and request handling between these versions, as these might provide insights into the root cause of the issue.
If the problem persists, consider isolating the plugin_daemon component and testing it independently to ensure it functions as expected.
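If it helps with the payload comparison suggested above, a throwaway logging reverse proxy can capture exactly what plugin_daemon sends upstream. A stdlib-only sketch (the `UPSTREAM` URL and port 8888 are placeholders; you would point the model's base URL in Dify at the proxy):

```python
# Throwaway logging reverse proxy (stdlib only) to capture what Dify /
# plugin_daemon actually sends. UPSTREAM and the listen port are placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import Request, urlopen

UPSTREAM = "http://your-openai-compatible-service:8000"  # placeholder

class LoggingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(f"--> POST {self.path}\n{body.decode('utf-8', errors='replace')}")
        req = Request(UPSTREAM + self.path, data=body, method="POST")
        for name in ("Authorization", "Content-Type"):
            if self.headers.get(name):
                req.add_header(name, self.headers[name])
        try:
            resp = urlopen(req, timeout=60)
        except HTTPError as e:
            resp = e  # HTTPError is file-like, so the 422 body can still be read
        payload = resp.read()
        print(f"<-- {resp.status}\n{payload.decode('utf-8', errors='replace')}")
        self.send_response(resp.status)
        self.send_header("Content-Type", resp.headers.get("Content-Type") or "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

HTTPServer(("0.0.0.0", 8888), LoggingProxy).serve_forever()
```

Note that this buffers streaming responses before returning them, so it is only suitable for debugging.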
Yea, we got the same issue with an OpenAI-API-compatible reranker which was working fine on 0.5.13. The embedder running on the same server could be added without issue.

So after some investigation I found out Dify changed the request to the reranker: `texts` seems to have been changed to `documents`, and this throws off https://github.com/huggingface/text-embeddings-inference, which we are using. For now the workaround is to create a proxy which fixes the JSON so it works again:

```json
{"model": "BAAI/bge-reranker-v2-m3", "query": "What is the capital of the United States?", "texts": ["Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.", "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan."], "top_n": 3, "return_documents": true}
```

So after setting up the proxy and plainly redirecting the response to Dify, I get `list indices must be integers or slices, not str` (presumably because Dify now expects a Cohere-style `{"results": [...]}` object, while text-embeddings-inference returns a flat array). I guess we have to roll back to 0.5.13.
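(For reference, judging from the plugin code quoted further down in the thread, the request Dify 1.x now builds carries the same fields but under a `documents` key, roughly:)

```json
{"model": "BAAI/bge-reranker-v2-m3", "query": "What is the capital of the United States?", "documents": ["Carson City is the capital city of the American state of Nevada. ...", "..."], "top_n": 3, "return_documents": true}
```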
You can fix it in your Docker volumes: find the file `\volumes\plugin_daemon\cwd\langgenius\openai_api_compatible-..................\site-packages\dify_plugin\interfaces\model\openai_compatible\rerank.py`
- Replace `documents` with `texts`; then you don't need any proxy 🙂
- I use MindIE and my response looks like `[{"index":0,"score":0.123}]`, so I modified some code and then it worked
My code (the key changes from the original are the `texts` request key and treating the response as a flat array):

```python
data = {
    "model": model_name,
    "query": query,
    "texts": docs,
    "top_n": top_n,
    "return_documents": True,
}

try:
    response = post(str(URL(url) / "rerank"), headers=headers, data=dumps(data), timeout=60)
    response.raise_for_status()
    results = response.json()

    rerank_documents = []
    scores = [result["score"] for result in results]

    # Min-Max Normalization: normalize scores to the 0 ~ 1.0 range
    min_score = min(scores)
    max_score = max(scores)
    score_range = max_score - min_score if max_score != min_score else 1.0  # avoid division by zero

    for result in results:
        index = result["index"]
        # Retrieve document text (fallback if llama.cpp rerank doesn't return it)
        text = docs[index]
        document = result.get("document", {})
        if document:
            if isinstance(document, dict):
                text = document.get("text", docs[index])
            elif isinstance(document, str):
                text = document
        # Normalize the score
        normalized_score = (result["score"] - min_score) / score_range
        # Create a RerankDocument object with the normalized score
        rerank_document = RerankDocument(
            index=index,
            text=text,
            score=normalized_score,
        )
        # Apply threshold (if defined)
        if score_threshold is None or normalized_score >= score_threshold:
            rerank_documents.append(rerank_document)
```
Here is the original code for comparison (it sends `documents` and expects a Cohere-style response with a `results` list and `relevance_score` fields):

```python
data = {
    "model": model_name,
    "query": query,
    "documents": docs,
    "top_n": top_n,
    "return_documents": True,
}

try:
    response = post(str(URL(url) / "rerank"), headers=headers, data=dumps(data), timeout=60)
    response.raise_for_status()
    results = response.json()

    rerank_documents = []
    scores = [result["relevance_score"] for result in results["results"]]

    # Min-Max Normalization: normalize scores to the 0 ~ 1.0 range
    min_score = min(scores)
    max_score = max(scores)
    score_range = max_score - min_score if max_score != min_score else 1.0  # avoid division by zero

    for result in results["results"]:
        index = result["index"]
        # Retrieve document text (fallback if llama.cpp rerank doesn't return it)
        text = docs[index]
        document = result.get("document", {})
        if document:
            if isinstance(document, dict):
                text = document.get("text", docs[index])
            elif isinstance(document, str):
                text = document
        # Normalize the score
        normalized_score = (result["relevance_score"] - min_score) / score_range
```
Thanks for that reply. I can't modify the Dify code in my case, but it helped me adjust my proxy enough to get it working. If someone is interested:
```ts
import { serve } from "https://deno.land/[email protected]/http/server.ts";

const targetUrl = Deno.env.get("TARGET_URL");
if (!targetUrl) {
  console.error("TARGET_URL environment variable is not set");
  Deno.exit(1);
}

async function handler(req: Request): Promise<Response> {
  if (req.method === "POST") {
    const body = await req.json();

    // Dify 1.x sends the documents under "documents";
    // text-embeddings-inference expects "texts", so rename the key.
    const usedDocumentsKey = "documents" in body;
    if (usedDocumentsKey) {
      body.texts = body.documents;
      delete body.documents;
    }

    const response = await fetch(targetUrl, {
      method: "POST",
      headers: {
        ...Object.fromEntries(req.headers.entries()),
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });

    if (usedDocumentsKey) {
      // TEI returns a flat array of {index, score}; Dify expects a
      // Cohere-style {results: [{index, relevance_score, document}]} object.
      const data = await response.json();
      const transformed = {
        results: data.map((e: { index: number; score: number }) => ({
          index: e.index,
          relevance_score: e.score,
          document: {
            text: body.texts?.[e.index] ?? "",
          },
        })),
      };
      return new Response(JSON.stringify(transformed), {
        status: 200,
        headers: {
          "Content-Type": "application/json",
        },
      });
    }

    return new Response(response.body, {
      status: response.status,
      headers: {
        "Content-Type": response.headers.get("Content-Type") || "application/json",
      },
    });
  }
  return new Response("Method Not Allowed", { status: 405 });
}

serve(handler);
```
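In case it is useful to anyone else: with the file saved as e.g. `proxy.ts` (the name is arbitrary), something like `TARGET_URL=http://tei-host:8080/rerank deno run --allow-net --allow-env proxy.ts` should start it (the host is a placeholder; `serve` from std/http listens on port 8000 by default), and then you point the reranker's API endpoint in Dify at the proxy instead of at text-embeddings-inference directly.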