
llm node has been in Running state for a long time and cannot be terminated

Open · deific opened this issue 1 year ago · 1 comment

Self Checks

  • [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please submit issues in English, otherwise they will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.8.3

Cloud or Self Hosted

Cloud, Self Hosted (Docker)

Steps to reproduce

The LLM node stays in the Running state for a long time and cannot be terminated. Sometimes the model node does not respond and remains in the Running state for a long time, with no exceptions or errors in the backend logs (see screenshot). Sometimes the LLM node has already returned a response, but the node status is still Running (see screenshot).

✔️ Expected Behavior

The LLM node should either finish normally or report an error.

❌ Actual Behavior

The LLM node remains in the Running state for a long time and cannot be terminated.

deific avatar Sep 25 '24 06:09 deific

The LLM node in Dify version 0.8.3 may remain in the "Running" state for a long time and not terminate for several reasons:

  1. Context Fetching: The _fetch_context method involves a generator that yields RunRetrieverResourceEvent events. If the context fetching process is slow or if there is an issue with the context data, it could cause the node to remain in the running state.

  2. Model Invocation: The _invoke_llm method involves invoking the large language model and handling the result through a generator. If the model invocation takes a long time or if there is an issue with the streaming of results, it could cause the node to remain in the running state.

  3. Event Handling: The _handle_invoke_result method processes the results from the model invocation. If there is a delay or issue in processing the streamed chunks or final results, it could cause the node to remain in the running state.

Here is the relevant code that shows these processes:

def _fetch_context(self, node_data: LLMNodeData, variable_pool: VariablePool) -> Generator[RunEvent, None, None]:
    if not node_data.context.enabled:
        return

    if not node_data.context.variable_selector:
        return

    context_value = variable_pool.get_any(node_data.context.variable_selector)
    if context_value:
        if isinstance(context_value, str):
            yield RunRetrieverResourceEvent(retriever_resources=[], context=context_value)
        elif isinstance(context_value, list):
            context_str = ""
            original_retriever_resource = []
            for item in context_value:
                if isinstance(item, str):
                    context_str += item + "\n"
                else:
                    if "content" not in item:
                        raise ValueError(f"Invalid context structure: {item}")

                    context_str += item["content"] + "\n"

                    retriever_resource = self._convert_to_original_retriever_resource(item)
                    if retriever_resource:
                        original_retriever_resource.append(retriever_resource)

            yield RunRetrieverResourceEvent(
                retriever_resources=original_retriever_resource, context=context_str.strip()
            )

def _invoke_llm(
    self,
    node_data_model: ModelConfig,
    model_instance: ModelInstance,
    prompt_messages: list[PromptMessage],
    stop: Optional[list[str]] = None,
) -> Generator[RunEvent | ModelInvokeCompleted, None, None]:
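    # Release the DB session before the potentially long-running streaming call,
    # so a pooled connection is not held open while waiting on the model provider.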
    db.session.close()

    invoke_result = model_instance.invoke_llm(
        prompt_messages=prompt_messages,
        model_parameters=node_data_model.completion_params,
        stop=stop,
        stream=True,
        user=self.user_id,
    )

    generator = self._handle_invoke_result(invoke_result=invoke_result)

    usage = LLMUsage.empty_usage()
    for event in generator:
        yield event
        if isinstance(event, ModelInvokeCompleted):
            usage = event.usage

    self.deduct_llm_quota(tenant_id=self.tenant_id, model_instance=model_instance, usage=usage)

def _handle_invoke_result(
    self, invoke_result: LLMResult | Generator
) -> Generator[RunEvent | ModelInvokeCompleted, None, None]:
    if isinstance(invoke_result, LLMResult):
        return

    model = None
    prompt_messages: list[PromptMessage] = []
    full_text = ""
    usage = None
    finish_reason = None
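    # Iterating the streamed result blocks until the provider closes the stream;
    # if the stream stalls or never ends, the node stays in the Running state.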
    for result in invoke_result:
        text = result.delta.message.content
        full_text += text

        yield RunStreamChunkEvent(chunk_content=text, from_variable_selector=[self.node_id, "text"])

        if not model:
            model = result.model

        if not prompt_messages:
            prompt_messages = result.prompt_messages

        if not usage and result.delta.usage:
            usage = result.delta.usage

        if not finish_reason and result.delta.finish_reason:
            finish_reason = result.delta.finish_reason

    if not usage:
        usage = LLMUsage.empty_usage()

    yield ModelInvokeCompleted(text=full_text, usage=usage, finish_reason=finish_reason)

These methods involve generators and event handling, so a delay or stall in any of them can leave the node in the "Running" state [1].
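If the hang is in the streaming loop, one way to surface it is a per-chunk watchdog around the provider stream. The sketch below is illustrative only and is not part of Dify; the helper name iter_with_timeout and the timeout value are assumptions. It wraps any iterable and raises TimeoutError when no chunk arrives within the given window, so the consuming loop can fail with an explicit error instead of waiting forever.

# Illustrative sketch only (not Dify code): a per-chunk watchdog around a
# streaming generator. A background thread pumps items into a queue; the
# consumer raises TimeoutError if no item arrives within `timeout` seconds.
import queue
import threading
from collections.abc import Generator, Iterable
from typing import TypeVar

T = TypeVar("T")
_END_OF_STREAM = object()


def iter_with_timeout(source: Iterable[T], timeout: float) -> Generator[T, None, None]:
    """Yield items from `source`, raising TimeoutError if the stream stalls."""
    buffer: queue.Queue = queue.Queue()

    def pump() -> None:
        try:
            for item in source:
                buffer.put(item)
            buffer.put(_END_OF_STREAM)
        except Exception as exc:  # surface upstream errors instead of swallowing them
            buffer.put(exc)

    threading.Thread(target=pump, daemon=True).start()

    while True:
        try:
            item = buffer.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError(f"no chunk received for {timeout} seconds")
        if item is _END_OF_STREAM:
            return
        if isinstance(item, Exception):
            raise item
        yield item

Under that assumption, the loop in _handle_invoke_result could iterate iter_with_timeout(invoke_result, timeout=120.0) instead of invoke_result, turning a stalled provider stream into a reported node error rather than an indefinite Running state.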

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Sep 25 '24 07:09 dosubot[bot]


The issue still exists. How should it be handled?

zeming-coder avatar Dec 25 '24 06:12 zeming-coder