dolphinscheduler icon indicating copy to clipboard operation
dolphinscheduler copied to clipboard

[Improvement-17157][Master] Support setting max.concurrent.workflow.instances

Open tusaryan opened this issue 9 months ago • 8 comments

Purpose of the pull request

This Pull Request implements a new configuration option max.concurrent.workflow.instances in the master server, allowing administrators to set a limit on the number of concurrently running workflow instances on a single master. This addresses the issue of potential overload when multiple workflows failover to a single master, as described in #17157.

close #17157

Brief change log

  • Feature: Implemented a new configuration property, master.server-load-protection.max-concurrent-workflow-instances, to limit concurrent workflows. The master is marked as busy if the number of running workflows is greater than or equal to this value.

  • Enhancement: Refactored MasterServerLoadProtection to incorporate system resource thresholds based on community feedback. The following configuration properties are now utilized to provide a more robust server protection mechanism:

    • maxSystemCpuUsagePercentageThresholds
    • maxJvmCpuUsagePercentageThresholds
    • maxSystemMemoryUsagePercentageThresholds
    • maxDiskUsagePercentageThresholds
  • Bug Fix: Resolved a NullPointerException that occurred during MasterServer startup by correctly using @Autowired to inject the MasterServerLoadProtection bean into MasterConfig.

  • Refactor:

    • Created MasterServerLoadProtectionConfig to manage the bean's lifecycle and properties via Spring.
    • Replaced the use of WorkflowCacheRepository with the IWorkflowRepository interface for better abstraction and testability.
  • Documentation: Updated the application.yml file and the configuration.md contributor documentation to include all new load protection options with clear descriptions. Example:

    master:
      server-load-protection:
        # Master max concurrent workflow instances. When the count reaches or exceeds this value, the master is marked busy.
        max-concurrent-workflow-instances: 100  #Set your desired limit 
    

Replace 100 with any integer value you want as the limit.

Verify this pull request

This change is covered by updated unit tests in MasterServerLoadProtectionTest.java to include comprehensive test cases verifying the new functionality for both the workflow instance limit and the system resource thresholds:

  • Workflow Limit Test: A dedicated test (isOverloadWithMaxConcurrentWorkflowInstances) verifies the new limit:
    • When workflow count (e.g., 5) is less than the limit (e.g., 10), the master is not overloaded.
    • When workflow count (e.g., 5) is equal to or greater than the limit (e.g., 5 or 3), the master is overloaded.
  • System Metrics Test: Tests have been updated to confirm that the master is correctly marked as overloaded when any of the system resource thresholds (CPU, memory, disk) are exceeded.

fix issue: [Improvement][Master] Support set max.concurrent.workflow.instances in master #17157

Pull Request Notice

Pull Request Notice

If your pull request contains incompatible change, you should also add it to docs/docs/en/guide/upgrade/incompatible.md

tusaryan avatar May 01 '25 14:05 tusaryan

Can anyone review the changes again

tusaryan avatar May 07 '25 21:05 tusaryan

Please solve the code check style by mvn spotless:apply

ruanwenjun avatar May 10 '25 04:05 ruanwenjun

I wanted to let you know that I'm currently traveling. Because of this, I might be a bit delayed in addressing the feedback on this pull request. I will definitely get to it as soon as things settle down for me.

tusaryan avatar May 10 '25 08:05 tusaryan

Please solve the code check style by mvn spotless:apply

Code style issues have been fixed by running mvn spotless:apply. Please let me know if there's anything else I can do!

tusaryan avatar May 16 '25 06:05 tusaryan

I've integrated system metrics into MasterServerLoadProtection and updated its configuration documentation. Could someone please review the changes?

tusaryan avatar Jun 17 '25 17:06 tusaryan

Please check the failed CI @tusaryan

SbloodyS avatar Jun 18 '25 07:06 SbloodyS

Please check the failed CI @tusaryan

Okay, I'll check it out.

tusaryan avatar Jun 18 '25 12:06 tusaryan

I've reviewed the logs, and the "Backend-Build" error appears unrelated to my changes. Could someone please help me with this?

tusaryan avatar Jun 27 '25 18:06 tusaryan

Please retry analysis of this Pull-Request directly on SonarQube Cloud

sonarqubecloud[bot] avatar Jul 02 '25 02:07 sonarqubecloud[bot]