[rush] Design proposal: balance the minimum number of tasks executed with the maximum level of parallel execution

Open L-Qun opened this issue 11 months ago • 3 comments

Summary

I believe Rush's task scheduling capabilities are excellent, but there is still a flaw that I find hard to accept: there is no good way to balance executing the minimum number of tasks with achieving the maximum level of parallel execution.

Details

Let's say our project's dependency relationships are as follows:

Image

Usually, on CI we need to run the build, lint, and test tasks. To run these three tasks with maximum parallelism, we need to define them as a single phased command in command-line.json:

{
  "commands": [
    {
      "name": "test",
      "commandKind": "phased",
      "phases": ["_phase:build", "_phase:lint", "_phase:test"],
      // ...
    }
  ]
}

On CI, we will run the command:

rush test --from git:origin/master

Thus, for the above project, when project B changes, we need to run the above command on projects A, B, E, C, D, and G.

Image

In the end, we need to execute 6 * 3 = 18 tasks. However, A and E have not changed and are only needed as build inputs for the other projects, so we don't need to run lint and test for them, right? To avoid that, we can split the command in command-line.json into two:

{
  "commands": [
    {
      "name": "build",
      "commandKind": "phased",
      "phases": ["_phase:build"],
      // ...
    },
    {
      "name": "test",
      "commandKind": "phased",
      "phases": ["_phase:lint", "_phase:test"],
      // ...
    }
  ]
}

On CI, we will execute the following commands separately:

1. rush build --from git:origin/master
2. rush test --impacted-by git:origin/master

Now we only need to execute 6 + 2 * 4 = 14 tasks, because lint and test are skipped for A and E. However, splitting the process into two separate runs means that lint and test cannot start until every build has finished, so we can no longer maximize the parallel execution of all tasks.
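To make the arithmetic concrete, here is a minimal sketch in TypeScript. The dependency edges are an assumption (the diagram is not available as text), picked so that a change to B produces the selections described above. `closure`, `impactedBy`, and `from` are hypothetical helpers that approximate the documented meaning of the `--impacted-by` and `--from` selectors; they are not Rush APIs.

```typescript
// Hypothetical dependency graph (an assumption, not the original diagram).
// Each key depends on the projects listed in its value.
const dependsOn: Record<string, string[]> = {
  A: [],
  B: ["A"],
  C: ["B"],
  D: ["B", "E"],
  E: [],
  F: [],
  G: ["C", "D"],
};

// Follow edges transitively, starting from the seed projects.
function closure(seeds: string[], next: (p: string) => string[]): Set<string> {
  const seen = new Set<string>(seeds);
  const stack = seeds.slice();
  while (stack.length > 0) {
    const current = stack.pop()!;
    for (const n of next(current)) {
      if (!seen.has(n)) {
        seen.add(n);
        stack.push(n);
      }
    }
  }
  return seen;
}

// Direct dependents of a project under the assumed graph.
const dependentsOf = (p: string): string[] =>
  Object.keys(dependsOn).filter((q) => dependsOn[q].includes(p));

// --impacted-by: the changed projects plus everything downstream of them.
function impactedBy(changed: string[]): Set<string> {
  return closure(changed, dependentsOf);
}

// --from: the impacted projects plus all of their dependencies.
function from(changed: string[]): Set<string> {
  return closure(Array.from(impactedBy(changed)), (p) => dependsOn[p]);
}

const changed = ["B"];
// One command: build, lint, and test for every project selected by --from.
const singleCommand = from(changed).size * 3;
// Two commands: build for --from, lint and test for --impacted-by only.
const splitCommands = from(changed).size + impactedBy(changed).size * 2;

console.log(Array.from(from(changed)).sort().join(","), singleCommand, splitCommands);
// prints "A,B,C,D,E,G 18 14"
```

Under this assumed graph, F is never selected, and A and E appear only in the `--from` selection, which is where the 4 saved tasks come from.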

Therefore, we need a way to both execute tasks in parallel and minimize the number of tasks executed.

So, going back to the single-command setup, let's assume the phase dependencies are defined as follows:

"phases": [
  {
    "name": "_phase:lint",
    "dependencies": {
      "self": ["_phase:build"]
    },
    // ...
  },
  {
    "name": "_phase:test",
    "dependencies": {
      "self": ["_phase:build"]
    },
    // ...
  }
]

This means that, within each project, build must run before lint and test, so the CI command can now be simplified to:

rush test --impacted-by git:origin/master --include-phase-deps

Behind the scenes, Rush would apply --impacted-by in a safe manner: it would still execute the build of A and E, as shown in the diagram above, while running lint and test only for the impacted projects.
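As a rough illustration of the proposed behavior, the sketch below starts from the lint and test operations of the impacted projects and pulls in every operation they depend on. It is not how Rush implements anything. The project graph is a hypothetical stand-in for the diagram, the `upstream` dependency of `_phase:build` is an assumption (that part of the config is elided above), and `expand` is an illustrative helper.

```typescript
type Phase = "_phase:build" | "_phase:lint" | "_phase:test";

// Phase dependencies. The "self" entries mirror the config above; the
// "upstream" entry for _phase:build is an assumption, because the build
// phase's config is not shown in the issue.
const phaseDeps: Record<Phase, { self: Phase[]; upstream: Phase[] }> = {
  "_phase:build": { self: [], upstream: ["_phase:build"] },
  "_phase:lint": { self: ["_phase:build"], upstream: [] },
  "_phase:test": { self: ["_phase:build"], upstream: [] },
};

// Hypothetical project graph (an assumption, not the original diagram).
const dependsOn: Record<string, string[]> = {
  A: [], B: ["A"], C: ["B"], D: ["B", "E"], E: [], F: [], G: ["C", "D"],
};

// Expand the requested operations with every operation they depend on.
function expand(projects: string[], phases: Phase[]): Set<string> {
  const operations = new Set<string>();
  const visit = (project: string, phase: Phase): void => {
    const key = `${project};${phase}`;
    if (operations.has(key)) return; // already scheduled
    operations.add(key);
    // Same-project dependencies, e.g. lint -> build.
    for (const dep of phaseDeps[phase].self) visit(project, dep);
    // Dependencies on phases of upstream projects, e.g. build of B -> build of A.
    for (const dep of phaseDeps[phase].upstream) {
      for (const upstreamProject of dependsOn[project]) visit(upstreamProject, dep);
    }
  };
  for (const project of projects) {
    for (const phase of phases) visit(project, phase);
  }
  return operations;
}

// Projects impacted by a change to B under the assumed graph.
const impacted = ["B", "C", "D", "G"];
const operations = expand(impacted, ["_phase:lint", "_phase:test"]);

console.log(operations.size); // prints 14
```

The result is the same 14 operations as the two-command approach (6 builds plus lint and test for 4 projects), but they form a single graph, so a scheduler could start lint and test for B as soon as B's build finishes instead of waiting for every build.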

L-Qun, Feb 20 '25 08:02

@dmichon-msft

L-Qun, Feb 20 '25 09:02

Generally speaking, the model to date has been that we expect the unchanged tasks to replay from the build cache and therefore have a minimal impact on overall runtime.

dmichon-msft, Feb 21 '25 07:02

> Generally speaking, the model to date has been that we expect the unchanged tasks to replay from the build cache and therefore have a minimal impact on overall runtime.

Yes, caching is a major optimization, but we also need a smarter scheduling strategy alongside it.

L-Qun, Feb 21 '25 08:02