
[SPARK-48396] Support configuring the maximum number of cores that can be used for SQL


What changes were proposed in this pull request?

This PR adds a new session-level config, spark.sql.execution.coresLimitNumber, to configure the maximum number of cores that can be used by a SQL query.
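
For illustration, here is a minimal sketch of how a user might set the proposed config at session level. The config name comes from this PR and is not part of released Spark; the capping behavior is as described above, so treat this as an illustration rather than a working API:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming the behavior proposed in this PR:
// spark.sql.execution.coresLimitNumber is NOT part of released Spark.
val spark = SparkSession.builder()
  .appName("limit-sql-cores-example")
  .getOrCreate()

// Session-level setting: cap any single SQL query in this session at 500 cores.
spark.conf.set("spark.sql.execution.coresLimitNumber", "500")

// Equivalent via SQL, since session configs can also be set with SET:
spark.sql("SET spark.sql.execution.coresLimitNumber=500")
```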

Why are the changes needed?

On a long-running shared Spark SQL cluster, a single large SQL query can occupy all of the cluster's cores and block the execution of other queries. A configuration that limits the maximum number of cores a SQL query can use is therefore desirable.

Does this PR introduce any user-facing change?

Yes, this PR adds the new config spark.sql.execution.coresLimitNumber.

How was this patch tested?

Unit tests will be added later.

Was this patch authored or co-authored using generative AI tooling?

No

yabola commented May 23 '24 06:05

@mridulm @Ngone51 Could you help review this? Thank you~ This is useful in a shared SQL cluster and makes it easier to control SQL resource usage. The screenshot below shows that the number of cores in use stays consistent after setting spark.sql.execution.coresLimitNumber.

[screenshot: cores in use remain at the configured limit]

yabola commented May 24 '24 09:05

I would like to describe the usage scenario: multiple users share a 2048-core, long-running SQL cluster. Some users may submit non-standard queries that occupy a large number of cores, causing other users' smaller SQL queries to get stuck. Globally limiting each SQL query to 500 cores helps reduce starvation, as sketched below.
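
To make the scenario concrete, a hedged sketch of how the 500-core cap might be applied on the shared cluster and then tightened per session. Again, the config is only proposed in this PR, and how defaults propagate to new sessions follows standard Spark config behavior:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch of the shared 2048-core cluster scenario; the
// config below is only proposed in this PR, not released behavior.
val shared = SparkSession.builder()
  .appName("shared-sql-cluster")
  // Global default: no single SQL query may use more than 500 cores.
  .config("spark.sql.execution.coresLimitNumber", "500")
  .getOrCreate()

// Each user works in an isolated session that picks up the default
// from the shared SparkContext configuration...
val userSession = shared.newSession()

// ...and, being a session-level config, it could be tightened per user.
userSession.conf.set("spark.sql.execution.coresLimitNumber", "200")
```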

yabola commented Jul 17 '24 03:07

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] commented Oct 26 '24 00:10