[SPARK-37349][SQL] add SQL Rest API parsing logic
What changes were proposed in this pull request?
Following up on https://issues.apache.org/jira/browse/SPARK-31440, the SQL Rest API currently returns metric values such as
"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"
which are hard to consume in that form. This PR introduces new processing logic for these values, along with the following class in the SQL Rest API to organize the metric values:
case class Value private[spark] (stageId: Option[String] = None, taskId: Option[String] = None,
amount: Option[String] = None, min: Option[String] = None,
med: Option[String] = None, max: Option[String] = None)
After processing, the output looks like:
"value" : { "stageId" : "1.0", "taskId" : "5", "amount" : "177.0 B", "min" : "59.0 B", "med" : "59.0 B", "max" : "59.0 B" }
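As a rough illustration of the kind of processing described above, here is a minimal sketch that splits the raw metric string into the six fields. It is not the PR's actual implementation: the `MetricValueParser` object, its regular expression, and the fallback for plain values are assumptions, and the `Value` class is redeclared without the `private[spark]` constructor so the snippet stands alone.

```scala
// Standalone copy of the Value class (private[spark] dropped for this sketch).
case class Value(
    stageId: Option[String] = None,
    taskId: Option[String] = None,
    amount: Option[String] = None,
    min: Option[String] = None,
    med: Option[String] = None,
    max: Option[String] = None)

// Hypothetical helper, not part of the PR.
object MetricValueParser {
  // Matches a data line such as
  // "177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))".
  private val Detailed =
    """(.+?) \((.+?), (.+?), (.+?) \(stage (.+?): task (.+?)\)\)""".r

  def parse(raw: String): Value = {
    // Skip the "total (min, med, max (stageId: taskId))" header line if present.
    val data = raw.split("\n").last.trim
    data match {
      case Detailed(amount, min, med, max, stageId, taskId) =>
        Value(Some(stageId), Some(taskId), Some(amount), Some(min), Some(med), Some(max))
      case other =>
        // Assumption: anything else is a plain value with no breakdown.
        Value(amount = Some(other))
    }
  }
}
```

Calling `MetricValueParser.parse` on the raw string from the example yields a `Value` whose fields correspond to the JSON object shown above. A metric without a breakdown ends up with only `amount` set.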
Not currently in the PR, but possible to add if there is interest, is normalization of metric values for aggregation purposes, such as:
- Converting hour, minute, and second time units to milliseconds.
- Converting PB, TB, GB, MB, and KB units to bytes.
- Removing commas from comma-formatted Long values (e.g. 8389632).
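The normalization ideas listed above could be sketched roughly as follows. This is only an illustration of the proposal and not code from the PR: the `MetricNormalizer` object, the unit suffixes (`ms`, `s`, `m`, `h`), and the choice of 1024-based byte multipliers are all assumptions.

```scala
// Hypothetical helper, not part of the PR.
object MetricNormalizer {
  // Assumption: byte units are powers of 1024.
  private val ByteFactors: Map[String, Long] = Map(
    "B" -> 1L, "KB" -> (1L << 10), "MB" -> (1L << 20),
    "GB" -> (1L << 30), "TB" -> (1L << 40), "PB" -> (1L << 50))

  // Assumption: time values carry these suffixes; the target unit is milliseconds.
  private val TimeFactors: Map[String, Long] = Map(
    "ms" -> 1L, "s" -> 1000L, "m" -> 60000L, "h" -> 3600000L)

  // A number (optionally comma-grouped, optionally fractional) plus an optional unit.
  private val Quantity = """([0-9][0-9,]*(?:\.[0-9]+)?)\s*([A-Za-z]*)""".r

  /** Returns the value in bytes or milliseconds, or None if it cannot be parsed. */
  def normalize(raw: String): Option[Long] = raw.trim match {
    case Quantity(number, unit) =>
      // Strip grouping commas before parsing.
      val n = BigDecimal(number.replace(",", ""))
      val factor =
        if (unit.isEmpty) Some(1L)
        else ByteFactors.get(unit).orElse(TimeFactors.get(unit))
      factor.map(f => (n * BigDecimal(f)).toLong)
    case _ => None
  }
}
```

Under these assumptions, `MetricNormalizer.normalize("8,389,632")` returns `Some(8389632L)` and `MetricNormalizer.normalize("1.5 KB")` returns `Some(1536L)`. Unknown units return `None` rather than guessing a multiplier.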
Why are the changes needed?
To organize and process the new metric fields in a more user-friendly manner.
Does this PR introduce any user-facing change?
Yes. See the output below, gathered from the "Check Sql Rest Api Endpoints" unit test in SqlResourceWithActualMetricsSuite.scala with AQE set to true.
Before Changes:
BeforeSpark37349UT.txt
After changes:
AfterSpark37349UT.txt
Backward Compatibility:
API Changes were made in sql/core/src/main/scala/org/apache/spark/status/api/v1/sql/api.scala.
How was this patch tested?
Added a new unit test and tested manually locally.
Test build #145354 has finished for PR 34637 at commit 8e79198.
- This patch fails MiMa tests.
- This patch merges cleanly.
- This patch adds no public classes.
Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49825/
Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49825/
Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49894/
Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49894/
Test build #145421 has finished for PR 34637 at commit 0e068d0.
- This patch passes all tests.
- This patch merges cleanly.
- This patch adds no public classes.
Test build #145463 has finished for PR 34637 at commit eb5e83e.
- This patch fails Scala style tests.
- This patch merges cleanly.
- This patch adds no public classes.
Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49935/
Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49935/
Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49937/
Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49937/
Test build #145465 has finished for PR 34637 at commit 141a16f.
- This patch passes all tests.
- This patch merges cleanly.
- This patch adds no public classes.
cc @gengliangwang
cc @gengliangwang would this feature be of interest?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
@tgravescs is there any interest in this feature?