[KYUUBI #3406] [Subtask] [Doc] Add PySpark client docs
Why are the changes needed?
Closes #3406.
Add PySpark client docs.
How was this patch tested?
- [ ] Add some test cases that check the changes thoroughly, including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] Run tests locally before making a pull request
Codecov Report
Merging #3407 (fb0cfcd) into master (b3ecaef) will increase coverage by 0.14%. The diff coverage is n/a.
:exclamation: Current head fb0cfcd differs from the pull request's most recent head a181a5b. Consider uploading reports for the commit a181a5b to get more accurate results.
```diff
@@            Coverage Diff            @@
##           master    #3407     +/-  ##
============================================
+ Coverage    51.50%   51.65%   +0.14%
  Complexity      13       13
============================================
  Files          480      482       +2
  Lines        26664    26933     +269
  Branches      3728     3760      +32
============================================
+ Hits         13733    13911     +178
- Misses       11591    11664      +73
- Partials      1340     1358      +18
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| .../org/apache/kyuubi/ha/client/DiscoveryClient.scala | 36.36% <0.00%> (-22.73%) | :arrow_down: |
| ...ache/kyuubi/engine/flink/FlinkProcessBuilder.scala | 73.77% <0.00%> (-14.76%) | :arrow_down: |
| ...kyuubi/engine/spark/session/SparkSessionImpl.scala | 70.58% <0.00%> (-13.42%) | :arrow_down: |
| ...e/kyuubi/engine/spark/operation/ExecuteScala.scala | 73.17% <0.00%> (-11.68%) | :arrow_down: |
| ...pache/kyuubi/operation/log/LogDivertAppender.scala | 33.33% <0.00%> (-11.12%) | :arrow_down: |
| .../kyuubi/server/mysql/constant/MySQLErrorCode.scala | 13.84% <0.00%> (-6.16%) | :arrow_down: |
| ...pache/kyuubi/engine/YarnApplicationOperation.scala | 62.96% <0.00%> (-5.56%) | :arrow_down: |
| ...ache/kyuubi/server/mysql/MySQLCommandHandler.scala | 75.00% <0.00%> (-4.55%) | :arrow_down: |
| ...ver/mysql/authentication/MySQLNativePassword.scala | 73.91% <0.00%> (-4.35%) | :arrow_down: |
| ...rg/apache/kyuubi/events/EventHandlerRegister.scala | 52.38% <0.00%> (-4.15%) | :arrow_down: |
| ... and 55 more | | |
cc @pan3793, please take a look when you have some time.
Thanks for adding this document; I have some high-level thoughts.
We should not narrow the Kyuubi JDBC data source use cases to PySpark; it's also suitable for spark-shell, spark-sql, and normal Spark jar jobs, right? So I think we could split the doc into two parts: the first, "How to use Kyuubi as a Spark JDBC data source", and the second, "How to use PySpark to access a Kyuubi JDBC data source".
Also, to avoid confusing users, we should emphasize that there are TWO Spark applications in such an architecture, and that the main benefit is security, not performance. The architecture is also suitable for spark-shell.
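For illustration, a minimal sketch of the "Kyuubi as a Spark JDBC data source" part described above. The host, port, and table names are placeholders (not values from this PR); the sketch assumes the Kyuubi Hive JDBC driver class `org.apache.kyuubi.jdbc.KyuubiHiveDriver` is on the Spark driver/executor classpath:

```python
# Hypothetical sketch: reading a table through Kyuubi as a Spark JDBC source.
# All host/port/table values below are placeholders for illustration only.

def kyuubi_jdbc_options(host, port, user, table):
    """Build the option dict that would be passed to spark.read.format("jdbc")."""
    return {
        "url": f"jdbc:hive2://{host}:{port}/default",
        "driver": "org.apache.kyuubi.jdbc.KyuubiHiveDriver",
        "user": user,
        "dbtable": table,
    }

# In a real PySpark job (requires a SparkSession and the Kyuubi JDBC driver jar):
# df = (spark.read.format("jdbc")
#       .options(**kyuubi_jdbc_options("kyuubi.example.com", 10009, "alice", "default.src"))
#       .load())
```

Note that in this setup the client-side PySpark application and the Kyuubi-managed Spark engine are the two separate Spark applications the comment above refers to.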
Yes, but so far this doc is intended for Python users (e.g., AI teams) accessing Kyuubi, and it is placed under /client/python rather than discussing all available access paths in Spark, which could be covered in a future Spark-specific doc under /client. And the two Spark jobs are not confusing here: as discussed in the first paragraph, the doc is about connecting to Kyuubi, not anything related to the Spark engine.
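To illustrate the direct Python-client path the reply above describes (no Spark application on the client side), here is a hedged sketch using PyHive, one common Thrift-based client; the hostname is a placeholder and 10009 is only Kyuubi's conventional default frontend port:

```python
# Hypothetical sketch of a direct Python client connection to Kyuubi.
# The hostname is a placeholder; 10009 is Kyuubi's conventional default port.

def pyhive_connect_kwargs(host, port=10009, username=None):
    """Assemble keyword arguments for pyhive.hive.connect()."""
    kwargs = {"host": host, "port": port}
    if username is not None:
        kwargs["username"] = username
    return kwargs

# With PyHive installed:
# from pyhive import hive
# conn = hive.connect(**pyhive_connect_kwargs("kyuubi.example.com", username="alice"))
# cursor = conn.cursor()
# cursor.execute("SELECT 1")
```

This is the style of usage the /client/python doc targets: the client speaks the HiveServer2-compatible protocol to Kyuubi directly, and Spark runs only on the engine side.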
Thanks, merging to master/1.6