[KYUUBI #3406] [Subtask] [Doc] Add PySpark client docs

Open bowenliang123 opened this issue 3 years ago • 5 comments

Why are the changes needed?

close #3406.

Add PySpark client docs.

How was this patch tested?

  • [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • [ ] Add screenshots for manual tests if appropriate

  • [ ] Run tests locally before making a pull request

bowenliang123 avatar Sep 05 '22 08:09 bowenliang123

Codecov Report

Merging #3407 (fb0cfcd) into master (b3ecaef) will increase coverage by 0.14%. The diff coverage is n/a.

:exclamation: Current head fb0cfcd differs from pull request most recent head a181a5b. Consider uploading reports for the commit a181a5b to get more accurate results

@@             Coverage Diff              @@
##             master    #3407      +/-   ##
============================================
+ Coverage     51.50%   51.65%   +0.14%     
  Complexity       13       13              
============================================
  Files           480      482       +2     
  Lines         26664    26933     +269     
  Branches       3728     3760      +32     
============================================
+ Hits          13733    13911     +178     
- Misses        11591    11664      +73     
- Partials       1340     1358      +18     
Impacted Files Coverage Δ
.../org/apache/kyuubi/ha/client/DiscoveryClient.scala 36.36% <0.00%> (-22.73%) :arrow_down:
...ache/kyuubi/engine/flink/FlinkProcessBuilder.scala 73.77% <0.00%> (-14.76%) :arrow_down:
...kyuubi/engine/spark/session/SparkSessionImpl.scala 70.58% <0.00%> (-13.42%) :arrow_down:
...e/kyuubi/engine/spark/operation/ExecuteScala.scala 73.17% <0.00%> (-11.68%) :arrow_down:
...pache/kyuubi/operation/log/LogDivertAppender.scala 33.33% <0.00%> (-11.12%) :arrow_down:
.../kyuubi/server/mysql/constant/MySQLErrorCode.scala 13.84% <0.00%> (-6.16%) :arrow_down:
...pache/kyuubi/engine/YarnApplicationOperation.scala 62.96% <0.00%> (-5.56%) :arrow_down:
...ache/kyuubi/server/mysql/MySQLCommandHandler.scala 75.00% <0.00%> (-4.55%) :arrow_down:
...ver/mysql/authentication/MySQLNativePassword.scala 73.91% <0.00%> (-4.35%) :arrow_down:
...rg/apache/kyuubi/events/EventHandlerRegister.scala 52.38% <0.00%> (-4.15%) :arrow_down:
... and 55 more

codecov-commenter avatar Sep 05 '22 09:09 codecov-commenter

cc @pan3793, please take a look when you have some time.

bowenliang123 avatar Sep 23 '22 14:09 bowenliang123

Thanks for adding this document. I have some high-level thoughts.

We should not narrow the Kyuubi JDBC data source use cases down to PySpark. It is also suitable for spark-shell, spark-sql, and normal Spark jar jobs, right? So I think we could split the doc into two parts: the first part is "How to use Kyuubi as a Spark JDBC data source", and the second is "How to use PySpark to access Kyuubi JDBC data source".
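For illustration, here is a minimal sketch of what reading from Kyuubi through Spark's JDBC data source could look like from PySpark. The option keys (`url`, `driver`, `user`, `password`, `query`) are standard Spark JDBC options. The driver class name, the `jdbc:kyuubi://` URL scheme, the host, the port, and the helper names are assumptions for the example and may differ by Kyuubi version; they are not taken from the doc under review.

```python
def kyuubi_jdbc_options(host, port, query, user, password=""):
    """Build the option map for Spark's JDBC data source.

    Assumption: the Kyuubi Hive JDBC driver jar is on the Spark classpath and
    exposes the class below; adjust the class name and URL scheme as needed.
    """
    return {
        "driver": "org.apache.kyuubi.jdbc.KyuubiHiveDriver",  # assumed class name
        "url": f"jdbc:kyuubi://{host}:{port}/",                # assumed URL scheme
        "user": user,
        "password": password,
        "query": query,  # Spark sends this query to the JDBC source
    }


def read_from_kyuubi(spark, options):
    """Load a DataFrame from Kyuubi using an existing SparkSession.

    The session running this code is the client-side Spark application;
    the query itself is executed by the engine behind Kyuubi.
    """
    return spark.read.format("jdbc").options(**options).load()
```

With a real session, usage would be along the lines of `read_from_kyuubi(spark, kyuubi_jdbc_options("kyuubi-host", 10009, "SELECT 1", "anonymous"))`, where the host, port, and user are placeholders. The same option map applies unchanged to spark-shell or a Spark jar job, since only the session object is PySpark-specific.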

pan3793 avatar Sep 23 '22 14:09 pan3793

Also, to avoid confusing users, we should emphasize that there are TWO Spark applications in this architecture, and that the main benefit is security rather than performance. The architecture is also suitable for spark-shell.

pan3793 avatar Sep 23 '22 14:09 pan3793

Yes. But so far this doc is intended for Python users (such as AI teams) accessing Kyuubi, and it is placed under /client/python rather than covering every available way of accessing Kyuubi from Spark, which could be added in future docs dedicated to Spark under /client. Users should also not be confused about the two Spark jobs: as discussed in the first paragraph, the doc is about connecting to Kyuubi and is not related to the Spark engine.

bowenliang123 avatar Sep 23 '22 14:09 bowenliang123

Thanks, merging to master/1.6

pan3793 avatar Sep 24 '22 15:09 pan3793