[DOC] Misleading and unclear documentation for the Spark Connector in the SQL/PPL docs
What do you want to do?
- [x] Request a change to existing documentation
- [ ] Add new documentation
- [ ] Report a technical problem with the documentation
- [ ] Other
Tell us about your request.
Regarding: https://opensearch.org/docs/latest/search-plugins/sql/settings/#spark-connector-settings
- According to this comment, the Spark connector only supports AWS EMR Serverless Spark (which means I need AWS credentials). This should be made clear in the docs.
- The docs lack examples of how to set up EMR Serverless Spark and OpenSearch, and of where to provide the configuration (like `spark.uri`). For a user, it is unclear how to set up a basic working example.
- Some of the config properties lack examples and information about which values are valid:
  - `spark.uri`: The description "The identifier for your Spark data source." is misleading. It lacks an example and does not state what the default is or whether the property is mandatory.
  - `spark.auth.type`: It is unclear which values are valid, what the default is, and whether the property is mandatory.
- The Spark connector docs lack a reference to https://opensearch.org/docs/latest/dashboards/management/data-sources/ (and potentially https://opensearch.org/docs/latest/dashboards/management/accelerate-external-data/), as well as an explanation and examples of how to add Spark as a data source.
- The docs are not consistent with https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/spark_connector.rst:
  - `emr.cluster` is missing, for example.
- The PPL example is unclear:
POST /_plugins/_ppl
content-type: application/json
{
"query": "source = my_spark.sql('select * from alb_logs')"
}
What does `my_spark` refer to?
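To make the question concrete, here is a minimal Python sketch of how the request body above is put together. It assumes that `my_spark` is the name under which a Spark data source was registered (the linked datasources.rst describes named data sources); the docs page does not confirm this, which is the gap being reported. The helper name `build_ppl_request` is hypothetical and not part of any OpenSearch client.

```python
import json


def build_ppl_request(datasource: str, sql: str) -> str:
    """Return a JSON body for POST /_plugins/_ppl that runs `sql` through `datasource`.

    `datasource` is ASSUMED to be the name of a registered Spark data source.
    """
    if "'" in sql:
        # The SQL text is wrapped in single quotes. Escaping rules are not covered
        # by the docs discussed here, so this sketch rejects such input.
        raise ValueError("single quotes in the SQL text are not handled by this sketch")
    query = f"source = {datasource}.sql('{sql}')"
    return json.dumps({"query": query})


body = build_ppl_request("my_spark", "select * from alb_logs")
print(body)
```

If the assumption holds, replacing `"my_spark"` with the name of your own data source would be the only change needed in the query, and the docs could say so explicitly.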
Version: all versions that support the Spark connector
What other resources are available?
- https://github.com/opensearch-project/opensearch-spark/pull/606#discussion_r1752113941
- https://github.com/opensearch-project/opensearch-spark/issues/4#issuecomment-1631451276
- https://github.com/opensearch-project/sql/issues/948#issue-1418627454
- https://github.com/opensearch-project/opensearch-spark/discussions/317
- https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst
@salyh: Thanks for submitting this issue! I'll find a dev who can help make the changes you requested.
@YANG-DB @Naarcha-AWS any update? We need to clarify this to get https://github.com/opensearch-project/opensearch-spark/pull/606 done
Closing as stale