Document the current version of default Spark image for SparkCluster
Description:
We installed this operator through the OpenShift console. Since there is no documentation on which version of Spark is used by the default installation, we ran into several issues where the Python application's driver and the cluster's executors had mismatched Spark and Python versions, similar to these issues: https://github.com/radanalyticsio/openshift-spark/issues/62 and https://github.com/radanalyticsio/openshift-spark/issues/70. We spent some time and eventually figured out that #186 gives us the ability to update the standalone SparkCluster to use a different Python and Spark image. But this could have been avoided had the default configuration been spelled out clearly on operatorhub.io as well as in this repo.
We think it would greatly help users to add the current version of the image (and a link to the openshift-spark repo) used by the default installation of this spark-operator to the DEFAULT_SPARK_CLUSTER_IMAGE part of the documentation.
I can raise a PR if this is acceptable.
This sounds like a good idea to me. I'm not sure if there is a way to check the default image currently; maybe @jkremser knows.
Thanks @adrian555 and @elmiko. It totally makes sense to specify the defaults correctly in the docs, and also to point to the official images for other versions of Python and Spark. cc @pdmack
@jkremser @elmiko the default image currently maps to Spark 2.4, defined here: https://github.com/radanalyticsio/spark-operator/blob/da46a8161f00d821997c3e178e0606afdb1531bf/src/main/java/io/radanalytics/operator/Constants.java#L5
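For anyone landing here, the constant in question looks roughly like the sketch below. This is illustrative only, not a copy of the actual file; the constant name and image tag are assumptions and should be confirmed against the linked Constants.java:

```java
// Sketch of the default-image constant in io/radanalytics/operator/Constants.java.
// The exact name and tag are assumptions; per the link above, the current default
// corresponds to Spark 2.4.
public class Constants {
    // Image used for SparkCluster pods when no customImage or env override is set.
    public static final String DEFAULT_SPARK_CLUSTER_IMAGE =
            "quay.io/radanalyticsio/openshift-spark:2.4-latest";
}
```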
Also, between the constant and the environment variable, which one takes precedence?
Additionally, are other versions tested? Do we maintain a combination of tested Python driver and executor images in quay?
Also, is there any documented way to create our own versions of the driver and executor images where we can install custom libraries like confluent_kafka, numpy, pandas, etc.?
> Also, between the constant and the environment variable, which one takes precedence?

A > B means A takes precedence:

customImage field in the custom resource > env variable > constant
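As a concrete illustration of the highest-precedence option, a SparkCluster resource can pin its own image via customImage. The snippet below is a minimal sketch; the apiVersion, cluster name, image tag, and worker count are assumptions, so adjust them to your setup:

```yaml
# Minimal SparkCluster sketch overriding the operator's default image via customImage.
# apiVersion, metadata.name, the image tag, and instance counts are illustrative.
apiVersion: radanalytics.io/v1
kind: SparkCluster
metadata:
  name: my-spark-cluster
spec:
  # Takes precedence over the operator's env variable and the compiled-in constant.
  customImage: quay.io/radanalyticsio/openshift-spark:2.4-latest
  worker:
    instances: "2"
  master:
    instances: "1"
```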
> Additionally, are other versions tested? Do we maintain a combination of tested Python driver and executor images in quay?

IIRC, the versions of the driver and executor images have to match. If your question is about which Spark versions work with the operator, it started with version 2.2 (if I am not mistaken) and has worked since then, so you should also be able to use older versions of the images from the radanalyticsio org. Not all of them are in quay.io; older versions should be available on docker.io.
> Also, is there any documented way to create our own versions of the driver and executor images where we can install custom libraries like confluent_kafka, numpy, pandas, etc.?

Hmm, probably not, but you can verify the image with https://pypi.org/project/soit/
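For the custom-libraries part of the question, one common approach (not an officially documented one) is to extend the base image and install the extra Python packages on top. The sketch below is a hypothetical Dockerfile; the base image tag and the availability of pip inside the image are assumptions, and the base image may run as a non-root user, so a USER switch may be needed around the install step:

```dockerfile
# Hypothetical Dockerfile extending the default Spark image with extra Python libraries.
# Base image tag and pip availability are assumptions; verify against the
# openshift-spark repo before relying on this.
FROM quay.io/radanalyticsio/openshift-spark:2.4-latest

# Install the extra libraries mentioned above (pip package name for confluent_kafka
# is confluent-kafka).
RUN pip install confluent-kafka numpy pandas
```

The resulting image could then be referenced from the SparkCluster's customImage field (with a matching driver image for the application), and checked with the soit tool linked above.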
Thanks @jkremser - we have things working now, but I will post some details of the workarounds we needed here soon.
@adrian555 do you want to open a PR against the docs to ensure clarity?