
Document the current version of default Spark image for SparkCluster

Open adrian555 opened this issue 6 years ago • 6 comments


Description:

We installed this operator through the OpenShift console. Since there is no documentation on which version of Spark is used by the default installation, we ran into several issues where the Python application's driver and the cluster's executors had mismatched Spark and Python versions, similar to these issues: https://github.com/radanalyticsio/openshift-spark/issues/62 and https://github.com/radanalyticsio/openshift-spark/issues/70. We spent time debugging and eventually figured out that #186 gives us the ability to configure the standalone SparkCluster to use a different Python and Spark image. This could have been avoided had the default configuration been spelled out clearly on operatorhub.io as well as in this repo.
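For context, the override from #186 is set through the custom resource. A minimal sketch of a SparkCluster manifest that pins a specific image might look like the following (the `apiVersion`, metadata name, and image tag here are illustrative assumptions, not taken from the operator's docs):

```yaml
# Hedged sketch of a SparkCluster CR overriding the default image.
# Pick an image tag whose Spark and Python versions match your driver.
apiVersion: radanalytics.io/v1
kind: SparkCluster
metadata:
  name: my-spark-cluster
spec:
  customImage: quay.io/radanalyticsio/openshift-spark:2.4-latest
  worker:
    instances: "2"
```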

We think it would greatly help users to document the current version of the image (with a link to the openshift-spark repo) used by the default installation of this spark-operator, in the DEFAULT_SPARK_CLUSTER_IMAGE part of the documentation.

I can raise a PR if this is acceptable.

adrian555 avatar Nov 01 '19 17:11 adrian555

this sounds like a good idea to me, i'm not sure if there is a way to check the default image currently. maybe @jkremser might know.

elmiko avatar Nov 01 '19 19:11 elmiko

Thanks @adrian555 and @elmiko. It totally makes sense to specify the defaults correctly in the docs, and also to point to official images for other versions of Python and Spark. cc @pdmack

animeshsingh avatar Nov 02 '19 03:11 animeshsingh

@jkremser @elmiko the default image currently maps here to Spark 2.4 https://github.com/radanalyticsio/spark-operator/blob/da46a8161f00d821997c3e178e0606afdb1531bf/src/main/java/io/radanalytics/operator/Constants.java#L5

Also, between the constant and the environment variable, which one takes precedence?

Additionally, are other versions tested? Do we maintain a set of tested Python driver and executor image combinations on quay?

Also, is there any documented way to create our own versions of the driver and executor images where we can install custom libraries like confluent_kafka, numpy, pandas, etc.?

animeshsingh avatar Nov 02 '19 04:11 animeshsingh

Also, between the constant and the environment variable, which one takes precedence?

A > B means A takes precedence:

customImage field in the custom resource > environment variable > constant
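That precedence chain can be sketched in Java (the operator's own language). This is an illustrative model of the lookup order described above, not the operator's actual code; the class, method names, and default image string are assumptions:

```java
// Hedged sketch of the image-resolution precedence from this thread:
// CR field customImage > DEFAULT_SPARK_CLUSTER_IMAGE env var > constant.
public class ImageResolution {
    // Illustrative stand-in for the constant in Constants.java
    // (which maps to a Spark 2.4 based image at the linked commit).
    static final String DEFAULT_SPARK_IMAGE =
        "quay.io/radanalyticsio/openshift-spark:2.4-latest";

    static String resolveImage(String customImage, String envImage) {
        if (customImage != null && !customImage.isEmpty()) {
            return customImage;      // 1. customImage field in the CR wins
        }
        if (envImage != null && !envImage.isEmpty()) {
            return envImage;         // 2. then the environment variable
        }
        return DEFAULT_SPARK_IMAGE;  // 3. finally the compiled-in constant
    }

    public static void main(String[] args) {
        // Env var consulted only when the CR does not set customImage.
        String envImage = System.getenv("DEFAULT_SPARK_CLUSTER_IMAGE");
        System.out.println(resolveImage(null, envImage));
    }
}
```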

Additionally, are other versions tested? Do we maintain a set of tested Python driver and executor image combinations on quay?

IIRC, all the driver and executor image versions have to match. If your question is about which versions of Spark work with the operator: support started with version 2.2 (if I am not mistaken) and everything since then works. So you should also be able to use older versions of the images from the radanalyticsio org. Not all of them are on quay.io; older versions should be available on docker.io.

Also, is there any documented way to create our own versions of the driver and executor images where we can install custom libraries like confluent_kafka, numpy, pandas, etc.?

Hmm, probably not, but you can verify an image with https://pypi.org/project/soit/
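One common (though, per the answer above, undocumented) approach to the custom-libraries question is to extend the upstream image and install the extra packages in a layer on top. A minimal sketch, assuming the base image tag and pip availability (both are assumptions, not from the operator's docs):

```dockerfile
# Hedged sketch: build a custom executor/driver image on top of the
# upstream one. The base tag is illustrative; pick the tag whose Spark
# and Python versions match your driver.
FROM radanalyticsio/openshift-spark:2.4-latest
# Switch to root to install packages, then drop privileges again
# (the upstream image may run as a non-root user).
USER root
RUN pip install confluent-kafka numpy pandas
```

Both the driver and executor sides would need to use this image so the library versions stay consistent.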

jkremser avatar Nov 06 '19 23:11 jkremser

Thanks @jkremser - we have things working now, but I will list some details of the workarounds we needed here soon.

animeshsingh avatar Nov 08 '19 18:11 animeshsingh

@adrian555 do you want to do a PR to the docs to ensure clarity?

animeshsingh avatar Dec 05 '19 23:12 animeshsingh