spark-deep-learning
Use sparkdl in a Jupyter notebook without a web connection
Hello,
I am running a Jupyter notebook on an EMR instance without access to the web. I have downloaded the sparkdl .jar file to an S3 bucket.
I tried:
# Creating SparkSession
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars", "s3://my_bucket/libs/spark-deep-learning-1.5.0-spark2.4-s_2.11.jar")
    .getOrCreate()
)
This cell runs without error.
But the following import then fails:
from sparkdl import DeepImageFeaturizer
ModuleNotFoundError: No module named 'sparkdl'
Any idea how to fix this?
Use spark.jars.packages instead of spark.jars. Also, I had no success using a local package (in your case, the jar you put in the S3 bucket) because of a missing parent dependency. You should pull the package from the Databricks Spark Packages site instead. I know this has limitations, but so far I have not been able to find another solution.
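For reference, here is a minimal sketch of what that suggestion could look like. The package coordinate is an assumption inferred from the jar file name in the question, and the repository URL is likewise an assumption about where Spark Packages artifacts are served. The helper names sparkdl_conf and build_session are hypothetical, made up for this illustration.

```python
# Assumed Spark Packages coordinate (group:artifact:version),
# inferred from the jar file name in the question.
SPARKDL_PACKAGE = "databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11"

# Assumed resolver URL for the Spark Packages repository.
SPARK_PACKAGES_REPO = "https://repos.spark-packages.org/"


def sparkdl_conf(package=SPARKDL_PACKAGE, repo=SPARK_PACKAGES_REPO):
    """Return Spark settings that request sparkdl by coordinate (hypothetical helper)."""
    return {
        # Resolve the package and its dependencies by Maven-style coordinate.
        "spark.jars.packages": package,
        # Extra repository to search when resolving the coordinate above.
        "spark.jars.repositories": repo,
    }


def build_session(conf):
    """Create (or reuse) a SparkSession with the given settings (hypothetical helper)."""
    # Imported here so sparkdl_conf() can be inspected without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage, in a fresh notebook kernel before any SparkSession exists:
#   spark = build_session(sparkdl_conf())
#   from sparkdl import DeepImageFeaturizer
```

Two caveats apply. getOrCreate() reuses an already running session, so these settings only take effect if they are set before the first SparkSession is created in the kernel. Resolving the coordinate also requires the cluster to reach the repository, which is the limitation mentioned above for an instance without web access.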