
Use sparkdl on jupyter notebook, without web connection

Open · christophelebrun opened this issue 6 years ago · 1 comment

Hello,

I am running a Jupyter notebook on an EMR instance without access to the web. I have downloaded the sparkdl .jar file to an S3 bucket.

I tried:

from pyspark.sql import SparkSession

# Create the SparkSession and point spark.jars at the sparkdl JAR stored in S3
spark = (
    SparkSession
    .builder
    .config('spark.jars', "s3://my_bucket/libs/spark-deep-learning-1.5.0-spark2.4-s_2.11.jar")
    .getOrCreate()
)

This cell runs without error.

But the import then fails:

from sparkdl import DeepImageFeaturizer
ModuleNotFoundError: No module named 'sparkdl'

Any idea how to fix that?

christophelebrun, Dec 18 '19 10:12

Use spark.jars.packages instead of spark.jars. Also, I had no success using a local package (in your case, one you compiled and put in an S3 bucket) because of a missing parent dependency. You should pull the package from the Databricks Spark Packages site. I know this has limitations, but so far I have not been able to find another solution.
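
For illustration, here is a minimal sketch of what that setting could look like. The package coordinate is an assumption inferred from the JAR file name in the question, and the helper names sparkdl_conf and build_session are hypothetical. Resolving a package needs a reachable repository, so on a cluster without web access this only works if some mirror is available; the optional repositories argument is where such a mirror (not shown here) would go.

```python
# Sketch only: the coordinate below is inferred from the JAR file name in the
# question (spark-deep-learning-1.5.0-spark2.4-s_2.11.jar), not confirmed.
SPARKDL_COORDINATE = "databricks:spark-deep-learning:1.5.0-spark2.4-s_2.11"


def sparkdl_conf(coordinate=SPARKDL_COORDINATE, repositories=None):
    """Return Spark settings that request sparkdl as a package (hypothetical helper)."""
    conf = {"spark.jars.packages": coordinate}
    if repositories:
        # spark.jars.repositories takes a comma-separated list of extra
        # repositories to search when resolving the package coordinate.
        conf["spark.jars.repositories"] = ",".join(repositories)
    return conf


def build_session(conf):
    """Create a SparkSession with the given settings (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily, only when called

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage in a notebook cell, before any other SparkSession exists:
#   spark = build_session(sparkdl_conf())
#   from sparkdl import DeepImageFeaturizer
```

These settings are read when the session is first created, so a session that is already running would need to be stopped before they can take effect.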

spark-water, Feb 10 '20 18:02