
How to use "sparkdl$ SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh "

Open RayTsui opened this issue 8 years ago • 9 comments

I followed the instructions: I downloaded the project, ran build/sbt assembly, and then executed python/run-tests.sh, but it gives me the following output:

List of assembly jars found, the last one will be used:
ls: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/../target/scala-2.12/spark-deep-learning-assembly*.jar: No such file or directory

============= Searching for tests in: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/tests =============
============= Running the tests in: /Users/lei.cui/Documents/Workspace/DeepLearninginApacheSpark/spark-deep-learning-master/python/tests/graph/test_builder.py =============
/usr/local/opt/python/bin/python2.7: No module named nose

Actually, after the sbt build, it produces scala-2.11/spark-deep-learning-assembly*.jar rather than scala-2.12/spark-deep-learning-assembly*.jar. In addition, I installed python2 at /usr/local/bin/python2, so why does it report /usr/local/opt/python/bin/python2.7: No module named nose?
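(For reference, one quick way to see which interpreter is being picked up and whether nose is importable there; the second path below is the one from the error message and may not exist on other machines:)

    # Check which python2 the shell resolves and whether nose is importable there
    which python2
    python2 -c "import nose; print(nose.__version__)"
    # The error mentions a different interpreter; check that one as well
    /usr/local/opt/python/bin/python2.7 -c "import nose"
    # If nose is missing from the interpreter you intend to use:
    python2 -m pip install nose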

RayTsui avatar Oct 17 '17 05:10 RayTsui

Actually, I am not sure how to use "sparkdl$ SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh". Can it be executed at the command line? When I try, it gives "sparkdl$: command not found".

RayTsui avatar Oct 17 '17 05:10 RayTsui

sparkdl$ means your current directory is the spark-deep-learning project. SPARK_HOME is needed by pyspark; SCALA_VERSION and SPARK_VERSION are used to locate the spark-deep-learning-assembly*.jar.

./python/run-tests.sh will set up the environment, find all the .py files in python/tests, and run them one by one.
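Roughly speaking, the script does something like the following. This is only an illustrative sketch of the behavior described above (the variable handling and the test-runner invocation are assumptions), not the actual contents of run-tests.sh:

    # Resolve the assembly jar from the Scala major version ("the last one will be used")
    scala_major="${SCALA_VERSION%.*}"        # e.g. 2.11.8 -> 2.11
    assembly_jar=$(ls target/scala-${scala_major}/spark-deep-learning-assembly*.jar | tail -n 1)

    # Run every test file under python/tests with the chosen interpreter
    for test_file in $(find python/tests -name "*.py"); do
        "$PYSPARK_PYTHON" -m nose "$test_file"
    done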

You should run build/sbt assembly first to make sure the assembly jar is ready, then run SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh

allwefantasy avatar Oct 18 '17 14:10 allwefantasy

@RayTsui thank you for reporting the issue! @allwefantasy thank you for helping out! In addition, we also have some scripts/sbt-plugins we use to facilitate the development process, which we put in https://github.com/databricks/spark-deep-learning/pull/59. You can try running SPARK_HOME="path/to/your/spark/home/directory" ./bin/totgen.sh, which will generate pyspark (.py2.spark.shell, .py3.spark.shell) and spark-shell (.spark.shell) REPLs.

phi-dbq avatar Oct 18 '17 16:10 phi-dbq

@allwefantasy Thanks a lot for your answer. Regarding the command "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh", I have a few questions:

  1. Is the value for each setting fixed and common to all environments, or do I need to set the values based on my own environment? I installed Spark via "brew install apache-spark" instead of downloading a Spark release bundled with Hadoop (e.g., spark-2.1.1-bin-hadoop2.7). Are the Scala and Spark version numbers also based on my environment? (A sketch for the Homebrew case follows after this list.)

  2. Do I need to set the environment variables "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1" in ~/.bash_profile, or do I directly run the command "SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh" at the prompt?

  3. After several attempts, I still come across the errors above.

If you have any suggestions, they will help me a lot.
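For the Homebrew case in question 1, a minimal sketch of pointing SPARK_HOME at the brew-installed Spark (the libexec layout is the usual one for Homebrew's apache-spark formula, but verify it on your machine):

    # Homebrew keeps the actual Spark distribution under the formula's libexec directory
    export SPARK_HOME="$(brew --prefix apache-spark)/libexec"
    ls "$SPARK_HOME/bin/spark-submit"   # sanity check that this really is a Spark home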

RayTsui avatar Oct 18 '17 22:10 RayTsui

@phi-dbq Thanks a lot for your response. I will try what you suggested and give feedback.

RayTsui avatar Oct 18 '17 22:10 RayTsui

  1. Make sure the dependencies in the following list are installed:
# This file should list any python package dependencies.
coverage>=4.4.1
h5py>=2.7.0
keras==2.0.4 # NOTE: this package has only been tested with keras 2.0.4 and may not work with other releases
nose>=1.3.7  # for testing
numpy>=1.11.2
pillow>=4.1.1,<4.2
pygments>=2.2.0
tensorflow==1.3.0
pandas>=0.19.1
six>=1.10.0
kafka-python>=1.3.5
tensorflowonspark>=1.0.5
tensorflow-tensorboard>=0.1.6

Or you can just run this command to install them:

 pip2 install -r python/requirements.txt
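Optionally, a quick sanity check that the pinned packages ended up in the interpreter you will pass as PYSPARK_PYTHON (python2 here just matches the command above):

    python2 -c "import nose, keras, tensorflow; print(keras.__version__); print(tensorflow.__version__)"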

  2. Just keep PYSPARK_PYTHON=python2 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 unchanged. As I mentioned, these variables are only used to locate the assembly jar. The only one you should set for your own machine is SPARK_HOME. I suggest not configuring them in .bashrc, since that may have side effects on your other programs.
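That is, pass them on the command line for a single run; this is the same command as above, just split across lines for readability:

    SPARK_HOME=/usr/local/lib/spark-2.1.1-bin-hadoop2.7 \
    PYSPARK_PYTHON=python2 \
    SCALA_VERSION=2.11.8 \
    SPARK_VERSION=2.1.1 \
    ./python/run-tests.sh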

  3. Run the commands in the following steps:

step 1:

      build/sbt assembly

Then you should find spark-deep-learning-assembly-0.1.0-spark2.1.jar in your-project/target/scala-2.11.
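To confirm the jar landed where the test script's glob will look (the exact file name can differ with the project version):

    ls target/scala-2.11/spark-deep-learning-assembly*.jar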

step 2:

 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh 

Also, you can specify a target file to run instead of all the files, which take almost 30 minutes. Like this:

 SCALA_VERSION=2.11.8 SPARK_VERSION=2.1.1 ./python/run-tests.sh  /Users/allwefantasy/CSDNWorkSpace/spark-deep-learning/python/tests/transformers/tf_image_test.py

allwefantasy avatar Oct 19 '17 02:10 allwefantasy

@allwefantasy
Hi, I really appreciate your explanation. I understood and tried again, and it has made a lot of progress; at least the unit tests now run and report coverage:

Name                                    Stmts   Miss  Cover
sparkdl/graph/__init__.py                   0      0   100%
sparkdl/graph/utils.py                     81     64    21%
sparkdl/image/__init__.py                   0      0   100%
sparkdl/image/imageIO.py                   94     66    30%
sparkdl/transformers/__init__.py            0      0   100%
sparkdl/transformers/keras_utils.py        13      7    46%
sparkdl/transformers/param.py              46     26    43%
TOTAL                                     234    163    30%

But there is still an error, as follows:

ModuleNotFoundError: No module named 'tensorframes'

I guess tensorframes officially supports Linux 64-bit, but right now I am using macOS; is that the issue?

RayTsui avatar Oct 20 '17 01:10 RayTsui

Hello @RayTsui, I have no problem using OS X for development purposes. Can you first run:

build/sbt clean

followed by:

build/sbt assembly

You should see a line that says [info] Including: tensorframes-0.2.9-s_2.11.jar. This indicates that tensorframes is properly included in the assembly jar, and that your problem is rather that the proper assembly cannot be found.
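If you want to double-check the finished jar as well, one option (the jar name below is only an example; use the file sbt actually produced) is to list its contents and grep for tensorframes:

    jar tf target/scala-2.11/spark-deep-learning-assembly-0.1.0-spark2.1.jar | grep -i tensorframes | head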

thunterdb avatar Oct 20 '17 04:10 thunterdb

@thunterdb Thanks a lot for your suggestions. I ran the commands, and yes, I can see [info] Including: tensorframes-0.2.8-s_2.11.jar. And as you said, my issue is "List of assembly jars found, the last one will be used: ls: $DIR/spark-deep-learning-master/python/../target/scala-2.11/spark-deep-learning-assembly*.jar: No such file or directory"

I suppose that all the related jars are packaged into spark-deep-learning-assembly*.jar, but my jar is generated at "$DIR/spark-deep-learning-master/target/scala-2.11/spark-deep-learning-master-assembly-0.1.0-spark2.1.jar" instead of matching "$DIR/spark-deep-learning-master/python/../target/scala-2.11/spark-deep-learning-assembly*.jar". I tried to modify that part of run-tests.sh, but it did not work.
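For reference, the mismatch above is only in the artifact name: the unzipped directory is spark-deep-learning-master, so the build presumably derived the project name from it and produced spark-deep-learning-master-assembly-*.jar, while the script's glob expects spark-deep-learning-assembly*.jar. A hypothetical low-tech workaround, assuming the name is the only problem, is to copy the jar to a name the glob matches (or rename the checkout directory to spark-deep-learning and rebuild):

    # Hypothetical workaround: give the jar the name the glob in run-tests.sh expects
    cd $DIR/spark-deep-learning-master/target/scala-2.11
    cp spark-deep-learning-master-assembly-0.1.0-spark2.1.jar \
       spark-deep-learning-assembly-0.1.0-spark2.1.jar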

Do you know how to locate the spark-deep-learning-master-assembly-0.1.0-spark2.1.jar?

RayTsui avatar Oct 23 '17 22:10 RayTsui