Running application on the cluster
I'm a Spark and Docker noob, so this is actually a question and not an issue.
I followed your instructions and was able to set up the cluster and run the example. This is what I see as my cluster status:
```
vagrant@packer-virtualbox-iso:/vagrant/sparkling$ sudo docker ps
CONTAINER ID   IMAGE                           COMMAND                CREATED             STATUS             PORTS                NAMES
8f5d44eefa65   amplab/spark-worker:0.9.0       /root/spark_worker_f   About an hour ago   Up About an hour   8888/tcp             prickly_lumiere
33c48ef9d17e   amplab/spark-worker:0.9.0       /root/spark_worker_f   About an hour ago   Up About an hour   8888/tcp             stoic_feynman
d91e47ed0b90   amplab/spark-worker:0.9.0       /root/spark_worker_f   About an hour ago   Up About an hour   8888/tcp             ecstatic_babbage
e173ecd4f4c0   amplab/spark-master:0.9.0       /root/spark_master_f   About an hour ago   Up About an hour   7077/tcp, 8080/tcp   berserk_nobel
d67f979d70fe   amplab/dnsmasq-precise:latest   /root/dnsmasq_files/   About an hour ago   Up About an hour
```
I have written a Spark program for linear regression which runs perfectly in local mode. It is a very small program and is on GitHub here.
Now I want to run this program on my Spark cluster. The instructions in the Spark programming guide leave me scratching my head about what to do next. I'd like your help in figuring out the right way to run the application:
- When I do `docker attach` I get the Scala prompt. Should I run my application from this prompt?
- I have a Vagrant setup on which I am running Docker. On my Vagrant Ubuntu box I have the application code, which I compile and assemble using sbt. Can I somehow deploy the assembled application from sbt to the cluster?
If this has been explained elsewhere, please point me to it; I could not find any example of how to run an application program on a Spark cluster.
Thank you very much.
Hi @bharath12345, you're right that this isn't actually covered in the docs. Have you tried to `scp` your jar into the master container (see the instructions on SSH login) and run it from there? I believe Spark should be installed under /opt.
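A rough sketch of that approach, assuming SSH access to the master as root and a Spark 0.9.0 install under /opt — the jar name, master IP placeholder, main class, and exact Spark assembly path below are guesses to adapt, not verified values:

```shell
# On the Vagrant box: build the fat jar with the app and its dependencies
sbt assembly

# Copy it into the master container; the master IP is printed by the deploy
# script, and the root password is in the ssh-login instructions.
scp target/scala-2.10/linreg-assembly-0.1.jar root@<master-ip>:/root/

# SSH into the master and run the job against the cluster's master URL.
# The Spark assembly jar path is an assumption -- check what's under /opt.
ssh root@<master-ip>
java -cp "/root/linreg-assembly-0.1.jar:/opt/spark-0.9.0/assembly/target/scala-2.10/*" \
  LinearRegressionApp spark://master:7077
```

Your program would also need to pass `spark://master:7077` (rather than `local`) as the master when it constructs its `SparkContext`.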
I'm afraid there is no way to deploy directly with sbt. However, you could use the data directory option when you start the cluster to attach a directory that you then deploy your jar to. You would still need to start it from the command line by SSH-ing into the master, I guess.
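Something like the following, though the flag letters and the in-container mount point are from memory, so double-check them against the deploy script's usage message:

```shell
# Start the cluster with a shared data directory attached to the containers
# (-w sets the worker count; -v is assumed to be the data-dir option here).
sudo ./deploy/deploy.sh -i amplab/spark:0.9.0 -w 3 -v /vagrant/data

# On the Vagrant box: drop each new assembly build into the shared directory
cp target/scala-2.10/linreg-assembly-0.1.jar /vagrant/data/

# Then ssh into the master and run the jar from wherever the script mounts
# that directory inside the container (again, an assumption to verify).
```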