training-scripts
Scripts to launch the cluster used for Strata
The Spark Streaming Tutorial at http://ampcamp.berkeley.edu/big-data-mini-course/realtime-processing-with-spark-streaming.html gives an error message.
```
[info] Set current project to Tutorial (in build file:/root/training/streaming/scala/)
[info] Updating {file:/root/training/streaming/scala/}scala...
[info] Resolving org.scala-lang#scala-library;2.10 ...
[warn] module not...
```
Running `git clone git://github.com/amplab/training-script.git -b ampcamp4` gives me the old training material, including images. spark_ec2.py seems to be hard-coded against the AMPCamp 3 AMI: # A static URL from...
In spark_ec2.py#355 one can see that no ZooKeeper nodes are created, and the docs on "Running Spark on EC2" do not mention any options for creating ZooKeeper nodes. Is it...
The / partition is only 8G; after running the script, there is no free space.
```
[root@ip-10-143-137-174 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 7.9G 7.9G...
```
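A check like the following could catch a full root partition before the setup script starts writing to it. This is only a sketch, assuming Python 3; `free_fraction`, `has_room`, and the 1 GiB default threshold are hypothetical names and values, not part of the training scripts.

```python
import shutil


def free_fraction(path="/"):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total


def has_room(path="/", min_free_bytes=1024 ** 3):
    """Return True if at least `min_free_bytes` are free (hypothetical 1 GiB default)."""
    return shutil.disk_usage(path).free >= min_free_bytes


if __name__ == "__main__":
    print("free on /: %.1f%%" % (100.0 * free_fraction("/")))
    if not has_room("/"):
        print("warning: less than 1 GiB free on /")
```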
The login banner says Spark 0.7.3, even though we're using a 0.8 snapshot for AMPCamp 3.
It would be nice if the `copy-data` command had some form of progress indication. This would help to determine whether the transfer is frozen or just slow.
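One way such progress indication could look is a chunked copy that reports bytes transferred after each chunk. This sketch copies a local file and assumes nothing about how `copy-data` actually moves data; `copy_with_progress` and `print_progress` are hypothetical helpers.

```python
import os
import sys


def copy_with_progress(src, dst, chunk_size=1024 * 1024, report=None):
    """Copy src to dst in chunks, calling report(copied, total) after each chunk."""
    total = os.path.getsize(src)
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            if report is not None:
                report(copied, total)
    return copied


def print_progress(copied, total):
    """Overwrite a single stderr line with the current byte count and percentage."""
    pct = 100.0 * copied / total if total else 100.0
    sys.stderr.write("\rcopied %d of %d bytes (%.1f%%)" % (copied, total, pct))
    sys.stderr.flush()
```

Calling `copy_with_progress(src, dst, report=print_progress)` keeps one status line updating, so a frozen transfer shows up as a counter that stops moving.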
Cosmetic, but very convenient. Besides a syntax file for Scala, we should also have default settings for spaces vs. tabs, auto/smart indentation, etc.