zingg
EMR process hangs
Hi. One of my models has been stuck for 30 minutes at these BlockManager logs (the latest log line is 30 minutes old). Is the process still going to complete? Is there anything for me to optimize? I have four workers and one driver, each with 8 cores and 32 GB of memory.
24/08/06 18:37:35 INFO Executor: Finished task 992.0 in stage 3043.0 (TID 34882). 4504 bytes result sent to driver
24/08/06 18:59:31 INFO BlockManager: Removing RDD 3553
24/08/06 18:59:31 INFO BlockManager: Removing RDD 2676
24/08/06 18:59:31 INFO BlockManager: Removing RDD 208
24/08/06 18:59:31 INFO BlockManager: Removing RDD 3809
24/08/06 18:59:31 INFO BlockManager: Removing RDD 2564
24/08/06 18:59:31 INFO BlockManager: Removing RDD 324
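The log above is silent for about 22 minutes between the last Executor line (18:37:35) and the first BlockManager line (18:59:31). A minimal Python sketch for locating such silent stretches in a full driver or executor log, assuming every relevant line starts with the `yy/MM/dd HH:mm:ss` timestamp shown above; `find_log_gaps` and its 300-second default threshold are hypothetical, not part of Zingg or Spark:

```python
from datetime import datetime

# Assumption: log lines start with a "yy/MM/dd HH:mm:ss" timestamp (17 characters).
TS_FORMAT = "%y/%m/%d %H:%M:%S"


def find_log_gaps(lines, threshold_seconds=300):
    """Return (previous_line, next_line, gap_seconds) for each pair of
    consecutive timestamped lines that are more than threshold_seconds apart."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        try:
            ts = datetime.strptime(line[:17], TS_FORMAT)
        except ValueError:
            continue  # skip continuation lines, stack traces, etc.
        if prev_ts is not None:
            delta = (ts - prev_ts).total_seconds()
            if delta > threshold_seconds:
                gaps.append((prev_line, line, delta))
        prev_ts, prev_line = ts, line
    return gaps


log = [
    "24/08/06 18:37:35 INFO Executor: Finished task 992.0 in stage 3043.0 (TID 34882). 4504 bytes result sent to driver",
    "24/08/06 18:59:31 INFO BlockManager: Removing RDD 3553",
    "24/08/06 18:59:31 INFO BlockManager: Removing RDD 2676",
]

for prev, nxt, secs in find_log_gaps(log):
    # prints a 1316 s gap before the first "Removing RDD" line
    print(f"{secs:.0f} s gap before: {nxt}")
```

Running this over the complete logs shows whether the job is making slow progress or has stopped logging entirely.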
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.default.parallelism=128
spark.debug.maxToStringFields=200
spark.driver.memory=12g
spark.executor.memory=19g
spark.executor.instances=15
spark.executor.cores=5
spark-submit --master yarn \
--name zingg-ai \
--deploy-mode client \
--properties-file ./zingg.conf \
--class zingg.spark.client.SparkClient \
/home/hadoop/zingg-0.4.0/zingg-0.4.0.jar \
--phase label \
--conf ./my_conf_emr.json.env \
--license LICENSE
My numPartitions is 1000.
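The properties above request 15 executors with 5 cores and 19g each, while the cluster described has four workers with 8 cores and 32 GB each. A rough Python sketch of how many such executors could be placed at once, assuming all 8 cores and 32 GB of each worker are available to YARN (in practice YARN is usually given less) and Spark's default YARN memory overhead of max(384 MiB, 10% of executor memory); `executors_that_fit` is a hypothetical helper:

```python
import math


def yarn_overhead_gib(executor_memory_gib, factor=0.10, minimum_gib=384 / 1024):
    # Default per-container overhead: max(384 MiB, 10% of spark.executor.memory).
    return max(minimum_gib, factor * executor_memory_gib)


def executors_that_fit(workers, cores_per_worker, mem_gib_per_worker,
                       executor_cores, executor_mem_gib):
    """Upper bound on concurrently running executors, limited by cores and memory.
    Assumption: all worker cores/memory are usable by YARN (GB treated as GiB)."""
    per_container = executor_mem_gib + yarn_overhead_gib(executor_mem_gib)
    by_cores = cores_per_worker // executor_cores
    by_memory = math.floor(mem_gib_per_worker / per_container)
    return workers * min(by_cores, by_memory)


requested = 15
fit = executors_that_fit(workers=4, cores_per_worker=8, mem_gib_per_worker=32,
                         executor_cores=5, executor_mem_gib=19)
slots = fit * 5                      # concurrent tasks = executors * spark.executor.cores
waves = math.ceil(1000 / slots)      # numPartitions = 1000
# → requested 15 executors, ~4 fit; 20 task slots; 50 waves for 1000 partitions
print(f"requested {requested} executors, ~{fit} fit; {slots} task slots; {waves} waves for 1000 partitions")
```

Under these assumptions only about one executor fits per worker, so each 1000-partition stage runs in roughly 50 waves of 20 tasks, and 3 of the 8 cores on each worker stay unused. Sizing executors to divide the worker cores and memory evenly may make better use of the cluster.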
What are these tasks? They take a long time, and a very large block is being written to disk:
24/08/07 13:49:41 INFO TaskSetManager: Finished task 993.0 in stage 3052.0 (TID 33999) in 29 ms on ip-10-229-18-10.ec2.internal (executor 3) (999/1000)
24/08/07 14:04:33 INFO BlockManagerInfo: Added rdd_4990_323 on disk on ip-10-229-18-44.ec2.internal:33189 (size: 11.8 GiB)
24/08/07 14:12:08 INFO BlockManager: Removing RDD 334
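The 11.8 GiB entry above is a single cached block, partition 323 of RDD 4990 (Spark names RDD blocks `rdd_<rddId>_<partition>`), and a single partition of that size can be a sign of skew. A minimal Python sketch for listing every oversized on-disk block in a full log, assuming Spark's `Added <block> on disk on <host> (size: ...)` message format shown above; `large_disk_blocks` and its 1 GiB threshold are hypothetical:

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Matches lines like "Added rdd_4990_323 on disk on host:33189 (size: 11.8 GiB)".
ADDED_ON_DISK = re.compile(
    r"Added (rdd_(\d+)_(\d+)) on disk on (\S+) \(size: ([\d.]+) (B|KiB|MiB|GiB|TiB)\)"
)


def large_disk_blocks(lines, min_bytes=1024**3):
    """Yield (block_id, rdd_id, partition, host, size_bytes) for RDD blocks
    stored on disk that are at least min_bytes in size."""
    for line in lines:
        m = ADDED_ON_DISK.search(line)
        if not m:
            continue  # not an "Added ... on disk" line
        block, rdd_id, part, host, value, unit = m.groups()
        size = float(value) * UNITS[unit]
        if size >= min_bytes:
            yield block, int(rdd_id), int(part), host, size


sample = [
    "24/08/07 14:04:33 INFO BlockManagerInfo: Added rdd_4990_323 on disk on ip-10-229-18-44.ec2.internal:33189 (size: 11.8 GiB)",
    "24/08/07 14:12:08 INFO BlockManager: Removing RDD 334",
]

for block, rdd_id, part, host, size in large_disk_blocks(sample):
    print(f"{block} on {host}: {size / 1024**3:.1f} GiB")
```

If only a few partitions of the same RDD show up with sizes far above the rest, the blocking on that data is likely uneven.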
Is this still happening, @knguyen1? What are your data volumes, and can you share the complete Zingg logs?
Closing due to lack of response