zingg

EMR process hang

knguyen1 opened this issue 1 year ago · 1 comment

Hi. One of my models has been hung for 30 minutes at these BlockManager logs (the latest log line is 30 minutes old). Will the process complete, and is there anything I can optimize? I have four workers and one driver, each with 8 cores and 32 GB of memory.

```
24/08/06 18:37:35 INFO Executor: Finished task 992.0 in stage 3043.0 (TID 34882). 4504 bytes result sent to driver
24/08/06 18:59:31 INFO BlockManager: Removing RDD 3553
24/08/06 18:59:31 INFO BlockManager: Removing RDD 2676
24/08/06 18:59:31 INFO BlockManager: Removing RDD 208
24/08/06 18:59:31 INFO BlockManager: Removing RDD 3809
24/08/06 18:59:31 INFO BlockManager: Removing RDD 2564
24/08/06 18:59:31 INFO BlockManager: Removing RDD 324
```
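The last `Executor` line (18:37:35) and the first `BlockManager` line (18:59:31) are about 22 minutes apart. Below is a minimal sketch for finding such silent stretches in a longer log. It assumes every relevant line starts with the `yy/MM/dd HH:mm:ss` timestamp seen in the lines above, and `find_log_gaps` is a hypothetical helper, not part of Spark or Zingg.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the log lines above,
# e.g. "24/08/06 18:37:35 INFO Executor: ...".
TS_PATTERN = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")
TS_FORMAT = "%y/%m/%d %H:%M:%S"


def find_log_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (previous_line, next_line, gap) for consecutive timestamped
    lines whose timestamps are more than `threshold` apart."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        match = TS_PATTERN.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix (e.g. stack traces)
        ts = datetime.strptime(match.group(1), TS_FORMAT)
        if prev_ts is not None and ts - prev_ts > threshold:
            gaps.append((prev_line, line, ts - prev_ts))
        prev_ts, prev_line = ts, line
    return gaps
```

Given the first two log lines above, it reports a single gap of `0:21:56`. To scan a whole log, pass it `open(path).read().splitlines()`, where `path` (hypothetical) points at the saved driver or container log.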

```
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.default.parallelism=128
spark.debug.maxToStringFields=200
spark.driver.memory=12g
spark.executor.memory=19g
spark.executor.instances=15
spark.executor.cores=5
```

```
spark-submit --master yarn \
    --name zingg-ai \
    --deploy-mode client \
    --properties-file ./zingg.conf \
    --class zingg.spark.client.SparkClient \
    /home/hadoop/zingg-0.4.0/zingg-0.4.0.jar \
    --phase label \
    --conf ./my_conf_emr.json.env \
    --license LICENSE
```
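The properties ask for 15 executors of 5 cores and 19g each, while the cluster described has four workers with 8 cores and 32 GB each. The rough sketch below estimates how many such executors could actually be scheduled. It assumes Spark's default executor memory overhead on YARN (the larger of 10% of `spark.executor.memory` and 384 MiB) and, optimistically, that YARN may use all 32 GB of each worker. `Node`, `container_mib` and `executors_that_fit` are hypothetical names, and the sketch ignores YARN's rounding of container sizes and whether its resource calculator enforces vcores at all.

```python
from dataclasses import dataclass

MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10  # Spark's default executor memory overhead factor


@dataclass
class Node:
    cores: int
    memory_mib: int  # memory YARN may hand out on this node (assumed)


def container_mib(executor_memory_mib: int) -> int:
    """Executor heap plus Spark's default overhead, i.e. the YARN container size."""
    overhead = max(int(executor_memory_mib * OVERHEAD_FACTOR), MIN_OVERHEAD_MIB)
    return executor_memory_mib + overhead


def executors_that_fit(nodes, executor_cores, executor_memory_mib):
    """Count executors of the requested size that fit, node by node
    (a container cannot span nodes), limited by both cores and memory."""
    per_container = container_mib(executor_memory_mib)
    total = 0
    for node in nodes:
        by_cores = node.cores // executor_cores
        by_memory = node.memory_mib // per_container
        total += min(by_cores, by_memory)
    return total


# Cluster as described in the issue: 4 workers, each 8 cores / 32 GB.
# Assumption: the full 32 GB is usable by YARN (in practice it is less).
workers = [Node(cores=8, memory_mib=32 * 1024) for _ in range(4)]

requested = 15  # spark.executor.instances
fit = executors_that_fit(workers, executor_cores=5, executor_memory_mib=19 * 1024)
print(f"requested {requested} executors, at most {fit} fit")
```

Under these assumptions only one executor fits per worker, so it prints `requested 15 executors, at most 4 fit`; the real capacity on EMR may be lower still.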

My numPartitions is 1000.

knguyen1 · Aug 06 '24 19:08

What are these tasks? They take a long time, and that is a large block (11.8 GiB) being written to disk.

```
24/08/07 13:49:41 INFO TaskSetManager: Finished task 993.0 in stage 3052.0 (TID 33999) in 29 ms on ip-10-229-18-10.ec2.internal (executor 3) (999/1000)
24/08/07 14:04:33 INFO BlockManagerInfo: Added rdd_4990_323 on disk on ip-10-229-18-44.ec2.internal:33189 (size: 11.8 GiB)
24/08/07 14:12:08 INFO BlockManager: Removing RDD 334
```
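The 11.8 GiB `rdd_4990_323` block is a single partition (index 323) of RDD 4990. With `numPartitions` set to 1000, one partition that large might point to skewed data, though that is a guess and nothing in this thread confirms it. Below is a minimal sketch for ranking the cached RDD blocks a log reports, assuming the `BlockManagerInfo: Added rdd_<id>_<partition> ... (size: ...)` wording shown above; `largest_rdd_blocks` is a hypothetical helper.

```python
import re

# Matches lines like the BlockManagerInfo one above, e.g.
# "Added rdd_4990_323 on disk on host:33189 (size: 11.8 GiB)".
ADDED_RDD_BLOCK = re.compile(
    r"Added rdd_(?P<rdd>\d+)_(?P<part>\d+) "
    r"(?P<where>on disk|in memory) on (?P<host>[^\s:]+):\d+ "
    r"\(size: (?P<num>[\d.]+) (?P<unit>[KMGTP]i?B|B)"
)

# Binary multipliers; "MB"-style units are also accepted in case older
# Spark versions print them.
UNIT_BYTES = {"B": 1}
for power, prefix in enumerate("KMGTP", start=1):
    UNIT_BYTES[f"{prefix}iB"] = 1024 ** power
    UNIT_BYTES[f"{prefix}B"] = 1024 ** power


def largest_rdd_blocks(lines, top=5):
    """Return the `top` biggest reported RDD blocks as
    (size_bytes, rdd_id, partition, where, host), largest first."""
    blocks = []
    for line in lines:
        m = ADDED_RDD_BLOCK.search(line)
        if m:
            size = float(m["num"]) * UNIT_BYTES[m["unit"]]
            blocks.append((size, int(m["rdd"]), int(m["part"]), m["where"], m["host"]))
    blocks.sort(reverse=True)
    return blocks[:top]
```

If the top entries all belong to the same RDD and dwarf the rest, that partition is the one worth investigating.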

knguyen1 · Aug 07 '24 14:08

Is this still happening, @knguyen1? What are your data volumes, and can you share the complete Zingg logs?

sonalgoyal · Sep 05 '24 05:09

Closing due to lack of response.

sonalgoyal · Sep 14 '24 06:09