
[ERROR org.apache.spark.scheduler.AsyncEventQueue] On starting spark-shell in a cluster with custom 2.0-debian10 image

Open · SurajAralihalli opened this issue Jul 14 '22 · 5 comments

Starting a spark-shell results in ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener AppStatusListener threw an exception java.lang.NumberFormatException: For input string: "null"

Summary of the steps (see the next section for the exact steps)

  1. Built a custom image with DATAPROC_VERSION=2.0-debian10, equipped with a GPU
  2. Used this image to start a new cluster with a GPU
  3. SSHed to the cluster and started spark-shell
  4. The spark-shell throws ERROR org.apache.spark.scheduler.AsyncEventQueue (the same error also appears when creating a Spark session in a PySpark notebook)
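
The exception itself is Integer.parseInt choking on the literal string "null": per the stack trace, Utils.parseHostPort takes everything after the last ':' of a host:port string and calls .toInt on it, so some process registered with the AppStatusListener is apparently reporting its address with a null port. The failing conversion can be reproduced in isolation (a minimal sketch, runnable on the master node):

# "null".toInt is the same conversion Utils.parseHostPort ends up making
# when the port half of a host:port string is the literal text "null";
# stderr is discarded to hide the Spark startup log noise.
echo '"null".toInt' | spark-shell 2>/dev/null
# java.lang.NumberFormatException: For input string: "null"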

Steps to reproduce the problem

  1. Create a customization script for the image, e.g. gpu_dataproc_debian_empty.sh. For simplicity, add only the following to the file:
#!/bin/bash
echo hello
  2. Create an image with DATAPROC_VERSION=2.0-debian10 and the customization script above:
export CUSTOMIZATION_SCRIPT=/gpu_dataproc_debian_empty.sh
export ZONE=us-central1-a
export GCS_BUCKET=mybucket
export IMAGE_NAME=basic-debian-image
export DATAPROC_VERSION=2.0-debian10
export GPU_NAME=nvidia-tesla-t4
export GPU_COUNT=1


python3 generate_custom_image.py \
    --image-name $IMAGE_NAME \
    --dataproc-version $DATAPROC_VERSION \
    --customization-script $CUSTOMIZATION_SCRIPT \
    --no-smoke-test \
    --zone $ZONE \
    --gcs-bucket $GCS_BUCKET \
    --machine-type n1-standard-4 \
    --accelerator type=$GPU_NAME,count=$GPU_COUNT \
    --disk-size 100 \
    --subnetwork default
  3. Start a Dataproc cluster with this image:
export REGION=us-central1 
export GCS_BUCKET=mybucket
export CLUSTER_NAME=basic-debian-cluster
export NUM_GPUS=1
export NUM_WORKERS=2

gcloud dataproc clusters create $CLUSTER_NAME  \
    --region=$REGION \
    --subnet=default \
    --image=basic-debian-image \
    --master-machine-type=n1-standard-16 \
    --num-workers=$NUM_WORKERS \
    --worker-accelerator=type=nvidia-tesla-t4,count=$NUM_GPUS \
    --worker-machine-type=n1-highmem-32 \
    --num-worker-local-ssds=1 \
    --optional-components=JUPYTER,ZEPPELIN \
    --metadata=rapids-runtime=SPARK \
    --bucket=$GCS_BUCKET \
    --enable-component-gateway
  4. SSH to the master and start the spark-shell:
$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering MapOutputTracker
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat
22/07/13 23:23:17 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator
22/07/13 23:23:26 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener AppStatusListener threw an exception
java.lang.NumberFormatException: For input string: "null"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at scala.collection.immutable.StringLike.toInt(StringLike.scala:304)
	at scala.collection.immutable.StringLike.toInt$(StringLike.scala:304)
	at scala.collection.immutable.StringOps.toInt(StringOps.scala:33)
	at org.apache.spark.util.Utils$.parseHostPort(Utils.scala:1126)
	at org.apache.spark.status.ProcessSummaryWrapper.<init>(storeTypes.scala:527)
	at org.apache.spark.status.LiveMiscellaneousProcess.doUpdate(LiveEntity.scala:924)
	at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50)
	at org.apache.spark.status.AppStatusListener.update(AppStatusListener.scala:1215)
	at org.apache.spark.status.AppStatusListener.onMiscellaneousProcessAdded(AppStatusListener.scala:1429)
	at org.apache.spark.status.AppStatusListener.onOtherEvent(AppStatusListener.scala:113)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

EDIT 1: The same behaviour/error is thrown when a CPU cluster is used (reproduction steps in a follow-up comment below).

SurajAralihalli · Jul 14 '22 00:07

@medb @mengdong @sameerz FYI

viadea · Jul 14 '22 00:07

Note:

  1. Using an init script with Debian10: works fine.
  2. Using a custom image with Ubuntu: works fine.

So this issue is specific to Debian10 + custom image.
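
Given that an init script on the stock Debian10 image works, a possible interim workaround (a sketch only, not a fix; it reuses the trivial script and the mybucket/default-subnet values from the repro above) is to run the customization as an initialization action on a stock 2.0-debian10 image instead of baking a custom image:

# Stage the script, then pass it as an init action on the stock image
# (GPU accelerator flags omitted for brevity):
gsutil cp gpu_dataproc_debian_empty.sh gs://mybucket/
gcloud dataproc clusters create basic-debian-cluster \
    --region=us-central1 \
    --subnet=default \
    --image-version=2.0-debian10 \
    --initialization-actions=gs://mybucket/gpu_dataproc_debian_empty.sh \
    --num-workers=2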

viadea · Jul 14 '22 00:07

Steps to reproduce the problem with a CPU cluster

  1. Create a customization script for the image, e.g. cpu_dataproc_debian_empty.sh. For simplicity, add only the following to the file:
#!/bin/bash
echo hello
  2. Create an image with DATAPROC_VERSION=2.0-debian10 and the customization script above:
export CUSTOMIZATION_SCRIPT=cpu_dataproc_debian_empty.sh
export ZONE=us-central1-a
export GCS_BUCKET=mybucket
export IMAGE_NAME=basic-debian-cpu-image
export DATAPROC_VERSION=2.0-debian10


python3 generate_custom_image.py \
    --image-name $IMAGE_NAME \
    --dataproc-version $DATAPROC_VERSION \
    --customization-script $CUSTOMIZATION_SCRIPT \
    --no-smoke-test \
    --zone $ZONE \
    --gcs-bucket $GCS_BUCKET \
    --machine-type n1-standard-4 \
    --disk-size 100 \
    --subnetwork default
  3. Start a Dataproc cluster with this image:
export REGION=us-central1 
export GCS_BUCKET=mybucket
export CLUSTER_NAME=basic-debian-cpu-cluster
export NUM_WORKERS=2

gcloud dataproc clusters create $CLUSTER_NAME  \
    --region=$REGION \
    --subnet=default \
    --image=basic-debian-cpu-image \
    --master-machine-type=n1-standard-16 \
    --num-workers=$NUM_WORKERS \
    --worker-machine-type=n1-highmem-32 \
    --num-worker-local-ssds=1 \
    --optional-components=JUPYTER,ZEPPELIN \
    --bucket=$GCS_BUCKET \
    --enable-component-gateway
  4. SSH to the master and start the spark-shell:
$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering MapOutputTracker
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat
22/07/13 23:23:17 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator
22/07/13 23:23:26 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener AppStatusListener threw an exception
java.lang.NumberFormatException: For input string: "null"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at scala.collection.immutable.StringLike.toInt(StringLike.scala:304)
	at scala.collection.immutable.StringLike.toInt$(StringLike.scala:304)
	at scala.collection.immutable.StringOps.toInt(StringOps.scala:33)
	at org.apache.spark.util.Utils$.parseHostPort(Utils.scala:1126)
	at org.apache.spark.status.ProcessSummaryWrapper.<init>(storeTypes.scala:527)
	at org.apache.spark.status.LiveMiscellaneousProcess.doUpdate(LiveEntity.scala:924)
	at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50)
	at org.apache.spark.status.AppStatusListener.update(AppStatusListener.scala:1215)
	at org.apache.spark.status.AppStatusListener.onMiscellaneousProcessAdded(AppStatusListener.scala:1429)
	at org.apache.spark.status.AppStatusListener.onOtherEvent(AppStatusListener.scala:113)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

SurajAralihalli · Jul 14 '22 19:07