[ERROR org.apache.spark.scheduler.AsyncEventQueue] On starting spark-shell in a cluster with custom 2.0-debian10 image
Starting a spark-shell results in ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener AppStatusListener threw an exception java.lang.NumberFormatException: For input string: "null"
Summary of the steps (see the next section for the exact steps):
- Built a custom image with DATAPROC_VERSION=2.0-debian10, equipped with a GPU
- Used this image to start a new cluster with a GPU
- SSH'd to the cluster and started spark-shell
- The Spark shell throws ERROR org.apache.spark.scheduler.AsyncEventQueue. (The same error also appears when creating a Spark session in a PySpark notebook.)
Steps to reproduce the problem
- Create a customization-script for the image, e.g. gpu_dataproc_debian_empty.sh. For simplicity, only add the following to this file:
#!/bin/bash
echo hello
- Create an image with DATAPROC_VERSION=2.0-debian10 and the above customization-script.
export CUSTOMIZATION_SCRIPT=/gpu_dataproc_debian_empty.sh
export ZONE=us-central1-a
export GCS_BUCKET=mybucket
export IMAGE_NAME=basic-debian-image
export DATAPROC_VERSION=2.0-debian10
export GPU_NAME=nvidia-tesla-t4
export GPU_COUNT=1
python3 generate_custom_image.py \
--image-name $IMAGE_NAME \
--dataproc-version $DATAPROC_VERSION \
--customization-script $CUSTOMIZATION_SCRIPT \
--no-smoke-test \
--zone $ZONE \
--gcs-bucket $GCS_BUCKET \
--machine-type n1-standard-4 \
--accelerator type=$GPU_NAME,count=$GPU_COUNT \
--disk-size 100 \
--subnetwork default
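As an optional sanity check (not part of the original steps), you can confirm the custom image was actually created before using it for the cluster:
# Show basic metadata for the image just built; the name should match $IMAGE_NAME
gcloud compute images describe $IMAGE_NAME --format="value(name,status,creationTimestamp)"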
- Start a Dataproc cluster with this image
export REGION=us-central1
export GCS_BUCKET=mybucket
export CLUSTER_NAME=basic-debian-cluster
export NUM_GPUS=1
export NUM_WORKERS=2
gcloud dataproc clusters create $CLUSTER_NAME \
--region=$REGION \
--subnet=default \
--image=basic-debian-image \
--master-machine-type=n1-standard-16 \
--num-workers=$NUM_WORKERS \
--worker-accelerator=type=nvidia-tesla-t4,count=$NUM_GPUS \
--worker-machine-type=n1-highmem-32 \
--num-worker-local-ssds=1 \
--optional-components=JUPYTER,ZEPPELIN \
--metadata=rapids-runtime=SPARK \
--bucket=$GCS_BUCKET \
--enable-component-gateway
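As another optional check (not part of the original steps), make sure the cluster reached the RUNNING state before SSHing in:
# Print the current cluster state; it should report RUNNING
gcloud dataproc clusters describe $CLUSTER_NAME --region=$REGION --format="value(status.state)"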
- SSH to the master and start the spark-shell
$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering MapOutputTracker
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat
22/07/13 23:23:17 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator
22/07/13 23:23:26 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener AppStatusListener threw an exception
java.lang.NumberFormatException: For input string: "null"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike.toInt(StringLike.scala:304)
at scala.collection.immutable.StringLike.toInt$(StringLike.scala:304)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:33)
at org.apache.spark.util.Utils$.parseHostPort(Utils.scala:1126)
at org.apache.spark.status.ProcessSummaryWrapper.<init>(storeTypes.scala:527)
at org.apache.spark.status.LiveMiscellaneousProcess.doUpdate(LiveEntity.scala:924)
at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50)
at org.apache.spark.status.AppStatusListener.update(AppStatusListener.scala:1215)
at org.apache.spark.status.AppStatusListener.onMiscellaneousProcessAdded(AppStatusListener.scala:1429)
at org.apache.spark.status.AppStatusListener.onOtherEvent(AppStatusListener.scala:113)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
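Based on the stack trace, the failure is inside org.apache.spark.util.Utils.parseHostPort, which ends up calling toInt on the string "null", i.e. the port part of some host:port value arrives as the literal text "null". As a trivial standalone illustration (not the actual failing code path, just the same parse), the same message can be reproduced from the shell:
# Pipe a one-liner into spark-shell: converting the string "null" to an Int
# raises the same NumberFormatException reported by AppStatusListener above.
echo 'println("null".toInt)' | spark-shell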
EDIT1: The same behaviour/error is thrown when a CPU cluster is used.
@medb @mengdong @sameerz FYI
Note:
- Using an init script with Debian10: works fine.
- Using a custom image with Ubuntu: works fine.
So this issue is specific to: Debian10 + custom image.
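For anyone comparing a working master (Ubuntu custom image, or Debian10 with an init script) against a failing one (Debian10 custom image), these generic checks show the OS release and the Spark build on a node; they are not specific to this issue:
# OS release of the node (Debian 10 vs Ubuntu)
head -n 2 /etc/os-release
# Spark build bundled with the image
spark-submit --version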
Steps to reproduce the problem with a CPU cluster
- Create a customization-script for the image, e.g. cpu_dataproc_debian_empty.sh. For simplicity, only add the following to this file:
#!/bin/bash
echo hello
- Create an image with DATAPROC_VERSION=2.0-debian10 and the above customization-script.
export CUSTOMIZATION_SCRIPT=cpu_dataproc_debian_empty.sh
export ZONE=us-central1-a
export GCS_BUCKET=mybucket
export IMAGE_NAME=basic-debian-cpu-image
export DATAPROC_VERSION=2.0-debian10
python3 generate_custom_image.py \
--image-name $IMAGE_NAME \
--dataproc-version $DATAPROC_VERSION \
--customization-script $CUSTOMIZATION_SCRIPT \
--no-smoke-test \
--zone $ZONE \
--gcs-bucket $GCS_BUCKET \
--machine-type n1-standard-4 \
--disk-size 100 \
--subnetwork default
- Start a Dataproc cluster with this image
export REGION=us-central1
export GCS_BUCKET=mybucket
export CLUSTER_NAME=basic-debian-cpu-cluster
export NUM_WORKERS=2
gcloud dataproc clusters create $CLUSTER_NAME \
--region=$REGION \
--subnet=default \
--image=basic-debian-cpu-image \
--master-machine-type=n1-standard-16 \
--num-workers=$NUM_WORKERS \
--worker-machine-type=n1-highmem-32 \
--num-worker-local-ssds=1 \
--optional-components=JUPYTER,ZEPPELIN \
--bucket=$GCS_BUCKET \
--enable-component-gateway
- SSH to the master and start the spark-shell
$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering MapOutputTracker
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat
22/07/13 23:23:17 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator
22/07/13 23:23:26 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener AppStatusListener threw an exception
java.lang.NumberFormatException: For input string: "null"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike.toInt(StringLike.scala:304)
at scala.collection.immutable.StringLike.toInt$(StringLike.scala:304)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:33)
at org.apache.spark.util.Utils$.parseHostPort(Utils.scala:1126)
at org.apache.spark.status.ProcessSummaryWrapper.<init>(storeTypes.scala:527)
at org.apache.spark.status.LiveMiscellaneousProcess.doUpdate(LiveEntity.scala:924)
at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50)
at org.apache.spark.status.AppStatusListener.update(AppStatusListener.scala:1215)
at org.apache.spark.status.AppStatusListener.onMiscellaneousProcessAdded(AppStatusListener.scala:1429)
at org.apache.spark.status.AppStatusListener.onOtherEvent(AppStatusListener.scala:113)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)