
[ERROR org.apache.spark.scheduler.AsyncEventQueue] On starting spark-shell in a cluster with custom 2.0-debian10 image

Open · SurajAralihalli opened this issue Jul 14 '22 · 5 comments

Starting a spark-shell results in ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener AppStatusListener threw an exception java.lang.NumberFormatException: For input string: "null"

Summary of the steps (see the next section for the exact steps)

  1. Built a custom image with DATAPROC_VERSION=2.0-debian10, equipped with a GPU
  2. Used this image to start a new cluster with a GPU
  3. SSHed to the cluster and started spark-shell
  4. The spark-shell throws ERROR org.apache.spark.scheduler.AsyncEventQueue (the same error also appears when creating a Spark session in a PySpark notebook)
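
The exception itself is Integer.parseInt choking on the literal string "null": per the stack trace, Utils.parseHostPort takes everything after the last ':' of a host:port string and calls .toInt on it, so some process registered with the AppStatusListener is apparently reporting its address with a null port. The failing conversion can be reproduced in isolation (a minimal sketch, runnable on the master node):

# "null".toInt is the same conversion Utils.parseHostPort ends up making
# when the port half of a host:port string is the literal text "null";
# stderr is discarded to hide the Spark startup log noise.
echo '"null".toInt' | spark-shell 2>/dev/null
# java.lang.NumberFormatException: For input string: "null"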

Steps to reproduce the problem

  1. Create a customization script for the image, e.g. gpu_dataproc_debian_empty.sh. For simplicity, add only the following to the file:
#!/bin/bash
echo hello
  2. Create an image with DATAPROC_VERSION=2.0-debian10 and the customization script above:
export CUSTOMIZATION_SCRIPT=/gpu_dataproc_debian_empty.sh
export ZONE=us-central1-a
export GCS_BUCKET=mybucket
export IMAGE_NAME=basic-debian-image
export DATAPROC_VERSION=2.0-debian10
export GPU_NAME=nvidia-tesla-t4
export GPU_COUNT=1


python3 generate_custom_image.py \
    --image-name $IMAGE_NAME \
    --dataproc-version $DATAPROC_VERSION \
    --customization-script $CUSTOMIZATION_SCRIPT \
    --no-smoke-test \
    --zone $ZONE \
    --gcs-bucket $GCS_BUCKET \
    --machine-type n1-standard-4 \
    --accelerator type=$GPU_NAME,count=$GPU_COUNT \
    --disk-size 100 \
    --subnetwork default
  3. Start a Dataproc cluster with this image:
export REGION=us-central1 
export GCS_BUCKET=mybucket
export CLUSTER_NAME=basic-debian-cluster
export NUM_GPUS=1
export NUM_WORKERS=2

gcloud dataproc clusters create $CLUSTER_NAME  \
    --region=$REGION \
    --subnet=default \
    --image=basic-debian-image \
    --master-machine-type=n1-standard-16 \
    --num-workers=$NUM_WORKERS \
    --worker-accelerator=type=nvidia-tesla-t4,count=$NUM_GPUS \
    --worker-machine-type=n1-highmem-32 \
    --num-worker-local-ssds=1 \
    --optional-components=JUPYTER,ZEPPELIN \
    --metadata=rapids-runtime=SPARK \
    --bucket=$GCS_BUCKET \
    --enable-component-gateway
  4. SSH to the master and start the spark-shell:
$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering MapOutputTracker
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat
22/07/13 23:23:17 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator
22/07/13 23:23:26 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener AppStatusListener threw an exception
java.lang.NumberFormatException: For input string: "null"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at scala.collection.immutable.StringLike.toInt(StringLike.scala:304)
	at scala.collection.immutable.StringLike.toInt$(StringLike.scala:304)
	at scala.collection.immutable.StringOps.toInt(StringOps.scala:33)
	at org.apache.spark.util.Utils$.parseHostPort(Utils.scala:1126)
	at org.apache.spark.status.ProcessSummaryWrapper.<init>(storeTypes.scala:527)
	at org.apache.spark.status.LiveMiscellaneousProcess.doUpdate(LiveEntity.scala:924)
	at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50)
	at org.apache.spark.status.AppStatusListener.update(AppStatusListener.scala:1215)
	at org.apache.spark.status.AppStatusListener.onMiscellaneousProcessAdded(AppStatusListener.scala:1429)
	at org.apache.spark.status.AppStatusListener.onOtherEvent(AppStatusListener.scala:113)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

EDIT 1: The same behaviour/error is thrown when a CPU cluster is used (reproduction steps in a follow-up comment below).

SurajAralihalli · Jul 14 '22 00:07

@medb @mengdong @sameerz FYI

viadea · Jul 14 '22 00:07

Note:

  1. Using an init script with Debian10: works fine.
  2. Using a custom image with Ubuntu: works fine.

So this issue is specific to Debian10 + custom image.
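
Given that an init script on the stock Debian10 image works, a possible interim workaround (a sketch only, not a fix; it reuses the trivial script and the mybucket/default-subnet values from the repro above) is to run the customization as an initialization action on a stock 2.0-debian10 image instead of baking a custom image:

# Stage the script, then pass it as an init action on the stock image
# (GPU accelerator flags omitted for brevity):
gsutil cp gpu_dataproc_debian_empty.sh gs://mybucket/
gcloud dataproc clusters create basic-debian-cluster \
    --region=us-central1 \
    --subnet=default \
    --image-version=2.0-debian10 \
    --initialization-actions=gs://mybucket/gpu_dataproc_debian_empty.sh \
    --num-workers=2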

viadea · Jul 14 '22 00:07

Steps to reproduce the problem with a CPU cluster

  1. Create a customization script for the image, e.g. cpu_dataproc_debian_empty.sh. For simplicity, add only the following to the file:
#!/bin/bash
echo hello
  2. Create an image with DATAPROC_VERSION=2.0-debian10 and the customization script above:
export CUSTOMIZATION_SCRIPT=cpu_dataproc_debian_empty.sh
export ZONE=us-central1-a
export GCS_BUCKET=mybucket
export IMAGE_NAME=basic-debian-cpu-image
export DATAPROC_VERSION=2.0-debian10


python3 generate_custom_image.py \
    --image-name $IMAGE_NAME \
    --dataproc-version $DATAPROC_VERSION \
    --customization-script $CUSTOMIZATION_SCRIPT \
    --no-smoke-test \
    --zone $ZONE \
    --gcs-bucket $GCS_BUCKET \
    --machine-type n1-standard-4 \
    --disk-size 100 \
    --subnetwork default
  3. Start a Dataproc cluster with this image:
export REGION=us-central1 
export GCS_BUCKET=mybucket
export CLUSTER_NAME=basic-debian-cpu-cluster
export NUM_WORKERS=2

gcloud dataproc clusters create $CLUSTER_NAME  \
    --region=$REGION \
    --subnet=default \
    --image=basic-debian-cpu-image \
    --master-machine-type=n1-standard-16 \
    --num-workers=$NUM_WORKERS \
    --worker-machine-type=n1-highmem-32 \
    --num-worker-local-ssds=1 \
    --optional-components=JUPYTER,ZEPPELIN \
    --bucket=$GCS_BUCKET \
    --enable-component-gateway
  4. SSH to the master and start the spark-shell:
$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering MapOutputTracker
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster
22/07/13 23:23:16 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat
22/07/13 23:23:17 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator
22/07/13 23:23:26 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener AppStatusListener threw an exception
java.lang.NumberFormatException: For input string: "null"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at scala.collection.immutable.StringLike.toInt(StringLike.scala:304)
	at scala.collection.immutable.StringLike.toInt$(StringLike.scala:304)
	at scala.collection.immutable.StringOps.toInt(StringOps.scala:33)
	at org.apache.spark.util.Utils$.parseHostPort(Utils.scala:1126)
	at org.apache.spark.status.ProcessSummaryWrapper.<init>(storeTypes.scala:527)
	at org.apache.spark.status.LiveMiscellaneousProcess.doUpdate(LiveEntity.scala:924)
	at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50)
	at org.apache.spark.status.AppStatusListener.update(AppStatusListener.scala:1215)
	at org.apache.spark.status.AppStatusListener.onMiscellaneousProcessAdded(AppStatusListener.scala:1429)
	at org.apache.spark.status.AppStatusListener.onOtherEvent(AppStatusListener.scala:113)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1404)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

SurajAralihalli · Jul 14 '22 19:07