initialization-actions

[oozie] intermittent error writing to HDFS during init action

Open cjac opened this issue 2 years ago • 1 comments

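The oozie init action can fail intermittently when it runs before HDFS is fully online. In the failing run below, the namenode reports 0 running datanodes, so the upload of the Oozie share lib cannot satisfy minReplication.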

+ hadoop fs -put -f /tmp/oozie-install-m95M/share /user/oozie/
2023-08-07 20:41:35,938 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/oozie/share/lib/pig/hadoop-yarn-client-3.3.3.jar._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 0 datanode(s) running and 0 node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2315)
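
One possible guard, as a minimal sketch and not necessarily how the init action itself handles this: poll the namenode until at least one datanode is live before running the hadoop fs -put step. The function names live_datanodes and wait_for_hdfs are hypothetical, and the parsing assumes that hdfs dfsadmin -report prints a line of the form "Live datanodes (N):".

```shell
#!/bin/bash

# Print the number of live datanodes reported by the namenode.
# Assumption: the report contains a line like "Live datanodes (N):".
# Prints 0 if the line is missing or the command fails.
live_datanodes() {
  local count
  count="$(hdfs dfsadmin -report -live 2>/dev/null \
    | sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p' | head -n 1)"
  echo "${count:-0}"
}

# Hypothetical helper: block until at least one datanode is live.
# $1 = maximum attempts (default 30), $2 = seconds between attempts (default 10).
# Returns 0 once a datanode is live, 1 if the attempts run out.
wait_for_hdfs() {
  local max_attempts="${1:-30}"
  local interval="${2:-10}"
  local attempt=1
  while [ "${attempt}" -le "${max_attempts}" ]; do
    if [ "$(live_datanodes)" -gt 0 ]; then
      return 0
    fi
    sleep "${interval}"
    attempt=$((attempt + 1))
  done
  echo "HDFS has no live datanodes after ${max_attempts} attempts" >&2
  return 1
}
```

Calling wait_for_hdfs || exit 1 immediately before the hadoop fs -put line would turn the intermittent DataStreamer exception into either a short wait or an explicit timeout message.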

cjac, Aug 14 '23 20:08

fixed in #1089

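The fix may need to be paired with a startup script that staggers master startup, passed as --metadata startup-script-url="${INIT_ACTIONS_ROOT}/delay-masters-startup.sh". The script makes master N wait N minutes before proceeding: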

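#!/bin/bash
# Stagger master startup: master N sleeps N * 60 seconds before proceeding.

set -x

readonly ROLE="$(/usr/share/google/get_metadata_value attributes/dataproc-role)"
# Only masters are delayed; every other role exits immediately.
if [[ "${ROLE}" != 'Master' ]]; then set +x; exit 0; fi

# Extract N from a hostname containing -m-N. A hostname without that suffix
# yields an empty string, which would break the arithmetic, so default to 0.
node_number="$(echo "${HOSTNAME}" | perl -ne '/-m-(\d+)/; print $1')"
node_number="${node_number:-0}"
delay_seconds=$((node_number * 60))
sleep "${delay_seconds}s"

NOW="$(date +"%F-%T")"
echo "instance #${node_number} (${HOSTNAME}) proceeds at ${NOW}" | tee /var/log/delay-masters.log

set +x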
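
For reference, a sketch of how the startup script might be passed at cluster creation. The wrapper name create_oozie_cluster, the three-master setting, and the oozie/oozie.sh path under the init actions root are assumptions for illustration; the cluster name, region, and bucket are placeholders to replace.

```shell
#!/bin/bash

# Hypothetical wrapper around cluster creation.
# $1 = cluster name, $2 = region,
# $3 = init actions root, e.g. gs://<your-bucket>/initialization-actions
create_oozie_cluster() {
  local cluster_name="$1"
  local region="$2"
  local init_actions_root="$3"

  # Assumption: a three-master cluster, so that hostnames carry the
  # -m-N suffix that the delay script parses.
  gcloud dataproc clusters create "${cluster_name}" \
    --region "${region}" \
    --num-masters 3 \
    --initialization-actions "${init_actions_root}/oozie/oozie.sh" \
    --metadata "startup-script-url=${init_actions_root}/delay-masters-startup.sh"
}
```

With this in place, create_oozie_cluster my-cluster us-central1 "${INIT_ACTIONS_ROOT}" would create the cluster with both the oozie init action and the delay script attached.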

cjac, Sep 19 '23 21:09

This issue appears to be resolved.

cjac, Jul 21 '24 20:07