initialization-actions
Dataproc Initialization Script to Change the Yarn log directories to the local SSD mounts
I would like to contribute a new initialization script that will change the yarn.nodemanager.log-dirs property in the yarn-site.xml to the local SSD mounts.
Here are some features:
- Update the value of MAX_MNT_DISK_FOR_LOGS to specify the maximum number of local SSDs to use for logging. I've currently set the default to 3, but you can change it according to the workload.
- Update LOGPATH to the required path under each local SSD mount in /mnt/ to specify the logging directory. I've currently set it to hadoop/yarn/userlogs, so logs will go to hadoop/yarn/userlogs under each of those mounts.
- If there are no local SSDs, the script will not add any property to yarn-site.xml and the default boot disk path will be used.
- If there are multiple mounts, it will create a comma-separated list of paths for the property, limited by the MAX_MNT_DISK_FOR_LOGS setting.
- If MAX_MNT_DISK_FOR_LOGS is greater than the number of local SSDs actually present, the actual disk count will be used. Otherwise, the configured value will be honoured.
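
The path-building behaviour described in the list above could look roughly like the sketch below. This is not the proposed script itself. The function name `build_log_dirs` and its arguments are hypothetical, and it assumes that every directory directly under the mount root is a local SSD mount. Only MAX_MNT_DISK_FOR_LOGS and LOGPATH, with their defaults, come from the description above.

```shell
#!/usr/bin/env bash
# Sketch only: build_log_dirs and its arguments are hypothetical names.
# MAX_MNT_DISK_FOR_LOGS and LOGPATH use the defaults described in this issue.
MAX_MNT_DISK_FOR_LOGS="${MAX_MNT_DISK_FOR_LOGS:-3}"
LOGPATH="${LOGPATH:-hadoop/yarn/userlogs}"

# Print a comma-separated list of log directories, one per directory found
# under the mount root, capped at max_disks entries.
# Assumption: each directory directly under the mount root is a local SSD mount.
# Prints nothing when the mount root contains no directories.
build_log_dirs() {
  local mnt_root="$1"
  local max_disks="$2"
  local log_path="$3"
  local dirs=""
  local count=0
  local mount

  for mount in "${mnt_root}"/*/; do
    # If the glob matched nothing it stays literal and is not a directory.
    [ -d "${mount}" ] || continue
    # Stop once the cap is reached; with fewer disks than the cap,
    # the loop simply ends at the actual disk count.
    [ "${count}" -lt "${max_disks}" ] || break
    # Strip the trailing slash, append the log path, and join with commas.
    dirs="${dirs:+${dirs},}${mount%/}/${log_path}"
    count=$((count + 1))
  done

  printf '%s' "${dirs}"
}
```

A caller would run something like `log_dirs="$(build_log_dirs /mnt "${MAX_MNT_DISK_FOR_LOGS}" "${LOGPATH}")"` and set yarn.nodemanager.log-dirs in yarn-site.xml only when `log_dirs` is non-empty, which leaves the default boot disk path in place on clusters without local SSDs. How the property is written to yarn-site.xml is left out of this sketch.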