initialization-actions

Dataproc Initialization Script to Change the YARN Log Directories to the Local SSD Mounts

Status: Open. datasherlock opened this issue 3 years ago (0 comments).

I would like to contribute a new initialization script that sets the `yarn.nodemanager.log-dirs` property in `yarn-site.xml` to point at the local SSD mounts.
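The script itself is not included in this issue. The following is only a minimal Python sketch of the core edit. It assumes `yarn-site.xml` uses the standard Hadoop layout (`<property>` elements with `<name>` and `<value>` children under `<configuration>`). The helper name `set_hadoop_property` is hypothetical, and this version does not preserve XML comments or the XML declaration.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_text: str, name: str, value: str) -> str:
    """Hypothetical helper: set or add a <property> in a Hadoop-style *-site.xml.

    Returns the updated document as a string. Comments and the XML
    declaration are dropped by ElementTree's default parser/serializer.
    """
    root = ET.fromstring(xml_text)  # root element is <configuration>
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already exists: overwrite its value in place.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not found: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Updating an existing `<property>` instead of always appending avoids leaving two conflicting definitions of the same key in the file.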

Features:

  1. Set `MAX_MNT_DISK_FOR_LOGS` to the maximum number of local SSDs to use for logs. The default is 3, but you can change it to suit your workload.
  2. Set `LOGPATH` to the log directory under `/mnt//`. The default is `hadoop/yarn/userlogs`, so logs go to `/mnt//hadoop/yarn/userlogs`.
  3. If the node has no local SSDs, the script does not add the property to `yarn-site.xml`, and YARN keeps its default log directory on the boot disk.
  4. If there are multiple mounts, the script sets the property to a comma-separated list of log paths, one per mount, using at most `MAX_MNT_DISK_FOR_LOGS` mounts.
  5. If `MAX_MNT_DISK_FOR_LOGS` is greater than the number of available disks, the actual disk count is used; otherwise, `MAX_MNT_DISK_FOR_LOGS` is honoured.
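The selection rules in the feature list can be sketched as follows. This is a minimal Python illustration, not the proposed script. It assumes the caller already has the list of local SSD mount points (discovering them is out of scope here). The function name `build_log_dirs` and the `/mnt/1`-style paths in the tests are hypothetical.

```python
from typing import List, Optional

# Defaults mirroring the settings described in the issue.
MAX_MNT_DISK_FOR_LOGS = 3
LOGPATH = "hadoop/yarn/userlogs"


def build_log_dirs(mounts: List[str],
                   max_disks: int = MAX_MNT_DISK_FOR_LOGS,
                   logpath: str = LOGPATH) -> Optional[str]:
    """Hypothetical helper: compute the value for yarn.nodemanager.log-dirs.

    Returns None when there are no usable mounts, meaning the property
    should not be written and YARN keeps its boot-disk default.
    """
    if not mounts or max_disks < 1:
        return None
    # Cap at max_disks; if fewer disks exist, use all of them.
    count = min(max_disks, len(mounts))
    dirs = [f"{m.rstrip('/')}/{logpath.strip('/')}" for m in mounts[:count]]
    # YARN expects a comma-separated list of directories.
    return ",".join(dirs)
```

Returning `None` rather than an empty string makes the "no local SSDs" case explicit, so the caller can skip writing the property entirely.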

datasherlock, Jul 25 '22 03:07