
Parallel, distributed file interaction using mpifileutils

Open — mdefende opened this issue 8 months ago · 1 comment

What would you like to see added?

When interacting with a large number of files, standard tools aren't performant and can overload GPFS depending on how they are run. Distributing the work across multiple nodes both vastly increases performance and reduces stress on GPFS. mpifileutils is great because users can run it themselves, rather than needing admins to submit an MMLS policy to perform the action.

We should add a page covering the benefits of mpifileutils, with examples of basic use cases and copy-paste scripts users can run themselves.
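For instance, a rough sketch of one such use case, a parallel copy with dcp (the module version is taken from the drm script further down; the partition, node/task counts, and source/destination paths are placeholders):

#!/bin/bash
#
#SBATCH --job-name=dcp
#SBATCH --nodes=2
#SBATCH --ntasks=32
#SBATCH --partition=express
#SBATCH --time=02:00:00
#SBATCH --mem-per-cpu=2G
#SBATCH --output=dcp.out
#SBATCH --error=dcp.err

module load mpifileutils/0.12-gompi-2024a

# copy a large directory tree in parallel across all MPI ranks
mpirun -np "${SLURM_NTASKS}" dcp /path/to/source /path/to/destination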

mdefende · May 27 '25 17:05

Example script using drm to remove a large quantity of data:

#!/bin/bash
#
#SBATCH --job-name=drm
#SBATCH --nodes=2-6
#SBATCH --ntasks=96
#SBATCH --partition=express
#SBATCH --time=02:00:00
#SBATCH --mem-per-cpu=2G
#SBATCH --output=drm.out
#SBATCH --error=drm.err
#SBATCH --spread-job

module load mpifileutils/0.12-gompi-2024a

# directories to remove; replace with your own paths
dirs=(/path/to/dir1 /path/to/dir2 /path/to/dir3)

# remove each directory in turn, spreading the work across all MPI ranks
for d in "${dirs[@]}"; do
    mpirun -np "${SLURM_NTASKS}" drm -l "${d}"
done

This will run three separate drm tasks sequentially, but the job could easily be submitted as an array job instead (a sketch is shown below). The exact number of tasks and nodes can be adjusted based on how many files need to be deleted.
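A rough sketch of that array-job variant, under the assumption that each array task handles one directory from the same placeholder list (node and task counts are illustrative):

#!/bin/bash
#
#SBATCH --job-name=drm
#SBATCH --nodes=2
#SBATCH --ntasks=32
#SBATCH --partition=express
#SBATCH --time=02:00:00
#SBATCH --mem-per-cpu=2G
#SBATCH --output=drm_%A_%a.out
#SBATCH --error=drm_%A_%a.err
#SBATCH --array=0-2

module load mpifileutils/0.12-gompi-2024a

# one directory per array index (paths are placeholders)
dirs=(/path/to/dir1 /path/to/dir2 /path/to/dir3)

# each array task removes its own directory, spread across that task's MPI ranks
mpirun -np "${SLURM_NTASKS}" drm -l "${dirs[$SLURM_ARRAY_TASK_ID]}"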

You can also use --ntasks-per-node so you don't have to do the math to make sure the requested number of CPUs per node fits on the nodes the job lands on.
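A minimal sketch of that variant (the node and per-node task counts are arbitrary examples giving 96 ranks total):

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=24

# 4 nodes x 24 tasks per node = 96 MPI ranks
mpirun -np $((SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE)) drm -l /path/to/dir1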

mdefende · Jun 12 '25 19:06