Psiflow fills up the /tmp folder and does not clean it up properly
I got the attached error in my psflow_task_logs folder and traced it back to a lot of `mytmpdir.XXXXXXXXXX` directories being created and never deleted. In this line, the location is hardcoded to `/tmp`, so it can't be overridden by setting `TMPDIR`.
The file where it is set is `psiflow/utils/__init__.py`. Is it possible to make it read a custom `TMPDIR`?
```
--> executable follows <--
tmpdir=$(mktemp -d -p /tmp "mytmpdir.XXXXXXXXXX" || mktemp -d -t "mytmpdir.XXXXXXXXXX") && cd $tmpdir; echo "tmpdir: $PWD" && cp /vast.mnt/project/phys-20225982/PbI2_layer_DMF_psiflow/psiflow_internal/context_dir/cp2k_000000.inp cp2k.inp && { timeout -s 9 16200.0s apptainer exec -e --no-init oras://ghcr.io/molmod/cp2k:2024.1 /opt/entry.sh mpirun -np 64 cp2k.psmp -i cp2k.inp; exit 0; }
--> end executable <--
cp: error writing 'cp2k.inp': No space left on device
```

(The same block appears three times in the log.)
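The claim that `TMPDIR` has no effect can be checked directly. The sketch below assumes GNU coreutils `mktemp` and runs the logged prefix next to a variant that passes `"${TMPDIR:-/tmp}"` to `-p`. With an explicit `-p /tmp`, GNU `mktemp` ignores `$TMPDIR`; the `-t` fallback would honor it, but it only runs if the first call fails. The throwaway directory assigned to `TMPDIR` is purely illustrative.

```shell
#!/usr/bin/env bash
# Illustration only: a throwaway directory stands in for a custom TMPDIR
# (e.g. node-local scratch or a project filesystem).
scratch=$(mktemp -d)
export TMPDIR="$scratch"

# Current prefix: the explicit "-p /tmp" wins, so $TMPDIR is ignored.
old=$(mktemp -d -p /tmp "mytmpdir.XXXXXXXXXX" || mktemp -d -t "mytmpdir.XXXXXXXXXX")

# TMPDIR-aware variant: fall back to /tmp only if TMPDIR is unset or empty.
new=$(mktemp -d -p "${TMPDIR:-/tmp}" "mytmpdir.XXXXXXXXXX" || mktemp -d -t "mytmpdir.XXXXXXXXXX")

echo "current prefix creates:      $old"
echo "TMPDIR-aware prefix creates: $new"

# Remove everything this illustration created.
rm -rf -- "$old" "$new"
rmdir -- "$scratch"
```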
Psiflow currently launches every task in a fresh tmpdir (`mktemp -d -p /tmp "mytmpdir.XXXXXXXXXX" || mktemp -d -t "mytmpdir.XXXXXXXXXX"`) and does not clean up after itself. Usually this is not a problem, because the directories are created on HPC compute node storage, which is wiped when the job allocation ends. In what configuration are you running Psiflow workflows (local, HPC, ...)?
We plan to make this more flexible through the YAML config:
- specify where tmpdirs are created
- specify whether they should be kept or removed after task completion (for debugging purposes)
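At the shell level, those two options could look roughly like the sketch below, assuming bash and GNU `mktemp`. `SCRATCH_ROOT` and `KEEP_TMPDIR` are hypothetical environment variables standing in for the planned YAML settings (they are not existing Psiflow options), and `run_in_scratch` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Sketch only: SCRATCH_ROOT and KEEP_TMPDIR are hypothetical stand-ins for the
# planned YAML options; they are not existing Psiflow settings.

# Run "$@" in a fresh scratch directory and print the directory path first.
# The function body is a ( ) subshell, so its EXIT trap fires as soon as the
# task finishes, whether the command succeeds or fails.
run_in_scratch() (
  root="${SCRATCH_ROOT:-${TMPDIR:-/tmp}}"   # where tmpdirs are created
  tmpdir=$(mktemp -d -p "$root" "mytmpdir.XXXXXXXXXX") || exit 1
  if [ "${KEEP_TMPDIR:-0}" != "1" ]; then
    trap 'rm -rf -- "$tmpdir"' EXIT         # removed unless asked to keep it
  fi
  echo "$tmpdir"
  cd "$tmpdir" || exit 1
  "$@"
)

run_in_scratch true                         # created, printed, then removed
kept=$(KEEP_TMPDIR=1 run_in_scratch true)   # kept for debugging
ls -d "$kept"
rm -rf -- "$kept"
```

Running the body in a subshell keeps both the `cd` and the trap scoped to a single task, so nothing leaks into the calling shell.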
Unfortunately, there is quite a backlog of things that we would like to improve in the current Psiflow version first.
For the moment, the easiest fix is probably to adapt those lines manually in `psiflow/utils/__init__.py`.
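One possible manual adaptation, sketched under the assumption that the generated command keeps the shape seen in the log: pass `"${TMPDIR:-/tmp}"` to `-p` and register an `EXIT` trap before `cd`. The trap matters because the logged command ends in `exit 0`, so a cleanup step appended after it would never run. `task_body` is a placeholder for the real `cp`/`apptainer` command, and how Psiflow assembles this string in Python is not shown here.

```shell
#!/usr/bin/env bash
# Placeholder task body; the real one is the cp / apptainer / cp2k part,
# which ends in "exit 0", so a cleanup command appended after it never runs.
task_body='touch cp2k.inp; exit 0'

# Hypothetical replacement prefix: create the tmpdir under $TMPDIR (default
# /tmp) and register an EXIT trap that removes it even after "exit 0".
prefix='tmpdir=$(mktemp -d -p "${TMPDIR:-/tmp}" "mytmpdir.XXXXXXXXXX" || mktemp -d -t "mytmpdir.XXXXXXXXXX") && trap "rm -rf -- \"\$tmpdir\"" EXIT && cd "$tmpdir"; echo "tmpdir: $PWD"'

# Run prefix and body in one shell, shaped like the logged executable line.
bash -c "$prefix && { $task_body; }"
```

An `EXIT` trap does not run if the task shell itself is killed with `SIGKILL`, so directories from killed jobs can still be left behind.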
Feel free to PR a fix that solves your problem and keeps the default behaviour unchanged.