FDS running too slowly on Linux clusters
Describe the bug
I recently installed the latest version of FDS on my university's Linux clusters. We have about four servers, each with 132 CPUs. I tried running a simulation on one server using 32 CPUs (ntasks). The simulation is taking far too long; in fact, it is even slower than on my personal laptop. I am new to FDS and would appreciate any help. The job control script I am using is attached.
Expected behavior
I was expecting FDS to run much faster.
Screenshots
Here's how the output file looks
We also need to see your input file. The output you provided shows you are only using 1 MPI process. Do you only have 1 mesh?
FDS can only map an MPI process to a single mesh (this is how we do domain decomposition). So, your --ntasks and --ntasks-per-node need to be 3.
But something else is not right because you are seeing "Number of MPI Processes: 1". I don't quite understand this.
But fix the number to 3 and try again and let us know.
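If you are not sure how many meshes you have, a quick check is to count the &MESH lines in the input file (assuming the file name from your attached job script, and that each &MESH line starts at the beginning of a line):
# Count the &MESH lines; --ntasks should normally match this number
grep -c '^&MESH' Small_Scale_FINAL.fds
# If your &MESH lines are indented or lowercase, drop the ^ anchor or add -i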
Try this:
#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch    # Replace 'batch' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES    # Replace with the actual path if different
srun -N 1 -n 3 /home/krishna1/a/sharm368/FDS/FDS6/bin/fds Small_Scale_FINAL.fds
Tasks here refers to MPI processes, one per mesh. Note that your meshes have significantly different sizes, which is bad for load balance: the calculation will only run as fast as the slowest worker, here the process managing mesh 02.
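As a rough way to see the imbalance, you could print the cell count of each mesh. This is only a sketch and assumes each &MESH line carries the usual IJK=I,J,K entry on a single line:
# Print I*J*K for every &MESH line so the mesh sizes can be compared directly
awk -F'IJK=' '/&MESH/ {split($2, a, ","); printf "mesh %d: %d cells\n", ++m, a[1]*a[2]*a[3]}' Small_Scale_FINAL.fds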
You may not be able to use the pre-compiled FDS binaries on your cluster. srun may be linked to a specific mpiexec/mpirun executable that was built with a different compiler than the pre-compiled binaries. When you run FDS with the wrong mpiexec, it can sometimes spin up N independent serial copies of the same input instead of one N-process parallel job.
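One quick symptom check (file names taken from the attached job script; adjust to wherever your output actually goes): a correct 3-mesh run prints this line once and reports 3 processes, while N serial copies each report 1.
# Look for the MPI process count in the run output
grep "Number of MPI Processes" f1.log *.out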
In addition to the comments by Randy and Marcos, I would try building the source code yourself with the compiler environment available on your cluster.
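A from-source build looks roughly like the following. This is only a sketch; the module names and the Build sub-folder depend on your cluster and FDS version:
# Load the cluster's Intel compiler + MPI environment (module names are site-specific)
module load intel intel-mpi
# Get the FDS source and build the Intel MPI target
git clone https://github.com/firemodels/fds.git
cd fds/Build/impi_intel_linux      # folder name may differ between FDS versions
./make_fds.sh                      # produces an fds_* executable in this directory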
I changed the number to 3 and reran the simulation. The speed doesn't seem to have improved much. Here's the new file; it still shows Number of MPI Processes: 1.
I changed the batch file and the speed hasn't improved much. Would using 3 nodes, or a number of tasks that is a multiple of 3, help? I wanted to take advantage of the large number of CPUs we have in a server. Is there a way I could do that here?
Thanks
The mpiexec or mpirun you are using comes from a different compiler than the FDS binary executable. That's why you see three repeats of each part of the initialization output. See my previous message about compiling FDS with your cluster's build environment.
What would building the source code entail? I'm not very familiar with that.
@johodges Shouldn't things work correctly if the user adds the environment variables to their .bashrc as suggested at the end of the FDS-SMV install process?
@shivam11021 Have you installed FDS yourself, or did you have your sys admin do it?
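For reference, the lines the installer suggests adding to ~/.bashrc look roughly like this (paths taken from your job script; the second line applies only if Smokeview was installed as well):
# Put the bundled fds executable and its Intel MPI on the PATH at every login
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
source /home/krishna1/a/sharm368/FDS/FDS6/bin/SMV6VARS.sh   # only if Smokeview is installed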
I did it myself, but with help from the IT people.
Are you loading modules or are you running the FDS6VARS.sh script?
I just go to the terminal and use the command "bash script.sh". I don't load any modules explicitly
I have run into this issue before when I tried to use a compiled version of FDS on Frontera, which also uses srun. If you look in the submission file, the user is not calling mpiexec/mpirun directly, which means srun is deciding which MPI to call.
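You can ask Slurm directly how it wires up MPI; these are standard Slurm commands, though the output depends on the site's configuration:
# List the MPI plugin types this srun supports
srun --mpi=list
# Show the cluster-wide default MPI plugin
scontrol show config | grep -i MpiDefault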
Sorry for the double post. I just checked again on Frontera: if I source FDS6VARS.sh in my .bashrc file, I am able to run the compiled version, and it is using the Intel mpiexec under the hood. I think I was misremembering which cluster I ran into this issue on; it was probably Polaris, since they use MPICH.
@shivam11021 Can you run an interactive job on your cluster and check which MPI is being pulled in? You can type "which mpiexec" to see which file it is, and run "mpiexec --version" to see which MPI library it comes from.
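For example, something like this (partition name taken from your job script; some sites prefer salloc over srun --pty):
# Grab an interactive shell on a compute node
srun --partition=batch --nodes=1 --ntasks=1 --pty bash -i
# Then, on the compute node:
which mpiexec
mpiexec --version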
This is what I got -- ~/FDS/FDS6/bin/INTEL/bin/mpiexec
Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)
Did you try adding a source command for FDS6VARS? Try submitting this one:
#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch    # Replace 'batch' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=32
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES    # Replace with the actual path if different
srun -N 1 -n 32 --ntasks-per-node=32 /home/krishna1/a/sharm368/FDS/FDS6/bin/fds Small_Scale_FINAL.fds
If that does not work, try submitting this job:
#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch    # Replace 'batch' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=32
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES    # Replace with the actual path if different
srun -N -n 4 which mpiexec > ~/mpiver.txt
Then post the mpiver.txt file that is generated.
@johodges I tried running the first script, but it didn't improve the run speed.
The second one didn't run FDS, nor did it create the mpiver.txt file.
When you did this check, were you on one of the compute nodes when you ran the which command? If so, start another interactive job and then run your case manually on the compute node using the same source commands. Let us know if you still see the multiple repeats of each part of the initialization.
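A minimal sketch of that manual run, assuming the paths from the earlier job scripts; sourcing FDS6VARS.sh should put the bundled fds and its Intel mpiexec first on the PATH:
# Interactive shell on a compute node with 3 task slots
srun --partition=batch --nodes=1 --ntasks=3 --pty bash -i
# Then, on the compute node:
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES
mpiexec -n 3 fds Small_Scale_FINAL.fds
# The header should now report "Number of MPI Processes: 3" if the right mpiexec is picked up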
I am testing FDS parallel computing on a Linux machine, and I find that it runs no faster than on my laptop. Any ideas?
FDS speed is a function of the hardware (number and types of CPUs, amount and type of memory, bus, etc.) and the configuration of the machine (how many users, cluster or standalone, what other software is running, etc.). It is certainly not the case that every Linux machine is faster than any Windows machine.
Closing due to inactivity. @shivam11021 let us know if you are still having issues with multiple instances of FDS launching instead of a single MPI instance.