
FDS running too slow on linux clusters

Open shivam11021 opened this issue 1 year ago • 21 comments

Describe the bug
I recently installed FDS (the latest version) on my university's Linux clusters. We have about 4 servers with 132 CPUs each. I tried running a simulation on one server using 32 CPUs (ntasks). The simulation is taking too long; in fact, it is even slower than my personal laptop. I am new to using FDS and would appreciate it if someone could help. The job control script that I am using is attached.

Expected behavior
Was expecting FDS to run at a much faster pace.

Screenshots

Here's how the output file looks: [screenshot of the FDS output, showing "Number of MPI Processes: 1"]

script2.txt

shivam11021 avatar Jul 17 '24 14:07 shivam11021

We need to also see your input file. The output you provide shows you are only using 1 MPI process. Do you only have 1 mesh?

rmcdermo avatar Jul 17 '24 14:07 rmcdermo

Small_Scale_FINAL (1).txt

I think there are 3 meshes

shivam11021 avatar Jul 17 '24 15:07 shivam11021

FDS can only map an MPI process to a single mesh (this is how we do domain decomposition). So, your --ntasks and --ntasks-per-node need to be 3.

But something else is not right because you are seeing "Number of MPI Processes: 1". I don't quite understand this.

But fix the number to 3 and try again and let us know.
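A quick sanity check (a sketch; it assumes the input file sits in the working directory and that each mesh has its own &MESH line, i.e. no MULT): count the &MESH lines and use that number for --ntasks.

# Count &MESH lines in the input file; use this count for --ntasks (one MPI process per mesh)
grep -c -i '&MESH' Small_Scale_FINAL.fds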

rmcdermo avatar Jul 17 '24 15:07 rmcdermo

Try this:

#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch        # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1

export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES   # Replace with the actual path if different
srun -N 1 -n 3 /home/krishna1/a/sharm368/FDS/FDS6/bin/fds Small_Scale_FINAL.fds

Tasks here refers to MPI processes, one per mesh. Note your meshes have significantly different sizes. This is bad for load balance. Your calculation will be as slow as the slowest worker, here the process managing mesh 02.
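To see roughly how unbalanced the meshes are, something like the following works (a sketch; it assumes the IJK= values sit on the same line as each &MESH, which is the usual layout):

# Print the approximate cell count of each mesh in the input file
awk '/&MESH/ && match($0, /IJK *= *[0-9]+ *, *[0-9]+ *, *[0-9]+/) {
       s = substr($0, RSTART, RLENGTH)      # e.g. "IJK=120,80,40"
       gsub(/IJK *= */, "", s)
       split(s, n, /[ ,]+/)
       printf "mesh %d: %d cells\n", ++m, n[1]*n[2]*n[3]
     }' Small_Scale_FINAL.fds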

marcosvanella avatar Jul 17 '24 15:07 marcosvanella

You may not be able to use the pre-compiled FDS binaries with your cluster. srun may be linked to a specific mpiexec/mpirun executable that was built with a different compiler than the pre-compiled binaries. When you try to run FDS with the wrong mpiexec, it will sometimes spin up N serial copies of the same process, all running the same input.

In addition to the comments by Randy and Marcos, I would try building the source code yourself with the compiler environment available on your cluster.
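Roughly, building from source looks like the following (a sketch; the build-target directory name depends on your FDS version, compiler, and MPI stack, and the module names are placeholders for whatever your cluster provides):

# Load the cluster's compiler and MPI modules (names are site-specific placeholders)
module load intel intel-mpi        # or: module load gcc openmpi
# Clone the FDS source and build the target that matches your toolchain
git clone https://github.com/firemodels/fds.git
cd fds/Build/impi_intel_linux      # example target; pick the one for your compiler/MPI
./make_fds.sh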

johodges avatar Jul 17 '24 15:07 johodges

FDS can only map an MPI process to a single mesh (this is how we do domain decomposition). So, your --ntasks and --ntasks-per-node need to be 3.

But something else is not right because you are seeing "Number of MPI Processes: 1". I don't quite understand this.

But fix the number to 3 and try again and let us know.

I changed the number to 3 and reran the simulation. Seems like the speed hasn't improved a lot. Here's the new file. Still shows Number of MPI processes = 1

f1_err.txt

shivam11021 avatar Jul 17 '24 17:07 shivam11021

Try this:

#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch        # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1

export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES   # Replace with the actual path if different
srun -N 1 -n 3 /home/krishna1/a/sharm368/FDS/FDS6/bin/fds Small_Scale_FINAL.fds

Tasks here refers to MPI processes, one per mesh. Note your meshes have significantly different sizes. This is bad for load balance. Your calculation will be as slow as the slowest worker, here the process managing mesh 02.

I changed the batch file and the speed hasn't improved by much. Would using 3 nodes, or a number of tasks that is a multiple of 3, help? I wanted to take advantage of the large number of CPUs we have in a server. Is there a way I could do that here?

Thanks

shivam11021 avatar Jul 17 '24 17:07 shivam11021

The mpiexec or mpirun you are using is from a different compiler than the binary FDS executable. That's why you see three repeats of the initialization info. See my previous message on compiling FDS with your cluster's build environment.

johodges avatar Jul 17 '24 17:07 johodges

You may not be able to use the pre-compiled FDS binaries with your cluster. srun may be linked to a specific mpiexec/mpirun executable that was built with a different compiler than the pre-compiled binaries. When you try to run FDS with the wrong mpiexec, it will sometimes spin up N serial copies of the same process, all running the same input.

In addition to the comments by Randy and Marcos, I would try building the source code yourself with the compiler environment available on your cluster.

What would building the source code entail? Not very familiar with it

shivam11021 avatar Jul 17 '24 17:07 shivam11021

@johodges Shouldn't things work correctly if the user adds the environment variables to their .bashrc as suggested at the end of the FDS-SMV install process?
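(For reference, that step amounts to something like the line below in ~/.bashrc, using the install path that appears in the scripts in this thread; adjust to the actual location.)

# Make the bundled FDS and Intel MPI binaries available in every shell
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh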

@shivam11021 Have you installed FDS yourself, or did you have your sys admin do it?

rmcdermo avatar Jul 17 '24 18:07 rmcdermo

@johodges Shouldn't things work correctly if the user adds the environment variables to their .bashrc as suggested at the end of the FDS-SMV install process?

@shivam11021 Have you installed FDS yourself, or did you have your sys admin do it?

I did it myself, but with help from the IT people.

shivam11021 avatar Jul 17 '24 18:07 shivam11021

Are you loading modules or are you running the FDS6VARS.sh script?

rmcdermo avatar Jul 17 '24 18:07 rmcdermo

I just go to the terminal and use the command "bash script.sh". I don't load any modules explicitly

shivam11021 avatar Jul 17 '24 18:07 shivam11021

I have run into this issue before when I tried to use a compiled version of FDS on Frontera, which also uses srun. If you look in the submission file, the user is not calling mpiexec/mpirun directly. That means srun is deciding which MPI version to call.
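A quick, FDS-independent way to see what srun is configured for:

# List the MPI plugin types this Slurm installation supports
srun --mpi=list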

johodges avatar Jul 17 '24 18:07 johodges

Sorry for the double post. I just checked again on Frontera. If I source FDS6VARS.sh in my bashrc file, I am able to run the compiled version. However, they are using the Intel mpiexec under the hood. I think I was misremembering which cluster I ran into this issue on. I think it was probably Polaris, since they use MPICH.

@shivam11021 can you run an interactive job on your cluster and then check which MPI is being pulled in? You can type "which mpiexec" and it will tell you which file it is. You can also run "mpiexec --version" to see which compiler was used to build it.
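For example, something along these lines (the partition name is taken from your batch script; the task count is arbitrary):

# Get an interactive shell on a compute node
srun --partition=batch --nodes=1 --ntasks=4 --pty bash
# Then, on the compute node:
which mpiexec
mpiexec --version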

johodges avatar Jul 17 '24 19:07 johodges

Sorry for the double post. I just checked again on Frontera. If I source FDS6VARS.sh in my bashrc file I am able to run the compiled version. However, they are using the intel mpiexec under the hood. I think I was misremembering which cluster I ran into this issue. I think it was probably polaris since they use the mpich compiler.

@shivam11021 can you run an interactive job on your cluster then check which mpi is being pulled in? You can type "which mpiexec" and it will tell you which file it is. You can also run "mpiexec --version" to see which compiler was used on it.

This is what I got -- ~/FDS/FDS6/bin/INTEL/bin/mpiexec

Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)

shivam11021 avatar Jul 22 '24 13:07 shivam11021

Did you try adding a source command for FDS6VARS? Try submitting this one:

#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch        # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=32

source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES   # Replace with the actual path if different
srun -N 1 -n 32 --ntasks-per-node=32 /home/krishna1/a/sharm368/FDS/FDS6/bin/fds Small_Scale_FINAL.fds

If that does not work, try submitting this job:

#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch        # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=32

source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES   # Replace with the actual path if different
srun -N 1 -n 4 which mpiexec > ~/mpiver.txt

Then post the mpiver.txt file that is generated.

johodges avatar Jul 29 '24 21:07 johodges

@johodges I tried running the first script, but it didn't lead to any improvement in speed.

The second one didn't run FDS, nor did it create the mpiver.txt file.

shivam11021 avatar Jul 30 '24 20:07 shivam11021

Sorry for the double post. I just checked again on Frontera. If I source FDS6VARS.sh in my bashrc file, I am able to run the compiled version. However, they are using the Intel mpiexec under the hood. I think I was misremembering which cluster I ran into this issue on. I think it was probably Polaris, since they use MPICH.

@shivam11021 can you run an interactive job on your cluster and then check which MPI is being pulled in? You can type "which mpiexec" and it will tell you which file it is. You can also run "mpiexec --version" to see which compiler was used to build it.

This is what I got -- ~/FDS/FDS6/bin/INTEL/bin/mpiexec

Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)

When you did this check, were you on one of the compute nodes when you ran the which command? If so, start another interactive job and then run your case manually on the compute node using the same source commands. Let us know if you still see the multiple repeats of each part of the initialization.
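Concretely, from the interactive shell on the compute node that would be something like the following (a sketch using the paths from your batch script; it assumes FDS6VARS.sh puts both fds and the bundled mpiexec on your PATH):

source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES
mpiexec -n 3 fds Small_Scale_FINAL.fds
# If the startup output still shows "Number of MPI Processes: 1" repeated three times,
# the launcher and the FDS binary are still mismatched.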

johodges avatar Aug 10 '24 16:08 johodges

I am testing FDS parallel computing on a Linux machine, and I find that it runs no faster than on my laptop. Any idea why?

Yunlongjasonliu avatar Oct 24 '24 02:10 Yunlongjasonliu

FDS speed is a function of the hardware (number and type of CPUs, amount and type of memory, bus, etc.) and the configuration of the machine (how many users, cluster or standalone, what other software is running, etc.). It is certainly not the case that every Linux machine is faster than any Windows machine.

drjfloyd avatar Oct 24 '24 10:10 drjfloyd

Closing due to inactivity. @shivam11021 let us know if you are still having issues with multiple instances of FDS launching instead of a single MPI instance.

johodges avatar Sep 06 '25 17:09 johodges