anuga_core

Run the code on a multi-node platform

Open Dongxueyang opened this issue 5 years ago • 7 comments

Hi @stoiver :

I want to run a large simulation. To improve computational efficiency, I want to run the program on a supercomputing platform using multiple nodes. Does the code support multi-node computing?

Dongxueyang avatar Jul 06 '20 04:07 Dongxueyang

@Dongxueyang Yes, ANUGA can run on multi-node supercomputers. Parallelisation is implemented via MPI. The Python 2 version has been run extensively in parallel on the NCI (raijin). I haven't yet tried it on gadi. It uses the pypar MPI Python wrapper.

We are just moving over to Python 3. We seem to have a working version that uses mpi4py as the MPI Python wrapper. It would be great if you could test the Python 3 version. I will push it to the GA git repository (branch anuga_py3).

stoiver avatar Jul 06 '20 06:07 stoiver
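
Before testing a full ANUGA run with mpi4py, it can help to confirm that MPI starts the expected number of processes and that they land on the expected nodes. The following is a minimal sketch, independent of ANUGA. The function name `describe_process` is made up for illustration, and the fallback to a single serial process when mpi4py is missing is an assumption, not ANUGA behaviour.

```python
import socket


def describe_process():
    """Return (rank, size, host) for the current process."""
    try:
        from mpi4py import MPI
    except ImportError:
        # Assumption: without mpi4py, report a single serial process.
        return 0, 1, socket.gethostname()
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()


if __name__ == "__main__":
    rank, size, host = describe_process()
    print(f"rank {rank} of {size} running on {host}")
```

Saved as, say, check_mpi.py (a hypothetical file name) and launched with `mpirun -np 24 python check_mpi.py`, each rank should print one line. On a working two-node setup, both host names should appear in the output.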

@stoiver That is great, thank you so much. I can download the anuga_py3 version and try it on a multi-node supercomputer (with mpi4py). Is the anuga_py3 branch available now? Where can I download it for testing?

Dongxueyang avatar Jul 06 '20 06:07 Dongxueyang

@Dongxueyang You can use the anuga_py3 branch of the anuga_core repository. It might be best to clone a new copy of anuga_core and check out the branch, i.e.

git clone -b anuga_py3 https://github.com/GeoscienceAustralia/anuga_core.git

You can get a hint as to which Python libraries to install by looking at the shell scripts in the tools directory of the downloaded repository.

stoiver avatar Jul 12 '20 02:07 stoiver

Hi @stoiver. I want to run a simulation on a multi-node platform, using two nodes and 24 cores (12 cores per node). I use this command:

mpirun -machinefile machinefile -np 24 python test.py

and the machinefile is:

node1_id node2_id

Is the command right? If it is wrong, how should I run the simulation on two nodes (24 cores)?

Thanks a lot. I hope to hear from you. Dong

Dongxueyang avatar Jul 29 '20 08:07 Dongxueyang
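
For reference, a sketch of one common layout, assuming Open MPI: its hostfile lists one host per line, optionally followed by `slots=N`, and `--hostfile` is accepted alongside `-machinefile`. MPICH uses a different host file syntax, so this may not apply there. The helper names `hostfile_text` and `mpirun_command` are hypothetical, and `node1_id` and `node2_id` are the placeholder names from the post above.

```python
def hostfile_text(nodes, slots_per_node):
    """Build Open MPI style hostfile content: one 'host slots=N' line per node."""
    return "".join(f"{node} slots={slots_per_node}\n" for node in nodes)


def mpirun_command(hostfile, nodes, slots_per_node, script):
    """Build the mpirun argument list for one process per slot."""
    total = len(nodes) * slots_per_node
    return ["mpirun", "--hostfile", hostfile, "-np", str(total), "python", script]


if __name__ == "__main__":
    nodes = ["node1_id", "node2_id"]  # placeholder host names
    print(hostfile_text(nodes, 12), end="")
    print(" ".join(mpirun_command("machinefile", nodes, 12, "test.py")))
```

With two nodes and 12 slots each, this prints a two-line hostfile followed by a command requesting 24 processes.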

@stoiver Did you see the question above, and could you give me some advice? I tried to run example/simple_examples/channel3_parallel.py on two nodes (48 cores), but I cannot get the simulation to run.

Dong

Dongxueyang avatar Jul 31 '20 13:07 Dongxueyang

@Dongxueyang You need to set up MPI to run on your 24 cores. This will depend on whether you are using Open MPI or MPICH. Do you have a system administrator for your system? You should be able to set up your mpirun command to run by default on your two nodes. I recall from working on a cluster a few years ago that you need to ensure you can log into the two nodes automatically using ssh keys. But as suggested, get help from your system admin.

stoiver avatar Jul 31 '20 13:07 stoiver
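
The ssh key requirement mentioned above can be checked with a short script. This is a sketch only: the function names are hypothetical, and the host names passed on the command line are whatever your cluster uses. It relies on the standard ssh option `BatchMode=yes`, which makes ssh fail rather than prompt for a password.

```python
import subprocess
import sys


def ssh_check_command(host):
    """Build an ssh command that runs 'true' on host without ever prompting."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def can_login_without_password(host):
    """Return True if key-based ssh login to host succeeds."""
    try:
        result = subprocess.run(ssh_check_command(host),
                                capture_output=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        # ssh is not installed, or the connection hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hosts are given as command-line arguments, e.g. the two node names.
    for host in sys.argv[1:]:
        status = "ok" if can_login_without_password(host) else "FAILED"
        print(f"{host}: {status}")
```

Run from the node where mpirun will be launched, passing the other node names as arguments. Any host reported as FAILED would likely also block a multi-node mpirun that starts remote processes over ssh.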

@stoiver OK, thanks. I use Open MPI on the cluster. I will ask the system admin first. Thanks a lot. One more question: I must install the same Open MPI on every node, right?

Dongxueyang avatar Jul 31 '20 13:07 Dongxueyang