mizuRoute Limitation: A different number of tasks changes answers

I'm not sure whether this is expected, but I found that mizuRoute changes answers when a different number of tasks is used. The test I ran for this is:

PEM_Ld10.nldas2_rMERIT_mnldas2.I2000Clm50SpMizGs.cheyenne_gnu.mizuroute-default

The test has no answer changes for CTSM or CPL, but mizuRoute fields change as follows...

 RMS basRunoff                        3.0671E-05            NORMALIZED  2.6883E+00
 RMS instRunoff                       1.8783E+00            NORMALIZED  3.6466E+00
 RMS dlayRunoff                       1.2893E+00            NORMALIZED  3.4618E+00
 RMS sumUpstreamRunoff                6.4705E+02            NORMALIZED  1.4161E+01
 RMS KWTroutedRunoff                  6.5100E+02            NORMALIZED  1.5828E+01
 RMS IRFroutedRunoff                  2.6097E+02            NORMALIZED  9.0326E+00
 RMS volume                           2.1904E+06            NORMALIZED  1.1718E+01

Jan 20 '22 07:01 ekluzek

Hi @ekluzek, yes, this is expected. A different number of MPI tasks changes the order of basinID (hru dimension) and reachID (seg dimension) in the history file.

If basinID and reachID in one history file are sorted to match their order in a history file from a run with a different number of MPI tasks, the two files should be identical. I have a Python script to sort the netCDF file.

Roughly speaking, the array order along the hru and seg dimensions is determined by the independent river basins, which are defined based on the number of MPI tasks.
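
To illustrate the idea of sorting one file's IDs to match another's, here is a minimal sketch using only NumPy. It is not the script mentioned above: the function name `reorder_index` and the sample IDs and values are made up for illustration, and it assumes the IDs are unique and that both runs contain the same set of IDs.

```python
import numpy as np


def reorder_index(ids_in, ids_ref):
    """Return idx such that ids_in[idx] equals ids_ref element by element.

    Hypothetical helper: assumes unique IDs and identical ID sets.
    """
    ids_in = np.asarray(ids_in)
    ids_ref = np.asarray(ids_ref)
    if ids_in.shape != ids_ref.shape:
        raise ValueError("ID arrays must have the same length")
    # Sort the input IDs once, then look up each reference ID in the sorted list.
    order = np.argsort(ids_in, kind="stable")
    pos = np.searchsorted(ids_in[order], ids_ref)
    pos = np.clip(pos, 0, len(order) - 1)  # keep lookups in bounds
    idx = order[pos]
    if not np.array_equal(ids_in[idx], ids_ref):
        raise ValueError("the two files do not contain the same set of IDs")
    return idx


# Made-up example: the same four reaches written in two different orders.
ref_ids = np.array([10, 20, 30, 40])
in_ids = np.array([30, 10, 40, 20])
ref_data = np.arange(8.0).reshape(2, 4)   # dimensions (time, seg)
in_data = ref_data[:, [2, 0, 3, 1]]       # same values, shuffled along seg

idx = reorder_index(in_ids, ref_ids)
reordered = np.take(in_data, idx, axis=1)  # reorder along the seg axis
```

After the reorder, `reordered` matches `ref_data` exactly, which is the sense in which the two runs are expected to agree.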

Jan 20 '22 15:01 nmizukami

Ahhh, OK. So right now we could run the python script to show that answers are identical.

In the short term I think it would be good to have that script in the repository so that it would be easy to run the PEM test and then verify that answers are identical with the script. Could you add the script somewhere in the repo?

In the medium to longer term it would be good to have a mode where the sorting is done inside the code, so that you don't need the external sorting script. In the coupler we have a trigger like this that slows things down a little, but ensures that answers are identical on different numbers of processors. That trigger is already turned on for testing; we just need something similar for mizuRoute. A slow, simple solution would be fine to start with, and a future advancement could be to speed it up. Since you wouldn't normally run in this mode, it's OK for it to be slower.

How hard would it be to add such a flag that would do the sorting needed to make sure answers are identical?

Jan 20 '22 21:01 ekluzek

We talked about this, and we want to first look into using the Python script just to be able to do this checking, and then add some automatic tests that run the script so that we know answers don't change. If we can do that, we wouldn't need to get this into the code. @nmizukami did work on this, but it was really slow.

May 18 '22 20:05 ekluzek

Hi Erik (@ekluzek)

I am wondering if we can use this to reorder history netcdfs before comparing the two.

/glade/u/home/mizukami/hydro_nm/python_general/nc_reorder.py

usage: nc_reorder.py [-h] nc1 nc2 var dim

Script to reorder netcdf all the variables with specified dimension based on desired ordered variable in the 2nd netcdf

positional arguments:
  nc1         input netcdf to be ordered
  nc2         second netcdf containing desired ordered variables
  var         name of nc1 and nc2 common variable (e.g., hruId) used for reordering
  dim         name of nc1 and nc2 dimension, along which variable is reordered

A mizuRoute history file may have vars(time, seg) with reachID(seg) and may have vars(time, hru) with basinID(hru), so typically I do:

nc_reorder.py in.nc ref.nc reachID seg

This creates in.reorder.nc, which can be compared with ref.nc (only for variables with the seg dimension).

This sorting is not terribly slow.
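
For the comparison step itself, a rough sketch of the logic is below. It does not read netCDF files or call nc_reorder.py: plain dictionaries of NumPy arrays stand in for the two history files, and the function name `max_abs_diff_by_var` and the sample values are hypothetical. It reorders along one dimension by a shared ID variable and compares only the variables that carry that dimension.

```python
import numpy as np


def max_abs_diff_by_var(vars_in, vars_ref, id_name, dim):
    """Compare variables that have dimension `dim` after reordering by ID.

    vars_in / vars_ref: dict of name -> (tuple of dimension names, array).
    Returns a dict of name -> maximum absolute difference.
    Hypothetical helper: assumes unique IDs present in both inputs.
    """
    ids_in = vars_in[id_name][1]
    ids_ref = vars_ref[id_name][1]
    # Map each ID to its position in the input, then follow the reference order.
    position = {int(v): i for i, v in enumerate(ids_in)}
    idx = np.array([position[int(v)] for v in ids_ref])

    diffs = {}
    for name, (dims, ref) in vars_ref.items():
        if dim not in dims or name not in vars_in:
            continue  # skip variables without this dimension
        axis = dims.index(dim)
        reordered = np.take(vars_in[name][1], idx, axis=axis)
        diffs[name] = float(np.max(np.abs(reordered - ref)))
    return diffs


# Made-up stand-ins for ref.nc and in.nc.
ref = {
    "reachID": (("seg",), np.array([1, 2, 3])),
    "IRFroutedRunoff": (("time", "seg"), np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])),
    "basRunoff": (("time", "hru"), np.array([[9.0, 8.0]])),
}
inp = {
    "reachID": (("seg",), np.array([3, 1, 2])),
    "IRFroutedRunoff": (("time", "seg"), np.array([[3.0, 1.0, 2.0], [6.0, 4.0, 5.0]])),
    "basRunoff": (("time", "hru"), np.array([[8.0, 9.0]])),
}

diffs = max_abs_diff_by_var(inp, ref, "reachID", "seg")
```

A second call with basinID and hru would cover the variables on the hru dimension, such as basRunoff in this example.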

Jan 26 '24 15:01 nmizukami

Yes, this would be great. Having this as a supported tool would allow us to check by hand, which is a good start.

We would need to add its usage to the test system in order for the tests to use it automatically.

One way to do that would be to add mizuRoute-specific system tests, for some exact restart tests, that do this as an additional step after the cases are run. For example, we could extend the ERP and PEM test classes for mizuRoute to contain this step. So it might not be too hard to do.

Anyway, the bottom line is that I'd love to have this script supported as part of the mizuRoute code base.

Jan 27 '24 04:01 ekluzek