CTSM icon indicating copy to clipboard operation
CTSM copied to clipboard

Modify mesh mask with new mesh_mask_modifier tool

Open slevis-lmwg opened this issue 3 years ago • 37 comments

Description of changes

mesh_mask_modifier is a tool that modifies mesh_mask files. It reads the mesh_mask file and outputs a modified copy of the file.

mesh_mask_modifier will complement the fsurdat_modifier tool, which modifies fsurdat files. When a user introduces land in place of ocean grid cells in their fsurdat file, they need to modify the mesh_mask file for their fsurdat modifications to take effect in a simulation.

Specific notes

Contributors other than yourself, if any: @ekluzek

CTSM Issues Fixed (include github issue #): Fixes #1644

Are answers expected to change (and if so in what way)? N/A ...new tool

Testing performed, if any: Test run in progress. Typed: ./mesh_mask_modifier modify_slevis.cfg in /glade/work/slevis/git/mksurfdata_toolchain/tools/modify_mesh_mask

slevis-lmwg avatar Mar 05 '22 02:03 slevis-lmwg

@billsacks @ekluzek @adrifoster and I talked at stand-up about the slowness of my code. A few ideas came up:

  • fortran can run nested loops like these much faster than python
  • xarray/pandas may have non-loop solutions (xarray is built on top of pandas); pick Anderson Banihirwe's brain!
  • does it make more sense to use the existing tool that generates mesh files rather than developing a new tool for modifying mesh files...

slevis-lmwg avatar Mar 08 '22 17:03 slevis-lmwg

Thanks for the summary, Sam. To clarify on this:

  • xarray/pandas may have non-loop solutions (xarray is built on top of pandas); pick Anderson Banihirwe's brain!

I think both are built on top of numpy, so I'd think that general whole-array operation techniques that work in numpy would work here as well. If there were some function (in xarray or elsewhere) that was geared specifically for this kind of use (finding the overlapping points between two spatial domains) then all the better. But if it's hard to find a way to cast this in terms of existing functions or whole-array operations, then doing it in Fortran might be the best bet.

billsacks avatar Mar 08 '22 17:03 billsacks

Hey all,

I was thinking about this more and it might be easier to think about this outside of xarray.

I would think (?) there should be a way to do something faster if you just pull out the latitudes and longitudes into np arrays. Or perhaps, as @billsacks suggested as I was chatting with him, 1 big np array where the rows (for example) are the lats/lons from 1 file and the columns are lats/lons from the other file.

Then you may be able to do some whole-array operations to get a matrix of T/F.

I was messing around with an example of this but am having to get back to my regularly scheduled manuscript writing for the week...

I will think on this more if I have time and report back with any other ideas! But since @slevisconsulting you know much more about the requirements for this function, perhaps these ideas aren't tractable.

adrifoster avatar Mar 08 '22 19:03 adrifoster

Thank you @adrifoster @billsacks for your comments. I expect to return to this around the middle of next week. (Still distracted with the move.)

slevis-lmwg avatar Mar 12 '22 17:03 slevis-lmwg

@ekluzek I'm initially asking for your review and would also be happy if you have additional reviewers in mind. This is a relatively simple tool and development has stopped at this point. It's probably close to ready.

slevis-lmwg avatar Apr 12 '22 17:04 slevis-lmwg

The tool already has user instructions, but I just now decided to also add detailed instructions for a specific use-case, using text that I previously included in Isla Simpson's corresponding google doc.

slevis-lmwg avatar Apr 12 '22 18:04 slevis-lmwg

@slevisconsulting I have two thoughts here after starting to use it.

The first is:

  • [x] Move to a shared directory (modify_inputdatasets)? Currently this tool is in. it's own directory (as is modify_fsurdat). Personally it seems like normally directories should have more than one thing in them. So a shared directory makes more sense to me.
  • [x] Add capability to handle other variable names besides the standard names for surface datasets? I'm using it to get a mask from a domain file, so I'd like to be able to customize the variable names, which should be straightforward to do.

ekluzek avatar Apr 21 '22 15:04 ekluzek

  • Add capability to handle other variable names besides the standard names for surface datasets? I'm using it to get a mask from a domain file, so I'd like to be able to customize the variable names, which should be straightforward to do.

Follow-up from mtg: @ekluzek suggests allowing the user to specify alternate variable names in the .cfg file instead of hardwiring default ones in the code (currently landfrac, lat, lon). To clarify, these are the variables that get read from the landmask_file. Also obtain the 1d lats/lons from the 2d rather than reading from the file.

This suggestion is motivated by a use-case that appears in #1701

slevis-lmwg avatar Apr 21 '22 18:04 slevis-lmwg

@slevisconsulting I'm having trouble getting a sample landmask file in the right format. I thought it needed to be a history file, but it looks like the format is slightly different. Do you have a sample file in the correct format?

Also this makes me realize the importance of being able to set the file variable and dimension names in the config file.

ekluzek avatar Apr 28 '22 06:04 ekluzek

@slevisconsulting I'm having trouble getting a sample landmask file in the right format. I thought it needed to be a history file, but it looks like the format is slightly different. Do you have a sample file in the correct format?

Also this makes me realize the importance of being able to set the file variable and dimension names in the config file.

@ekluzek this is a sample file that works: /glade/work/slevis/git/mksurfdata_toolchain/tools/modify_fsurdat/fill_indian_ocean/fill_indianocean_slevis.nc

I agree with your suggestion of increasing the flexibility of this tool, by not hardwiring the variable and dimension names.

CF convention has dimension and coordinate variable names the same, so really just need to allow alternate variable names and the dimension names will follow.

slevis-lmwg avatar Apr 28 '22 14:04 slevis-lmwg

@slevisconsulting thanks for the sample file, that really helps. I'm still having trouble with my case, so I'd like to meet with you sometime next week to go over this.

ekluzek avatar Apr 28 '22 19:04 ekluzek

Meeting with @ekluzek

Some of these notes apply to fsurdat_modifier.

Suggestion to modify the landmask file to include 3 variables:

  • landfrac as is in my current file (not required anymore so back to 2 variables in the file)
  • landmask same as landfrac but zeros and ones (no fractions) should be read by mesh_modifier; allow as int/float/double; values other than 0 and 1 should trigger an error
  • landmask_diff which is the user's area of change (my current file calls this landmask) read by fsurdat_modifier; allow as int/float/double; values other than 0 and 1 should trigger an error

Shared optional error check by both tools that compares landmask_diff to quantity calculated on the fly by the tool:

  • for error check both tools need to read both variables
  • tell it variable and dimension names that you assume are same as in the file you're comparing against
  • discourage user from using fsurdat (var PFTDATA_MASK not same as landmask) as the starting point
  • domain file also becoming obsolete...
  • so clm history from vanilla case is the way to go

When landmask file contains lsmlon, lsmlat as coordinate variables type double/float it fails because it expects integer indices. So change code to accept lsmlon, lsmlat as coordinate variables type double/float. It's a CF convention to require lsmlon, lsmlat as coordinate variables type double/float for regular grid files (can't require them for non-reg grid files). Don't allow type int.

In my current landmask files, I can ncrename the variables lat to lsmlat and lon to lsmlon.

slevis-lmwg avatar May 02 '22 16:05 slevis-lmwg

@ekluzek as we discussed, I have placed this PR on hold (SEE UPDATE AT BOTTOM OF THIS POST): Alper's jupyter-based bathymetry tool allows interactive calculation/assignment of bathymetry. In the CSSI group we agreed that Alper would add the option of reading Isla’s modified landmask file to generate a mesh file. His tool will also perform any regridding between the landmask file’s grid and the ocean grid. This will likely cover the needs of the CSSI project.

@ekluzek if we decide that this PR is still relevant for use-cases not involving an ocean mesh and/or Alper's solution doesn't work out for whatever reason, then we can revisit this PR.

UPDATED JUNE 1, 2022: Isla Simpson would like me to complete this PR for use-cases not involving an ocean mesh.

slevis-lmwg avatar May 23 '22 22:05 slevis-lmwg

Made these updates to the landmask input file:

  • landmask (was landfrac before) but zeros and ones (no fractions) should be read by mesh_modifier
  • landmask_diff which is the user's area of change (was landmask before) read by fsurdat_modifier
  • ncrenamed the variables lat to lsmlat and lon to lsmlon

Next I'm proceeding with other suggestions listed in this earlier post.

slevis-lmwg avatar Jun 20 '22 21:06 slevis-lmwg

Shared optional error check by both tools that compares landmask_diff to quantity calculated on the fly by the tool:

  • for error check both tools need to read both variables
  • tell it variable and dimension names that you assume are same as in the file you're comparing against
  • discourage user from using fsurdat (var PFTDATA_MASK not same as landmask) as the starting point
  • domain file also becoming obsolete...
  • so clm history from vanilla case is the way to go

We agreed that the above error-check was optional, and I am leaning strongly against implementing it because it requires a clm history file that may not exist at the time the user wishes to use these tools. I realize that a user who's modifying fsurdat and/or meshes should also feel comfortable generating a vanilla-case history file. However, such a requirement changes the simplicity of this tool from easy-to-use to a-bit-complicated, I think. Regardless, I am open to hearing other opinions.

slevis-lmwg avatar Jun 28 '22 23:06 slevis-lmwg

I'm considering a different consistency check that doesn't require a history file: In mesh_mask_modifier, I can check that landmask = 1 for all landmask_diff = 1. This will ensure that the variable used in mesh_mask_modifier (landmask) is consistent with the variable used in fsurdat_modifier (landmask_diff).

slevis-lmwg avatar Jun 29 '22 04:06 slevis-lmwg

I'm still having trouble with my case, so I'd like to meet with you sometime next week to go over this.

@ekluzek will confirm that this use-case now works.

slevis-lmwg avatar Jun 30 '22 17:06 slevis-lmwg

I'm still having trouble with my case, so I'd like to meet with you sometime next week to go over this.

@ekluzek will confirm that this use-case now works.

Ok, I think this is now ready other than the test that @ekluzek may perform.

slevis-lmwg avatar Jun 30 '22 23:06 slevis-lmwg

TODO Add comment in README or other documentation that conda activate ctsm is required for fsurdat_modifier to generate a correctly formatted netcdf file on izumi.

slevis-lmwg avatar Jul 03 '22 00:07 slevis-lmwg

@slevisconsulting something new we are doing is to add commits that are just black reformatting to the top level file:

.git-blame-ignore-revs

Just add your hash for your black reformat commit to that file. That way you can do this in a clone and those commits will be ignored in git blame

git config blame.ignoreRevsFile .git-blame-ignore-revs

You commit e44dc46e44dc46 applies for this

ekluzek avatar Jul 15 '22 19:07 ekluzek

@slevisconsulting it looks like there isn't any testing for this new script. We should add some level of testing for it. You could add some unit testing, a test suite test, and/or a test/tools test for it. It also looks like the default config file won't actually work as it is, it would be good to have one that was actually functional.

  • [x] Add testing for mesh_modifier
  • [x] working config file or system test that creates one

ekluzek avatar Jul 15 '22 20:07 ekluzek

OK I got my case setup appropriately (I think) it's in

/glade/work/erik/ctsm_worktrees/modify_meshes/tools/modify_mesh_mask

Using the crujra_mesh.cfg file.

It's giving an error that the landmask and landmask_diff don't match. But, my understanding is that landmask should be the new landmask that I want and landmask_diff should be zero everywhere the mask doesn't change, and 1 where the mask does change.

@slevisconsulting could you take a look at my case and let me know what I need to do to get it to work?

ekluzek avatar Jul 15 '22 21:07 ekluzek

And I think possibly there is an error when you want to change the mask from 1 to 0 for a location, which means landmask_diff should be "1" because it's changing. It would work if you are changing the mask from 0 to 1.

@ekluzek I think I understand the confusion. I may also be confused myself, so let's try to figure this out:

  1. If I were to rename landmask_diff to landmask_new_land then it would be clearer that this variable represents land-filling.
  2. Your intention with half_landmask.nc appears to be to fill all ocean with land. Hence the 1s over ocean and the 0s over land. However, you will not accomplish that without changing the variable landmask. In fact, landmask_diff is used by this tool ONLY for consistency checking and has no effect on the mesh file.

Let me know if I have misunderstood your intention with half_landmask.nc. If not, then I will go ahead and rename landmask_diff to landmask_new_land or landmask_landfill or whatever you think is most intuitive. Then I will also update the README and .cfg instructions to be clearer.

slevis-lmwg avatar Jul 16 '22 00:07 slevis-lmwg

I'm marking edits to this post with "UPDATE"

I should clarify that to remove land, you must set landmask and landmask_diff to 0. So to summarize:

  • UPDATE: Where you're adding and/or modifying the land (eg, changing dom_pft or fmax or std_elev or soil_color etc), set landmask and landmask_diff to 1
  • Where you're removing land, set landmask and landmask_diff to 0
  • Where nothing changes, leave landmask unchanged and set landmask_diff to 0

UPDATE: Hence I now suggest that we rename landmask_diff to new_or_modified_land (or something similar).

slevis-lmwg avatar Jul 16 '22 00:07 slevis-lmwg

@slevisconsulting I looked at my case to figure out what's going on. And what's going on is the starting mesh file has mask==1 everywhere, and I want to have a land/mask with ocean, but with Antarctica as ocean as well. So it's removing land over ocean and Antarctica, and as such nothing is being filled in.

I copied landmask to landmask_diff and I got it to go further, although then it complained about a problem with latitude

    assert isclose(lat_new, lat_mesh, abs_tol=1e-5), errmsg
AssertionError: Must be equal:  lat_new = 0.0 lat_mesh = -89.75 (at ncount = 0)

ekluzek avatar Jul 19 '22 06:07 ekluzek

It would be good to discuss landmask_diff further. We should make sure it has a useful purpose. I think it would be useful to either be a mask of the area's where the mask is changing from the original grid (either being filled in or removed) or as the difference with the original mask (so it could be -1, 0, or +1). Or another approach would be for it to be the original mask (which should agree with the mask on the mesh file). You could end up putting the wrong landmask file with your mesh file, so there is utility in reproducing the original mask, and making sure they agree in the ways you expect them to.

I'm not sure I understand what landmask_diff is currently trying to function as, but there is an important utility in making sure that your landmask and mesh files agree with each other. We should perhaps discuss this?

ekluzek avatar Jul 19 '22 06:07 ekluzek

I copied landmask to landmask_diff and I got it to go further, although then it complained about a problem with latitude

    assert isclose(lat_new, lat_mesh, abs_tol=1e-5), errmsg
AssertionError: Must be equal:  lat_new = 0.0 lat_mesh = -89.75 (at ncount = 0)

This one I solved easily. Your lat_varname/lon_varname variables were indices rather than latitude/longitude values. I changed them like this in the .cfg file:

lat_varname = lat
lon_varname = lon

I also added these new lines in the .cfg file:

lat_dimname = lsmlat
lon_dimname = lsmlon

to permit different variable and dimension names (as is the case in your file).

However, now I get a similar error but for the longitude:

assert isclose(lon_new, lon_mesh, abs_tol=1e-5), errmsg
AssertionError: Must be equal:  lon_new = 180.25 lon_mesh = 0.25 (at ncount = 0)

This one I will have to pursue tomorrow.

Let's definitely discuss all this when you're available. Thanks, @ekluzek!

slevis-lmwg avatar Jul 21 '22 01:07 slevis-lmwg

[...] now I get a similar error but for the longitude:

assert isclose(lon_new, lon_mesh, abs_tol=1e-5), errmsg
AssertionError: Must be equal:  lon_new = 180.25 lon_mesh = 0.25 (at ncount = 0)

This one is more complicated. The calls on each longitude value

lon_new = lon_range_0_to_360(lon_new)
lon_mesh = lon_range_0_to_360(lon_mesh)

...don't help because the code expects the whole grid in the mesh file to line up identically to the grid in the landmask file. I will try to come up with a solution that doesn't require the user to recreate their landmask file.

slevis-lmwg avatar Jul 21 '22 19:07 slevis-lmwg

...the code expects the whole grid in the mesh file to line up identically to the grid in the landmask file. I will try to come up with a solution that doesn't require the user to recreate their landmask file.

@ekluzek with my latest updates, this tool can work with a landmask.nc file containing longitude values from -180 to 180 such as the one that you were working with. I have now generated the modified mesh file that you need, here: /glade/work/slevis/git/mksurfdata_toolchain/tools/modify_mesh_mask/crujra_case/clmforc.cruncep.V7.c2016.0.5d.ESMFmesh_c220421.nc

slevis-lmwg avatar Jul 26 '22 00:07 slevis-lmwg

I'm not sure I understand what landmask_diff is currently trying to function as, but there is an important utility in making sure that your landmask and mesh files agree with each other. We should perhaps discuss this?

I'm open to discussing this further and here's an update.

It makes sense that the name "landmask_diff" was a source of confusion, and I apologize for that. In the most recent commit I renamed it to the more descriptive "mod_lnd_props" (i.e. modify lnd properties):

  • mod_lnd_props is used by the modify_fsurdat tool.
  • mod_lnd_props = 1 where the user wishes to modify certain land properties in the fsurdat file and 0 elsewhere.
  • mod_lnd_props does not contain information about adding or removing land grid cells. That's not helpful information for the tool's purposes.
  • mod_lnd_props is not used by the modify_mesh tool except in some consistency checks.

The variable used by the modify_mesh tool is called landmask:

  • landmask = 1 where the user wants land and 0 elsewhere.
  • landmask needs to be on exactly the same grid as the input mesh file that it's modifying (the tool checks for that).
  • landmask does not need to know whether it differs from the mask in the input mesh file. That's not helpful information for the tool's purposes.
  • landmask is not used by the modify_fsurdat tool.

slevis-lmwg avatar Jul 26 '22 23:07 slevis-lmwg