Modify mesh mask with new mesh_mask_modifier tool
Description of changes
mesh_mask_modifier is a tool that modifies mesh_mask files. It reads the mesh_mask file and outputs a modified copy of the file.
mesh_mask_modifier will complement the fsurdat_modifier tool, which modifies fsurdat files. When a user introduces land in place of ocean grid cells in their fsurdat file, they need to modify the mesh_mask file for their fsurdat modifications to take effect in a simulation.
Specific notes
Contributors other than yourself, if any: @ekluzek
CTSM Issues Fixed (include github issue #): Fixes #1644
Are answers expected to change (and if so in what way)? N/A ...new tool
Testing performed, if any:
Test run in progress. Typed:
./mesh_mask_modifier modify_slevis.cfg
in /glade/work/slevis/git/mksurfdata_toolchain/tools/modify_mesh_mask
@billsacks @ekluzek @adrifoster and I talked at stand-up about the slowness of my code. A few ideas came up:
- fortran can run nested loops like these much faster than python
- xarray/pandas may have non-loop solutions (xarray is built on top of pandas); pick Anderson Banihirwe's brain!
- does it make more sense to use the existing tool that generates mesh files rather than developing a new tool for modifying mesh files...
Thanks for the summary, Sam. To clarify on this:
- xarray/pandas may have non-loop solutions (xarray is built on top of pandas); pick Anderson Banihirwe's brain!
I think both are built on top of numpy, so I'd think that general whole-array operation techniques that work in numpy would work here as well. If there were some function (in xarray or elsewhere) that was geared specifically for this kind of use (finding the overlapping points between two spatial domains) then all the better. But if it's hard to find a way to cast this in terms of existing functions or whole-array operations, then doing it in Fortran might be the best bet.
Hey all,
I was thinking about this more and it might be easier to think about this outside of xarray.
I would think (?) there should be a way to do something faster if you just pull out the latitudes and longitudes into np arrays. Or perhaps, as @billsacks suggested as I was chatting with him, 1 big np array where the rows (for example) are the lats/lons from 1 file and the columns are lats/lons from the other file.
Then you may be able to do some whole-array operations to get a matrix of T/F.
I was messing around with an example of this but am having to get back to my regularly scheduled manuscript writing for the week...
I will think on this more if I have time and report back with any other ideas! But since @slevisconsulting you know much more about the requirements for this function, perhaps these ideas aren't tractable.
Thank you @adrifoster @billsacks for your comments. I expect to return to this around the middle of next week. (Still distracted with the move.)
@ekluzek I'm initially asking for your review and would also be happy if you have additional reviewers in mind. This is a relatively simple tool and development has stopped at this point. It's probably close to ready.
The tool already has user instructions, but I just now decided to also add detailed instructions for a specific use-case, using text that I previously included in Isla Simpson's corresponding google doc.
@slevisconsulting I have two thoughts here after starting to use it.
The first is:
- [x] Move to a shared directory (modify_inputdatasets)? Currently this tool is in. it's own directory (as is modify_fsurdat). Personally it seems like normally directories should have more than one thing in them. So a shared directory makes more sense to me.
- [x] Add capability to handle other variable names besides the standard names for surface datasets? I'm using it to get a mask from a domain file, so I'd like to be able to customize the variable names, which should be straightforward to do.
- Add capability to handle other variable names besides the standard names for surface datasets? I'm using it to get a mask from a domain file, so I'd like to be able to customize the variable names, which should be straightforward to do.
Follow-up from mtg: @ekluzek suggests allowing the user to specify alternate variable names in the .cfg file instead of hardwiring default ones in the code (currently landfrac, lat, lon). To clarify, these are the variables that get read from the landmask_file. Also obtain the 1d lats/lons from the 2d rather than reading from the file.
This suggestion is motivated by a use-case that appears in #1701
@slevisconsulting I'm having trouble getting a sample landmask file in the right format. I thought it needed to be a history file, but it looks like the format is slightly different. Do you have a sample file in the correct format?
Also this makes me realize the importance of being able to set the file variable and dimension names in the config file.
@slevisconsulting I'm having trouble getting a sample landmask file in the right format. I thought it needed to be a history file, but it looks like the format is slightly different. Do you have a sample file in the correct format?
Also this makes me realize the importance of being able to set the file variable and dimension names in the config file.
@ekluzek this is a sample file that works:
/glade/work/slevis/git/mksurfdata_toolchain/tools/modify_fsurdat/fill_indian_ocean/fill_indianocean_slevis.nc
I agree with your suggestion of increasing the flexibility of this tool, by not hardwiring the variable and dimension names.
CF convention has dimension and coordinate variable names the same, so really just need to allow alternate variable names and the dimension names will follow.
@slevisconsulting thanks for the sample file, that really helps. I'm still having trouble with my case, so I'd like to meet with you sometime next week to go over this.
Meeting with @ekluzek
Some of these notes apply to fsurdat_modifier.
Suggestion to modify the landmask file to include 3 variables:
- landfrac as is in my current file (not required anymore so back to 2 variables in the file)
- landmask same as landfrac but zeros and ones (no fractions) should be read by mesh_modifier; allow as int/float/double; values other than 0 and 1 should trigger an error
- landmask_diff which is the user's area of change (my current file calls this landmask) read by fsurdat_modifier; allow as int/float/double; values other than 0 and 1 should trigger an error
Shared optional error check by both tools that compares landmask_diff to quantity calculated on the fly by the tool:
- for error check both tools need to read both variables
- tell it variable and dimension names that you assume are same as in the file you're comparing against
- discourage user from using fsurdat (var PFTDATA_MASK not same as landmask) as the starting point
- domain file also becoming obsolete...
- so clm history from vanilla case is the way to go
When landmask file contains lsmlon, lsmlat as coordinate variables type double/float it fails because it expects integer indices. So change code to accept lsmlon, lsmlat as coordinate variables type double/float. It's a CF convention to require lsmlon, lsmlat as coordinate variables type double/float for regular grid files (can't require them for non-reg grid files). Don't allow type int.
In my current landmask files, I can ncrename the variables lat to lsmlat and lon to lsmlon.
@ekluzek as we discussed, I have placed this PR on hold (SEE UPDATE AT BOTTOM OF THIS POST): Alper's jupyter-based bathymetry tool allows interactive calculation/assignment of bathymetry. In the CSSI group we agreed that Alper would add the option of reading Isla’s modified landmask file to generate a mesh file. His tool will also perform any regridding between the landmask file’s grid and the ocean grid. This will likely cover the needs of the CSSI project.
@ekluzek if we decide that this PR is still relevant for use-cases not involving an ocean mesh and/or Alper's solution doesn't work out for whatever reason, then we can revisit this PR.
UPDATED JUNE 1, 2022: Isla Simpson would like me to complete this PR for use-cases not involving an ocean mesh.
Made these updates to the landmask input file:
- landmask (was landfrac before) but zeros and ones (no fractions) should be read by mesh_modifier
- landmask_diff which is the user's area of change (was landmask before) read by fsurdat_modifier
- ncrenamed the variables lat to lsmlat and lon to lsmlon
Next I'm proceeding with other suggestions listed in this earlier post.
Shared optional error check by both tools that compares landmask_diff to quantity calculated on the fly by the tool:
- for error check both tools need to read both variables
- tell it variable and dimension names that you assume are same as in the file you're comparing against
- discourage user from using fsurdat (var PFTDATA_MASK not same as landmask) as the starting point
- domain file also becoming obsolete...
- so clm history from vanilla case is the way to go
We agreed that the above error-check was optional, and I am leaning strongly against implementing it because it requires a clm history file that may not exist at the time the user wishes to use these tools. I realize that a user who's modifying fsurdat and/or meshes should also feel comfortable generating a vanilla-case history file. However, such a requirement changes the simplicity of this tool from easy-to-use to a-bit-complicated, I think. Regardless, I am open to hearing other opinions.
I'm considering a different consistency check that doesn't require a history file: In mesh_mask_modifier, I can check that landmask = 1 for all landmask_diff = 1. This will ensure that the variable used in mesh_mask_modifier (landmask) is consistent with the variable used in fsurdat_modifier (landmask_diff).
I'm still having trouble with my case, so I'd like to meet with you sometime next week to go over this.
@ekluzek will confirm that this use-case now works.
I'm still having trouble with my case, so I'd like to meet with you sometime next week to go over this.
@ekluzek will confirm that this use-case now works.
Ok, I think this is now ready other than the test that @ekluzek may perform.
TODO
Add comment in README or other documentation that
conda activate ctsm
is required for fsurdat_modifier to generate a correctly formatted netcdf file on izumi.
@slevisconsulting something new we are doing is to add commits that are just black reformatting to the top level file:
.git-blame-ignore-revs
Just add your hash for your black reformat commit to that file. That way you can do this in a clone and those commits will be ignored in git blame
git config blame.ignoreRevsFile .git-blame-ignore-revs
@slevisconsulting it looks like there isn't any testing for this new script. We should add some level of testing for it. You could add some unit testing, a test suite test, and/or a test/tools test for it. It also looks like the default config file won't actually work as it is, it would be good to have one that was actually functional.
- [x] Add testing for mesh_modifier
- [x] working config file or system test that creates one
OK I got my case setup appropriately (I think) it's in
/glade/work/erik/ctsm_worktrees/modify_meshes/tools/modify_mesh_mask
Using the crujra_mesh.cfg file.
It's giving an error that the landmask and landmask_diff don't match. But, my understanding is that landmask should be the new landmask that I want and landmask_diff should be zero everywhere the mask doesn't change, and 1 where the mask does change.
@slevisconsulting could you take a look at my case and let me know what I need to do to get it to work?
And I think possibly there is an error when you want to change the mask from 1 to 0 for a location, which means landmask_diff should be "1" because it's changing. It would work if you are changing the mask from 0 to 1.
@ekluzek I think I understand the confusion. I may also be confused myself, so let's try to figure this out:
- If I were to rename
landmask_difftolandmask_new_landthen it would be clearer that this variable represents land-filling. - Your intention with
half_landmask.ncappears to be to fill all ocean with land. Hence the 1s over ocean and the 0s over land. However, you will not accomplish that without changing the variablelandmask. In fact,landmask_diffis used by this tool ONLY for consistency checking and has no effect on the mesh file.
Let me know if I have misunderstood your intention with half_landmask.nc. If not, then I will go ahead and rename landmask_diff to landmask_new_land or landmask_landfill or whatever you think is most intuitive. Then I will also update the README and .cfg instructions to be clearer.
I'm marking edits to this post with "UPDATE"
I should clarify that to remove land, you must set landmask and landmask_diff to 0. So to summarize:
- UPDATE: Where you're adding and/or modifying the land (eg, changing dom_pft or fmax or std_elev or soil_color etc), set landmask and landmask_diff to 1
- Where you're removing land, set landmask and landmask_diff to 0
- Where nothing changes, leave landmask unchanged and set landmask_diff to 0
UPDATE: Hence I now suggest that we rename landmask_diff to new_or_modified_land (or something similar).
@slevisconsulting I looked at my case to figure out what's going on. And what's going on is the starting mesh file has mask==1 everywhere, and I want to have a land/mask with ocean, but with Antarctica as ocean as well. So it's removing land over ocean and Antarctica, and as such nothing is being filled in.
I copied landmask to landmask_diff and I got it to go further, although then it complained about a problem with latitude
assert isclose(lat_new, lat_mesh, abs_tol=1e-5), errmsg
AssertionError: Must be equal: lat_new = 0.0 lat_mesh = -89.75 (at ncount = 0)
It would be good to discuss landmask_diff further. We should make sure it has a useful purpose. I think it would be useful to either be a mask of the area's where the mask is changing from the original grid (either being filled in or removed) or as the difference with the original mask (so it could be -1, 0, or +1). Or another approach would be for it to be the original mask (which should agree with the mask on the mesh file). You could end up putting the wrong landmask file with your mesh file, so there is utility in reproducing the original mask, and making sure they agree in the ways you expect them to.
I'm not sure I understand what landmask_diff is currently trying to function as, but there is an important utility in making sure that your landmask and mesh files agree with each other. We should perhaps discuss this?
I copied landmask to landmask_diff and I got it to go further, although then it complained about a problem with latitude
assert isclose(lat_new, lat_mesh, abs_tol=1e-5), errmsg AssertionError: Must be equal: lat_new = 0.0 lat_mesh = -89.75 (at ncount = 0)
This one I solved easily. Your lat_varname/lon_varname variables were indices rather than latitude/longitude values. I changed them like this in the .cfg file:
lat_varname = lat
lon_varname = lon
I also added these new lines in the .cfg file:
lat_dimname = lsmlat
lon_dimname = lsmlon
to permit different variable and dimension names (as is the case in your file).
However, now I get a similar error but for the longitude:
assert isclose(lon_new, lon_mesh, abs_tol=1e-5), errmsg
AssertionError: Must be equal: lon_new = 180.25 lon_mesh = 0.25 (at ncount = 0)
This one I will have to pursue tomorrow.
Let's definitely discuss all this when you're available. Thanks, @ekluzek!
[...] now I get a similar error but for the longitude:
assert isclose(lon_new, lon_mesh, abs_tol=1e-5), errmsg AssertionError: Must be equal: lon_new = 180.25 lon_mesh = 0.25 (at ncount = 0)
This one is more complicated. The calls on each longitude value
lon_new = lon_range_0_to_360(lon_new)
lon_mesh = lon_range_0_to_360(lon_mesh)
...don't help because the code expects the whole grid in the mesh file to line up identically to the grid in the landmask file. I will try to come up with a solution that doesn't require the user to recreate their landmask file.
...the code expects the whole grid in the mesh file to line up identically to the grid in the landmask file. I will try to come up with a solution that doesn't require the user to recreate their landmask file.
@ekluzek
with my latest updates, this tool can work with a landmask.nc file containing longitude values from -180 to 180 such as the one that you were working with. I have now generated the modified mesh file that you need, here:
/glade/work/slevis/git/mksurfdata_toolchain/tools/modify_mesh_mask/crujra_case/clmforc.cruncep.V7.c2016.0.5d.ESMFmesh_c220421.nc
I'm not sure I understand what landmask_diff is currently trying to function as, but there is an important utility in making sure that your landmask and mesh files agree with each other. We should perhaps discuss this?
I'm open to discussing this further and here's an update.
It makes sense that the name "landmask_diff" was a source of confusion, and I apologize for that. In the most recent commit I renamed it to the more descriptive "mod_lnd_props" (i.e. modify lnd properties):
- mod_lnd_props is used by the modify_fsurdat tool.
- mod_lnd_props = 1 where the user wishes to modify certain land properties in the fsurdat file and 0 elsewhere.
- mod_lnd_props does not contain information about adding or removing land grid cells. That's not helpful information for the tool's purposes.
- mod_lnd_props is not used by the modify_mesh tool except in some consistency checks.
The variable used by the modify_mesh tool is called landmask:
- landmask = 1 where the user wants land and 0 elsewhere.
- landmask needs to be on exactly the same grid as the input mesh file that it's modifying (the tool checks for that).
- landmask does not need to know whether it differs from the mask in the input mesh file. That's not helpful information for the tool's purposes.
- landmask is not used by the modify_fsurdat tool.