iris icon indicating copy to clipboard operation
iris copied to clipboard

Regridding with `iris.analysis.UnstructuredNearest` does not preserve `dtype` and masks

Open schlunma opened this issue 4 years ago • 10 comments

🐛 Bug Report

Hi guys, I found two issues with regridding using iris.analysis.UnstructuredNearest:

  1. The data type of the input cube is not preserved, the output type is always float64.
  2. The output is not masked, i.e., the output data is a regular array (and not a masked array) that contains values of the fill_value.

How To Reproduce

Minimal working example using this dataset (this is an nc file, but GitHub does not allow nc files): test.nc.txt

Test dataset
netcdf test.nc {
dimensions:
        time = 1 ;
        dim1 = 10 ;
        bnds_3 = 3 ;
variables:
        float tas(time, dim1) ;
                tas:long_name = "temperature in 2m" ;
                tas:units = "K" ;
                tas:coordinates = "clat clon" ;
        double time(time) ;
                time:standard_name = "time" ;
                time:invalid_units = "day as %Y%m%d.%f" ;
        float clat(dim1) ;
                clat:bounds = "clat_bnds" ;
                clat:units = "degrees_north" ;
                clat:standard_name = "latitude" ;
                clat:long_name = "center latitude" ;
        float clat_bnds(dim1, bnds_3) ;
        float clon(dim1) ;
                clon:bounds = "clon_bnds" ;
                clon:units = "degrees_east" ;
                clon:standard_name = "longitude" ;
                clon:long_name = "center longitude" ;
        float clon_bnds(dim1, bnds_3) ;

// global attributes:
                :CDI = "Climate Data Interface version 1.8.3rc (http://mpimet.mpg.de/cdi)" ;
                :CDI_grid_type = "unstructured" ;
                :uuidOfHGrid = "e941b1d0-ab58-11e8-ba4f-bdd1e82b9e6d" ;
                :Conventions = "CF-1.7" ;
data:

 tas =
  _, 293.6598, 292.4131, _, _, _, _, _, 291.7204, 291.7908 ;

 time = 20051001 ;

 clat = 52.61851, 52.93201, 52.93201, 51.98309, 53.88348, 53.56083, 54.50582, 
    53.56083, 51.97181, 52.60703 ;

 clat_bnds =
  53.25029, 52.29887, 52.29887,
  52.29887, 53.23846, 53.25029,
  53.25029, 53.23846, 52.29887,
  52.29887, 51.34985, 52.29887,
  54.19249, 54.19249, 53.25029,
  53.25029, 53.23846, 54.19249,
  54.19249, 55.13753, 54.19249,
  54.19249, 53.23846, 53.25029,
  52.27613, 51.33891, 52.29887,
  52.29887, 53.23846, 52.27613 ;

 clon = 73, 73.91156, 72.08844, 73, 73, 73.92289, 73, 72.07711, 71.22221, 
    71.20061 ;

 clon_bnds =
  73, 72.10562, 73.89438,
  73.89438, 74.83416, 73,
  73, 71.16584, 72.10562,
  72.10562, 73, 73.89438,
  73.94114, 72.05886, 73,
  73, 74.83416, 73.94114,
  73.94114, 73, 72.05886,
  72.05886, 71.16584, 73,
  70.31771, 71.25498, 72.10562,
  72.10562, 71.16584, 70.31771 ;
}
# Regrid unstructured grid

import numpy as np
import iris
import iris.coords
import iris.cube
from iris.analysis import UnstructuredNearest

cube = iris.load_cube('test.nc')

scheme = UnstructuredNearest()
target_lat = iris.coords.DimCoord([50.0, 52.0, 54.0], standard_name='latitude', units='degrees')
target_lon = iris.coords.DimCoord([70.0, 71.0, 72.0], standard_name='longitude', units='degrees')
target_grid = iris.cube.Cube(np.full((3, 3), 0.0), dim_coords_and_dims=[(target_lat, 0), (target_lon, 1)])

regridded_cube = cube.regrid(target_grid, scheme)

print("dtype of original cube:", cube.dtype)
print("dtype of regridded cube:", regridded_cube.dtype)
print("")

print("data:")
print(regridded_cube.data)
print("data is masked?", np.ma.is_masked(regridded_cube.data))

print("")
print("Original fill value:", cube.data.fill_value)

gives

dtype of original cube: float32
dtype of regridded cube: float64

data:
[[[2.91720398e+02 2.91720398e+02 2.91720398e+02]
  [2.91720398e+02 2.91720398e+02 2.91720398e+02]
  [9.96920997e+36 9.96920997e+36 9.96920997e+36]]]
data is masked? False

Original fill value: 9.96921e+36

Expected behaviour

The dtype should be preserved, i.e., in the example both dtypes should be float32 and the data should be masked correctly, i.e., it should look like this:

[[[2.91720398e+02 2.91720398e+02 2.91720398e+02]
  [2.91720398e+02 2.91720398e+02 2.91720398e+02]
  [      --             --             --      ]]]

Environment

  • OS & Version: openSUSE Tumbleweed
  • Iris Version: 3.1.0

I know that you are currently refactoring the support of unstructured grids. I'm not entirely sure if this also includes new regridders or not; if this is the case and you are in the middle of rewriting this regridder, please ignore this issue.

Thanks for your help :+1:

schlunma avatar Dec 16 '21 16:12 schlunma

Hey @schlunma,

Great to hear from you and thanks for taking the time to raise this issue 👍

Note that, if you're interested in regridding with unstructured grids, I'd highly recommend that you take a look at https://github.com/SciTools-incubator/iris-esmf-regrid

We're actively developing this package in conjunction with our current UGRID work, and @stephenworsley (lead developer) would be super keen for feedback... He'd also be on hand to help you with using iris-esmf-regrid and introducing you to that world.

Note that, like all our regridders, it plums straight into iris using the regrid/interpolate scheme pattern... give it a whirl.

It's also available on both conda-forge and pypi 👍

bjlittle avatar Dec 17 '21 07:12 bjlittle

Thanks for the quick answer @bjlittle!

iris-esmf-regrid looks super interesting, I will definitely have a look at that. One question on this: is there any documentation available for it? I couldn't find it on the GitHub page. Thanks!!

schlunma avatar Dec 17 '21 08:12 schlunma

No worries, glad to help.

It's a great question, thanks for asking. The simple answer is no, not yet... It's that new! 😉

Although I'm sure @stephenworsley has plans in the pipeline. Why not create an issue on iris-esmf-regrid asking for some rudimentary docs. If you want it, so will others, so let's make sure that happens.

Perhaps @stephenworsley could provide you with some code examples in the meantime.

What do you think @stephenworsley ? What's the best way for @schlunma to spin up and start playing ? 🤔

bjlittle avatar Dec 17 '21 09:12 bjlittle

Rudimentary docs would be really useful for me too please and so I've created SciTools-incubator/iris-esmf-regrid#139

jonseddon avatar Dec 17 '21 10:12 jonseddon

Note that, at the moment iris-esmf-regrid only supports area weighted regridding from lat/lat to unstructured, and vice versa... Although @stephenworsley is more qualified to comment on that

bjlittle avatar Dec 17 '21 10:12 bjlittle

Thanks for opening the issue @jonseddon!

Note that, at the moment iris-esmf-regrid only supports area weighted regridding from lat/lat to unstructured, and vice versa... Although @stephenworsley is more qualified to comment on that

That sounds great for a start! I actually have a use case where area-weighted regridding from unstructured grids to lat/lon could come in handy :+1:

schlunma avatar Dec 17 '21 10:12 schlunma

So currently iris-esmf-regrid only handles area weighted regridding. This is not applicable in all cases since it requires bounds information to be present on the source and target cubes, though if you do have compatible cubes you should find it useful. With that said, ESMF itself does provide good support for nearest neighbour regridding and I think it would be worthwhile to eventually add a nearest neighbour regridder to iris-esmf-regrid. I imagine this may potentially be able to cover a wider range of types of nearest neighbour regridding than iris currently handles (e.g. distinguishing between nearest-source-to-destination and nearest-destination-to-source regridding) so feel free to add an issue to iris-esmf-regrid if you feel like you need this functionality.

stephenworsley avatar Dec 17 '21 10:12 stephenworsley

Also, I'm interested to know what you believe the correct handling of masks should be. Comments in the code suggest that the current handling is not correct: https://github.com/SciTools/iris/blob/3638874a63e9bbee72b062fe02a8e500982eb796/lib/iris/analysis/trajectory.py#L413-L415 Though I could think of different behaviours which could claim to be "more correct".

  1. Masking the output when the nearest input is masked
  2. Taking the value of the nearst unmasked input.

I would imagine that the first of these would be easier to implement, especially when additional dimensions are involved. It may also be more appropriate in certain cases, though I could imagine there may also be cases where the second case would be more appropriate. I'd have to do some more research into existing nearest neighbour regridders like ESMF to see which is "standard".

stephenworsley avatar Dec 17 '21 11:12 stephenworsley

Hmm...that's a tricky question. For me both options make sense, but I tend to agree more with option 1.

From what I can tell from the code (filling the data before interpolation) I guess you are kind of using option 1 at the moment without applying a mask afterwards but just leaving the fill values as is, right?

schlunma avatar Dec 17 '21 11:12 schlunma

That appears to be right as far as I can tell, yes.

stephenworsley avatar Dec 17 '21 11:12 stephenworsley