Is it possible to get a domain summary (aka bounding box) in a simpler way than the following?
def domain_summary(f):
    """ For a given field and its domain, get a domain summary """
    #FIXME: should be outer edge of bounds, and I am not sure about the coords and ancils, is that sufficient?
    coords = [serialise(cf.Data.concatenate([b.min(), b.max()])) for b in [a.data for k,a in f.domain.dimension_coordinates().items()]]
    ancils = [serialise(cf.Data.concatenate([b.min(), b.max()])) for b in [a.data for k,a in f.domain.auxiliary_coordinates().items()]]
    bbox = coords+ancils
    methods = get_cell_methods(f)
    return bbox, methods
In doing this I'm making use of the following stub for serialising a data object, e.g. to pass to another application using JSON (is there a better way to do this?):
def serialise_data(o):
    units = str(o.Units)
    values = o.array.squeeze()
    shape = o.array.shape
    # A squeezed single value is 0-d, so test its size rather than len()
    if values.size == 1:
        values = float(values)
    else:
        values = values.tolist()
    return {'units': units, 'values': values, 'shape': shape}

def serialise(object):
    methods = {
        "<class 'cf.data.data.Data'>": serialise_data
    }
    otype = str(type(object))
    if otype in methods:
        return methods[otype](object)
    else:
        # Raise, rather than return, the error for unsupported types
        raise ValueError(f'Cannot serialise {otype}')
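On the "is there a better way" question, one option is to give `json.dumps` a `default` hook, so the standard encoder calls back for any object it cannot handle itself. This is only a sketch of that idea, not something cf-python provides: `FakeData` is a hypothetical stand-in for `cf.Data` so that the example runs without cf installed, and `to_jsonable` is an illustrative name.

```python
import json


class FakeData:
    """Hypothetical stand-in for cf.Data: holds values and a units string."""

    def __init__(self, values, units):
        self.values = list(values)
        self.Units = units


def to_jsonable(o):
    """Fallback used by json.dumps for objects it cannot encode itself."""
    if isinstance(o, FakeData):
        return {"units": str(o.Units), "values": o.values, "shape": [len(o.values)]}
    # json expects a TypeError for objects that cannot be serialised
    raise TypeError(f"Cannot serialise {type(o).__name__}")


bbox = [
    FakeData([0.0, 360.0], "degrees_east"),
    FakeData([-90.0, 90.0], "degrees_north"),
]
text = json.dumps({"bbox": bbox}, default=to_jsonable)
print(text)
```

With this arrangement the nesting (lists, dicts) is walked by `json` itself, and only the leaf objects need a conversion rule.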
Hi Bryan, here's my take on it. The only functional differences are:
- It returns the bounds limits when they are set, otherwise the grid coordinates
- It randomly mingles the dimension and auxiliary coordinates in the output. I figured that this was OK because there is no way of discerning between them in the bbox, and it's a dictionary.
The rest of it is just personal style. All to be taken with a pinch of salt :)
import cf
import numpy as np

def serialise_data(o):
    units = str(o[0].Units)
    values = sorted(d.datum() for d in o)
    if not np.diff(values).any():
        # All the values are the same
        values = values[0:1]

    shape = (len(values),)
    if len(values) == 1:
        values = values[0]

    return {"units": units, "values": values, "shape": shape}

def serialise(otype, o):
    methods = {"data": serialise_data}
    try:
        return methods[otype](o)
    except KeyError:
        # Raise, rather than return, the error for unknown types
        raise ValueError(f"Cannot serialise {otype}")

def domain_summary(f):
    """For a given field and its domain, get a domain summary"""
    # FIXME: I am not sure about the coords and auxs, is that
    #        sufficient?
    bbox = [
        serialise("data", [a.lower_bounds.min(), a.upper_bounds.max()])
        for a in f.coordinates().values() if a.dtype.kind not in "SU"
    ]
    methods = get_cell_methods(f)
    return bbox, methods
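To make the first bullet concrete (bounds limits when they are set, otherwise the grid coordinates), here is a library-free sketch of that fallback for a single axis. The function name `extent` and the plain lists are illustrative assumptions; it is not how cf-python implements `lower_bounds` and `upper_bounds`.

```python
def extent(points, bounds=None):
    """Return (lowest, highest) for one coordinate axis.

    points: list of cell coordinate values
    bounds: optional list of (lower, upper) pairs, one per cell
    """
    if bounds:
        lows = [min(pair) for pair in bounds]
        highs = [max(pair) for pair in bounds]
        return min(lows), max(highs)
    # No bounds: cells have zero size, so the extent is that of the points
    return min(points), max(points)


lon = [22.5, 67.5, 112.5, 157.5]
lon_bounds = [(0.0, 45.0), (45.0, 90.0), (90.0, 135.0), (135.0, 180.0)]
print(extent(lon, lon_bounds))  # outer edges of the bounds
print(extent(lon))              # falls back to the points
```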
Thanks David. A couple of questions:
- "Not sure about coords and auxs" ... is that a question for me? I am thinking we need to do the auxs to cover cases like we had in CANARI ... ???
- What is the a.dtype.kind not in "SU" doing?
- I presume if there are no bounds, it just uses half the cell width??
Hi Bryan,
- "Not sure about coords and auxs" ... is that a question for me? I am thinking we need to do the auxs to cover cases like we had in CANARI ... ???
Just transcribed from your original code, so didn't want to delete it. I suppose that if you were being complete you might want to include other items, such as coordinate reference system info, but that's a design choice, rather than a cf-python usage question.
- What is the a.dtype.kind not in "SU" doing?
Belt and braces: Just stopping any string-valued coordinates mucking things up.
- I presume if there are no bounds, it just uses half the cell width??
No bounds means zero cell size. It's easy to create bounds (that are by default Voronoi, but can be all sorts of other wonderful things):
In [1]: import cf; f = cf.example_field(0)
In [2]: x = f.coord('X')
In [3]: x.del_bounds()
Out[3]: <CF Bounds: longitude(8, 2) degrees_east>
In [4]: x.cellsize
Out[4]: <CF Data(8): [0.0, ..., 0.0] degrees_east>
In [5]: x.set_bounds(x.create_bounds())
In [6]: x.cellsize
Out[6]: <CF Data(8): [45.0, ..., 45.0] degrees_east>
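For readers wondering what "Voronoi" bounds look like for a 1-d coordinate, the sketch below places each cell edge halfway between neighbouring points and extrapolates the two outer edges by half the adjacent spacing. It is a plain-Python illustration under those assumptions, with the hypothetical name `midpoint_bounds`; it is not the cf-python `create_bounds` implementation, and the longitude values are just an evenly spaced example.

```python
def midpoint_bounds(points):
    """Build (lower, upper) bounds whose edges lie halfway between neighbours.

    The outer edges are extrapolated by half the adjacent spacing.
    Assumes at least two strictly increasing points.
    """
    if len(points) < 2:
        raise ValueError("Need at least two points to infer cell edges")
    edges = [points[0] - (points[1] - points[0]) / 2]
    edges += [(a + b) / 2 for a, b in zip(points, points[1:])]
    edges.append(points[-1] + (points[-1] - points[-2]) / 2)
    return list(zip(edges[:-1], edges[1:]))


lon = [22.5, 67.5, 112.5, 157.5, 202.5, 247.5, 292.5, 337.5]
bounds = midpoint_bounds(lon)
cell_sizes = [upper - lower for lower, upper in bounds]
print(bounds[0], bounds[-1])
print(cell_sizes)
```

A single point gives no spacing to work from, which is why the sketch refuses that case rather than guessing a width.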
(Thanks David, intuitively I would expect cellsize to be a property of domain, not the parent field.)
It seems one should first set bounds if they don't exist. I will "show" some code for that shortly, but meanwhile, I am intrigued by the changes you made to the serialise routine. What motivated those changes?
Also, I find this behaviour very strange (for the same example field):
t = f.coord('time')
t.lower_bounds.min()
Out[24]: <CF Data(1): [2019-01-01 00:00:00]>
b = t.get_bounds()
... stack dump ...
ValueError: DimensionCoordinate has no bounds
(Thanks David, intuitively I would expect cellsize to be a property of domain, not the parent field.)
Currently the cellsize attribute is on a coordinate (or domain ancillary) construct, rather than a field or domain construct.
I could implement a cellsize() method on a field and domain object that does the same thing. I.e. all four of these would return the same result:
>>> f.coord('X').cellsize
>>> f.cellsize('X')
>>> f.domain('X').cellsize
>>> f.domain.cellsize('X')
What motivated those changes?
I'll have to remind myself ...!
I could implement a cellsize() method on a field and domain object that does the same thing. I.e. all four of these would return the same result:
>>> f.coord('X').cellsize
>>> f.cellsize('X')
>>> f.domain('X').cellsize
>>> f.domain.cellsize('X')
I suspect it would be more useful to support f.domain.cellsize and get cell bounding boxes back ... to discuss, because that's rather like areacellx ...
What motivated those changes?
Just personal preference, so you can take it or leave it! I find it generally not so robust to test on type as it prevents the ability to pass in other objects which duck type (such as, in this case, any cf-python object that itself contains data).
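As an illustration of the duck-typing point, the sketch below serialises anything that exposes `Units` and `array`, without inspecting its type. The `Data` and `Coordinate` classes are hypothetical stand-ins written for this example (they are not the cf-python classes), so the snippet runs on its own.

```python
class Data:
    """Hypothetical stand-in for a data object with units and values."""

    def __init__(self, values, units):
        self.array = list(values)
        self.Units = units


class Coordinate:
    """Hypothetical container exposing the same attributes as its data."""

    def __init__(self, data):
        self.data = data

    @property
    def array(self):
        return self.data.array

    @property
    def Units(self):
        return self.data.Units


def serialise(o):
    """Serialise anything that provides `Units` and `array` attributes."""
    try:
        units, values = str(o.Units), list(o.array)
    except AttributeError as err:
        raise ValueError(f"Cannot serialise {type(o).__name__}") from err
    return {"units": units, "values": values, "shape": (len(values),)}


d = Data([0.0, 360.0], "degrees_east")
print(serialise(d))
print(serialise(Coordinate(d)))  # same result, no type check needed
```

A lookup keyed on `str(type(o))` would have rejected the `Coordinate` object here, even though it carries everything the serialiser needs.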
My initial go at implementing a slightly more general version:
import json
import unittest

import cf


class DomainSummary:
    """
    Holds a bounding box view of a domain, supplemented by cell_methods
    See https://github.com/NCAS-CMS/cf-python/issues/482
    """
    def __init__(self, f):
        """
        Initialise with a field for which you wish to obtain the domain summary
        """
        # I am not sure about the coords and ancils, is that sufficient?
        coords = [a for a in f.coordinates().values() if a.dtype.kind not in "SU"]
        for a in coords:
            try:
                a.get_bounds()
            except ValueError:
                # No bounds: create them, but only when there is more than one point
                if len(a.data) > 1:
                    a.set_bounds(a.create_bounds())
        self.bbox = [cf.Data.concatenate([a.lower_bounds.min(), a.upper_bounds.max()]) for a in coords]
        # get_cell_methods is a helper defined elsewhere (not shown in this thread)
        self.methods = get_cell_methods(f)

    def __str__(self):
        return self.json

    def __eq__(self, other):
        return str(other) == str(self)

    @property
    def json(self):
        return json.dumps({'bbox':[serialise('cfdata',b) for b in self.bbox], 'methods':self.methods})


class TestDomainSummary(unittest.TestCase):

    def test_bounded(self):
        """ Test domain summary works with well behaved bounded example"""
        f = cf.example_field(0)
        fds = DomainSummary(f)

    def test_unbounded(self):
        """ Test domain summary works with no bounds on a coordinate """
        f = cf.example_field(0)
        v1 = DomainSummary(f)
        x = f.coord('X')
        x.del_bounds()
        v2 = DomainSummary(f)
        self.assertEqual(v1,v2)


if __name__=="__main__":
    unittest.main()