Reduce in-place cube operations.

Open pp-mo opened this issue 6 years ago • 5 comments

Like numpy, we now have only a very few "general" operations in the cube API that modify an existing cube rather than returning a new one. For comprehensibility, I think those should be reduced to an absolute minimum. Whereas things like add_dim_coord and .data = clearly must be in-place operations, other operations like transpose and rename need not be.

The following are currently in-place and, IMHO, could usefully not be:

  • transpose
  • rename
  • convert_units

FWIW, I think 'transpose' is especially undesirable and unexpected.

Ideally I would both remove in-place operations wherever possible, and make it all as clear as possible by naming, e.g. cube.transposed((2,0,1)), cube.regridded(gribcube).

  • so, e.g., that's a vote for renaming regrid to regridded!
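
To make the naming idea concrete, here is a minimal sketch of the two styles side by side. `ToyCube` is a hypothetical stand-in that only tracks a name and a dimension order; it is not the Iris `Cube` class, and `transposed` is the proposed spelling from this issue, not an existing method.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass
class ToyCube:
    """Hypothetical stand-in for a cube (NOT the Iris Cube class)."""

    name: str
    dims: Tuple[str, ...]

    def transpose(self, order):
        """In-place style: mutates this object and returns None."""
        self.dims = tuple(self.dims[i] for i in order)

    def transposed(self, order):
        """Copy-returning style: builds a new object, leaves this one alone."""
        return replace(self, dims=tuple(self.dims[i] for i in order))


cube = ToyCube("air_temperature", ("time", "lat", "lon"))
original_dims = cube.dims

# Copy-returning call: the original is unchanged afterwards.
new_cube = cube.transposed((2, 0, 1))
unchanged_after_copy = cube.dims == original_dims

# In-place call: nothing is returned and the original itself changes.
result = cube.transpose((2, 0, 1))
```

The past-participle name signals "you get a new object back", much as Python's built-in `sorted(x)` does next to `list.sort()`.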

Like #3429, this has clear parallels in other processing libraries, especially numpy. Numpy has very few in-place operations, which makes processing code very much clearer, but their naming is not at all consistent.

pp-mo, Oct 01 '19 10:10

Note: milestone set to v3.0.0, as it's a breaking change, but this is probably not practical.

pp-mo, Oct 01 '19 10:10

Just stumbled over an old issue: I already noted this one! #2615, "cube.transpose is an in-place operation".

Notably, the absence of cube data sharing was seen there as an obstacle, for performance reasons. See #2549 and its follow-ups.

pp-mo, Oct 01 '19 11:10

In my mind, rename is just a setter for what comes out of name. So maybe it would make more sense to formalise that and make name a property. So instead of cube.rename("thing") we just do cube.name = "thing". I can’t think of a reason to want two cubes that are identical apart from their names.
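
A minimal sketch of what that could look like, using a hypothetical `ToyCube` class rather than the real Iris `Cube` (whose naming logic is more involved). The `rename` wrapper is an assumption about how existing calls might be kept working, not something proposed above.

```python
class ToyCube:
    """Hypothetical stand-in (NOT the Iris Cube class)."""

    def __init__(self, name="unknown"):
        self._name = name

    @property
    def name(self):
        """Read the name as an attribute: cube.name"""
        return self._name

    @name.setter
    def name(self, value):
        """Assign the name as an attribute: cube.name = "thing" """
        self._name = str(value)

    def rename(self, value):
        """Assumed compatibility wrapper that delegates to the setter."""
        self.name = value


cube = ToyCube()
cube.name = "thing"
name_after_assignment = cube.name

cube.rename("other_thing")
name_after_rename = cube.name
```

One consequence worth noting: once name is a property, reading it is `cube.name` rather than `cube.name()`, so existing callers of the method form would need updating.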

Note that there are equivalent rename and convert_units methods on coordinates. In the coordinate case, I think in-place operations make practical sense. Otherwise something as simple as

cube.coord("time").convert_units("days since 1970-01-01")

has to become

new_coord = cube.coord("time").convert_units("days since 1970-01-01")
dims = cube.coord_dims("time")
cube.remove_coord("time")
cube.add_dim_coord(new_coord, dims)

Adding lines never feels like a win!

Making the cube methods work inconsistently from their equivalent coord methods is not going to help with the comprehensibility though.

rcomer, Oct 02 '19 07:10

"In the coordinate case, I think in place operations make practical sense."

Good spot + a very telling point. Hmmm.... :thinking:

pp-mo, Oct 02 '19 22:10

@pp-mo Whilst we're in this space, it would be a major step forward IMHO if we had a clear statement and enforced strategy for immutable cubes, coords et al, and zero-copy views. For me this all goes hand-in-hand with purging all in-place operations. Just sayin'...

bjlittle, Oct 24 '19 11:10