Fix iris handling of netcdf character array variables

Open pp-mo opened this issue 1 year ago • 4 comments

Created an umbrella issue for this because it has come up several times in different contexts, but we never got around to it.

  • https://github.com/SciTools/iris/issues/4101
  • https://github.com/SciTools/iris/issues/4412
  • https://github.com/SciTools/iris/issues/5362
  • https://github.com/SciTools/iris/issues/4502 (this one is a minor issue)

See also : https://github.com/pp-mo/ncdata/issues/111

From what I've seen, the current situation is that "standard" CF-approved character-array variables ...

  • don't read correctly, i.e. as character arrays (with a string dimension), but instead as string (object) arrays
  • ... and then don't save correctly, i.e. back to character arrays, but instead as variable-length netcdf "string" type
  • ... which, of course, at present won't read back at all -- see #6149
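To make the two representations concrete, here is a minimal NumPy-only sketch (it does not use Iris or netCDF4, and the sample values are made up). It shows the same labels as a character array with a trailing string-length dimension, which is the CF-style on-disk form, and as an object array of Python strings, which is roughly what the list above says Iris currently produces on load.

```python
import numpy as np

# Made-up sample labels, held as fixed-width byte strings (width 5).
names = np.array([b"alpha", b"be", b"gam"], dtype="S5")

# Character-array form: one extra trailing "string length" dimension,
# one byte per element. Shorter strings are padded with null bytes.
chars = names.view("S1").reshape(names.shape + (names.dtype.itemsize,))

# Reverse step: collapse the trailing dimension back into fixed-width strings.
restored = chars.view(f"S{chars.shape[-1]}").reshape(chars.shape[:-1])

# Object-array form: variable-length Python strings, no string dimension.
as_objects = np.array([s.decode("ascii") for s in restored], dtype=object)

print(chars.shape, chars.dtype)            # 2-D array of single bytes
print(as_objects.shape, as_objects.dtype)  # 1-D array of Python objects
```

The point of the sketch is only that the two forms differ in both shape and dtype, so a save path that expects one and receives the other has to convert explicitly.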

(This has recently re-emerged in a query on ncdata, https://github.com/pp-mo/ncdata/issues/111, from @znichollscr -- possibly not the same as @znicholls ???)

pp-mo avatar Feb 07 '25 12:02 pp-mo

From @znichollscr -- possibly not the same as @znicholls ???

Sorry yes that's me. I have two employers so have to run two accounts 🙃

znicholls avatar Feb 07 '25 13:02 znicholls

Bumping this as we currently have to remove a string coordinate from ERA5 netcdf data in order to save it: (Although it does now at least load following https://github.com/SciTools/iris/issues/6149)

>>> iris.save(g, "test.nc")
/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/fileformats/netcdf/saver.py:2667: IrisDeprecation: Saving to netcdf with legacy-style attribute handling for backwards compatibility.
This mode is deprecated since Iris 3.8, and will eventually be removed.
Please consider enabling the new split-attributes handling mode, by setting 'iris.FUTURE.save_split_attrs = True'.
  warn_deprecated(message)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/io/__init__.py", line 476, in save
    result = saver(source, target, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/fileformats/netcdf/saver.py", line 2749, in save
    sman.write(
  File "/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/fileformats/netcdf/saver.py", line 617, in write
    self._add_aux_coords(cube, cf_var_cube, cube_dimensions)
  File "/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/fileformats/netcdf/saver.py", line 964, in _add_aux_coords
    return self._add_inner_related_vars(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/fileformats/netcdf/saver.py", line 914, in _add_inner_related_vars
    cf_name = self._create_generic_cf_array_var(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/fileformats/netcdf/saver.py", line 1753, in _create_generic_cf_array_var
    new_data[index_slice] = list(
    ~~~~~~~~^^^^^^^^^^^^^
ValueError: could not broadcast input array from shape (79,) into shape (1,)
>>> g.remove_coord("expver")
>>> iris.save(g, "test.nc")
>>> exit()

I was hoping I could resolve this by using iris to convert from the native ecmwf grib to netcdf but that also fails (perhaps the ecmwf converter was introducing errors).

>>> iris.save(f, "era5_converted.nc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/io/__init__.py", line 476, in save
    result = saver(source, target, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/fileformats/netcdf/saver.py", line 2749, in save
    sman.write(
  File "/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/fileformats/netcdf/saver.py", line 647, in write
    self.update_global_attributes(global_attributes)
  File "/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/fileformats/netcdf/saver.py", line 711, in update_global_attributes
    _setncattr(self._dataset, attr_name, attributes[attr_name])
  File "/home/users/daniel.cubbon/.conda/envs/ard-3/lib/python3.12/site-packages/iris/fileformats/netcdf/saver.py", line 271, in _setncattr
    return variable.setncattr(name, attribute)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 3087, in netCDF4._netCDF4.Dataset.setncattr
  File "src/netCDF4/_netCDF4.pyx", line 1888, in netCDF4._netCDF4._set_att
TypeError: illegal data type for attribute b'GRIB_PARAM', must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got O

Where the O is a <class 'iris_grib.grib_phenom_translation._gribcode.GRIBCode1'>

mo-DanCubbon avatar May 13 '25 09:05 mo-DanCubbon

Bumping this as we currently have to remove a string coordinate from ERA5 netcdf data in order to save it: (Although it does now at least load following #6149)

I was hoping I could resolve this by using iris to convert from the native ecmwf grib to netcdf but that also fails (perhaps the ecmwf converter was introducing errors).

TypeError: illegal data type for attribute b'GRIB_PARAM', must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got O

... Where the O is a <class 'iris_grib.grib_phenom_translation._gribcode.GRIBCode1'>

Actually that may be a different error. Since we added "GRIB_PARAM" for GRIB1 messages, this has caught some people out, because the Iris save code is not consistent in converting these to strings so that it can save them to netcdf, whereas e.g. "STASH" objects in attributes are seamlessly converted to and from netcdf (via string representations).
This probably needs fixing, cf. https://github.com/SciTools/iris/issues/6286 and https://github.com/SciTools/iris-grib/issues/596

For now, you could maybe convert GRIB_PARAM attributes to their string representations? (This doesn't lose any information.)
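A minimal sketch of that workaround, written against a plain dict so it runs without Iris installed. The helper name `stringify_attributes`, the stand-in class `FakeGribCode` and its string value are hypothetical placeholders, not part of Iris or iris-grib.

```python
def stringify_attributes(attributes, names=("GRIB_PARAM",)):
    """Return a copy of `attributes` with the named entries converted via str()."""
    converted = dict(attributes)
    for name in names:
        if name in converted and not isinstance(converted[name], str):
            converted[name] = str(converted[name])
    return converted


class FakeGribCode:
    """Hypothetical stand-in for the GRIB code object held in the attribute."""

    def __str__(self):
        # Placeholder text only; the real string form may differ.
        return "grib-param-placeholder"


attrs = {"GRIB_PARAM": FakeGribCode(), "source": "ERA5"}
fixed = stringify_attributes(attrs)
print(type(fixed["GRIB_PARAM"]).__name__, fixed["source"])
```

On a real cube the equivalent would presumably be something like `cube.attributes["GRIB_PARAM"] = str(cube.attributes["GRIB_PARAM"])` before calling `iris.save`, assuming the attribute is present on that cube.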

pp-mo avatar May 13 '25 11:05 pp-mo

Cross-reference : #5125 possibly relates

pp-mo avatar Oct 22 '25 23:10 pp-mo