A suggested solution to the `TypeError: Invalid value for attr:` error upon `.to_netcdf`
Even though the netCDF conventions don't allow some data types in attributes, it might be useful to simply serialize those as strings rather than raise an error. Maybe add a `force_serialization` keyword argument to the `.to_netcdf` method.
Example: Set up a DataArray with bad values in the attributes:
import numpy as np
from xarray import DataArray, load_dataset
from pandas import Timestamp
from numbers import Number
valid_types = (str, Number, np.ndarray, np.number, list, tuple)
da = DataArray(
    name='bad_values',
    attrs=dict(
        bool_value=True,
        none_value=None,
        datetime_value=Timestamp.now()
    )
)
ds = da.to_dataset()
ds.bad_values.attrs
Output:
{'bool_value': True,
 'none_value': None,
 'datetime_value': Timestamp('2020-02-03 10:53:02.350105')}
The code in the except clause could easily be implemented under `_validate_attrs`.
try:
    ds.to_netcdf('test.nc')
    # Fails with TypeError: Invalid value for attr: ...
except TypeError as e:
    print(e.__class__.__name__, e)
    # Convert every unsupported attribute value (including bools) to a string
    for variable in ds.variables.values():
        for k, v in variable.attrs.items():
            if not isinstance(v, valid_types) or isinstance(v, bool):
                variable.attrs[k] = str(v)
    ds.to_netcdf('test.nc')  # Works as expected

ds_from_file = load_dataset('test.nc')
ds_from_file.bad_values.attrs
Output:
TypeError Invalid value for attr: None must be a number, a string, an ndarray or a list/tuple of numbers/strings for serialization to netCDF files
{'bool_value': 'True',
 'none_value': 'None',
 'datetime_value': '2020-02-03 10:43:38.479866'}
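The conversion in the except clause could also be packaged as a small standalone helper. What follows is only a sketch on a plain dict: the name `stringify_invalid_attrs` and the constant `SERIALIZABLE_TYPES` are hypothetical, not part of xarray, and the type tuple is reduced to the standard library (with xarray one would presumably also include `np.ndarray` and `np.number`, as in `valid_types` above).

```python
from numbers import Number

# Assumed set of serializable types for this sketch (stdlib only).
# With xarray, np.ndarray and np.number would likely be added as well.
SERIALIZABLE_TYPES = (str, Number, list, tuple)


def stringify_invalid_attrs(attrs, valid_types=SERIALIZABLE_TYPES):
    """Return a copy of attrs with unsupported values converted via str()."""
    cleaned = {}
    for key, value in attrs.items():
        # bool is a subclass of int (and so a Number), hence the explicit check
        if isinstance(value, bool) or not isinstance(value, valid_types):
            cleaned[key] = str(value)
        else:
            cleaned[key] = value
    return cleaned


attrs = {"bool_value": True, "none_value": None, "scale": 2.5, "units": "m"}
cleaned = stringify_invalid_attrs(attrs)
print(cleaned)
```

A helper like this could be applied to `ds.attrs` and to each `variable.attrs` before calling `.to_netcdf`; supported values such as `2.5` and `'m'` pass through unchanged.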
Thanks for the suggestion. One issue here is that it's not round-trippable; i.e. it wouldn't get deserialized into an object on being loaded.
To the extent people don't think that's an issue, we could take a PR.
This would be a handy feature, especially for writing unit tests, where it's often okay if the attributes aren't deserialized exactly.
It would be better to wrap it and make it round-trippable (especially for simple things like `None`).
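One way such wrapping might look, purely as a sketch: tag the unsupported values with a marker prefix and store them as JSON strings, then undo the tagging after loading. The marker `__json__:` and the functions `encode_attrs` and `decode_attrs` are assumptions made up for illustration; nothing like them exists in xarray or in the netCDF conventions, and only `None` and `bool` are handled here.

```python
import json

# Hypothetical marker prefix; not defined by xarray or netCDF.
MARKER = "__json__:"


def encode_attrs(attrs):
    """Return a copy of attrs with None and bool values as tagged JSON strings."""
    encoded = {}
    for key, value in attrs.items():
        if value is None or isinstance(value, bool):
            encoded[key] = MARKER + json.dumps(value)
        else:
            encoded[key] = value
    return encoded


def decode_attrs(attrs):
    """Reverse encode_attrs: parse tagged strings back into Python objects."""
    decoded = {}
    for key, value in attrs.items():
        if isinstance(value, str) and value.startswith(MARKER):
            decoded[key] = json.loads(value[len(MARKER):])
        else:
            decoded[key] = value
    return decoded


attrs = {"bool_value": True, "none_value": None, "units": "m"}
encoded = encode_attrs(attrs)    # safe to store as string attributes
restored = decode_attrs(encoded)  # same values as the original attrs
print(encoded)
print(restored)
```

The trade-off is that any other tool reading the file would see the tagged strings rather than meaningful values, and a genuine string attribute that happens to start with the marker would be misinterpreted on load.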