Set default of `mask_and_scale` to `True` in `open_rasterio`
In `xarray.open_dataset`, the optional argument `mask_and_scale` defaults to `True` (doc), which I expect is what almost every user prefers.
I was surprised to find out that in `rioxarray.open_rasterio` the same argument defaults to `False`.
It is that way because the `mask_and_scale` option doesn't exist in `xarray.open_rasterio`. Leaving it defaulted to `False` makes it less surprising for users who are transitioning to `rioxarray.open_rasterio`.
If the default is to change in the future, there first needs to be a rioxarray release that emits a runtime warning, for users who don't set `mask_and_scale` explicitly, saying that the default behavior will change.
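A minimal sketch of that transition step, assuming a sentinel default (to tell "not passed" apart from an explicit `False`) and a `FutureWarning` (the comment above only says "runtime warning", so the category is a choice). `open_rasterio_sketch` is a hypothetical stand-in that models only the default handling, not rioxarray's actual code.

```python
import warnings

# Hypothetical sentinel: lets us detect that the caller did not pass mask_and_scale.
_UNSET = object()


def open_rasterio_sketch(filename, mask_and_scale=_UNSET):
    """Hypothetical stand-in for rioxarray.open_rasterio (default handling only)."""
    if mask_and_scale is _UNSET:
        warnings.warn(
            "The default of mask_and_scale will change from False to True in a "
            "future release. Pass mask_and_scale explicitly to silence this warning.",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        mask_and_scale = False  # keep the current default during the transition
    # A real implementation would open the file here; we just echo the settings.
    return {"filename": filename, "mask_and_scale": mask_and_scale}
```

Users who already pass `mask_and_scale=True` or `mask_and_scale=False` never see the warning, so only code that relies on the implicit default is affected.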
Related: #69 & #281
Thanks for the explanation! It looks like there are two contrasting needs here:
1) ease of use for users transitioning from `xarray.open_rasterio`
2) ease of use for users that are used to `xarray.open_dataset`
I guess the name of the function gives priority to 1), and this is really more of an xarray issue (the fact that `open_rasterio` and `open_dataset` have opposite defaults for `mask_and_scale`).
Probably a more logical way of working for people in category 2) (like myself) is to use `xarray.open_dataset(..., engine='rasterio')`. Is `mask_and_scale=True` by default in that case?
Yes, it is.
... not sure if this is the right place to mention this, but since I just stumbled upon this issue while trying to avoid memory overload from unnecessary float conversions while reading large GeoTIFFs, I thought I'd add a comment here.
It seems that at the moment using `mask_and_scale=True` always converts the data to float (even if `add_offset=0.` and `scale_factor=1.`):
```python
import xarray as xar

with xar.open_dataset("path_to_GeoTIFF.tif") as raster:
    print(raster.band_data.dtype)
    # >>> dtype('float32')
    print(raster.band_data.encoding)
    # >>> {'dtype': 'int16',
    # >>>  'scale_factor': 1.0,
    # >>>  'add_offset': 0.0,
    # >>>  '_FillValue': -9999.0,
    # >>>  'grid_mapping': 'spatial_ref',
    # >>>  'rasterio_dtype': 'int16'}
```
Is this intentional? If a dataset is actually not encoded, I'd prefer to avoid unnecessary float conversion as much as possible...
Yes, this is required, because of the mask part: when masking, xarray uses NaN to represent nodata, and NaN only exists in floating-point dtypes.
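A small NumPy-only sketch of why the cast is unavoidable, assuming CF-style semantics (nodata pixels become NaN, then `scale_factor` and `add_offset` are applied) and float32 output as in the example above. `decode_band` is hypothetical and is not xarray's implementation. `mask_keep_dtype` illustrates one possible workaround not discussed above: read with `mask_and_scale=False` (the `rioxarray.open_rasterio` default) and flag nodata with a NumPy masked array, which keeps the int16 values and adds a 1-byte boolean mask per pixel instead of doubling the size to float32.

```python
import numpy as np


def decode_band(raw, fill_value, scale_factor=1.0, add_offset=0.0):
    """Hypothetical CF-style mask-and-scale decoding of an integer band."""
    # NaN only exists in floating-point types, so cast first
    # (float32 represents every int16 value exactly).
    decoded = raw.astype(np.float32)
    # Mask: nodata pixels become NaN.
    decoded[raw == fill_value] = np.nan
    # Scale: even with scale_factor=1 and add_offset=0 the result stays float32.
    return decoded * np.float32(scale_factor) + np.float32(add_offset)


def mask_keep_dtype(raw, fill_value):
    """Alternative: flag nodata with a boolean mask and keep the integer dtype."""
    return np.ma.masked_equal(raw, fill_value)


raw = np.array([[1, -9999], [3, 4]], dtype=np.int16)

trial = raw.copy()
try:
    trial[0, 1] = np.nan  # integer arrays have no way to store NaN
except ValueError as err:
    print("int16 cannot hold NaN:", err)

decoded = decode_band(raw, fill_value=-9999)
kept = mask_keep_dtype(raw, fill_value=-9999)
print("decoded:", decoded.dtype, decoded.nbytes, "bytes")
print("masked array:", kept.dtype, kept.data.nbytes + kept.mask.nbytes, "bytes")
```

The trade-off is that most xarray operations expect NaN-based masking, so the masked-array route suits memory-bound, NumPy-level processing rather than general xarray workflows.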