Set default of `mask_and_scale` to `True` in `open_rasterio`

Open gcaria opened this issue 3 years ago • 6 comments

In xarray.open_dataset, the optional argument mask_and_scale defaults to True (doc), which is what I expect almost any user would prefer.

I was surprised to find out that in rioxarray.open_rasterio the same argument defaults to False.

gcaria avatar Apr 02 '22 14:04 gcaria

It is that way because the mask_and_scale option doesn't exist in xarray.open_rasterio. Leaving it defaulted to False makes it less surprising for users who are transitioning to rioxarray.open_rasterio.

If the default is to change in the future, rioxarray first needs a release that emits a runtime warning telling users who do not set mask_and_scale explicitly that the default behavior will change.

snowman2 avatar Apr 04 '22 13:04 snowman2
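
The deprecation path described in the comment above could look roughly like the sketch below. This is only an illustration of the general sentinel-plus-FutureWarning pattern, not rioxarray's actual code: the function name open_raster_sketch, the _UNSET sentinel, and the returned dict are all hypothetical.

```python
import warnings

# Hypothetical sentinel: lets the function tell "argument not passed"
# apart from an explicit True or False.
_UNSET = object()


def open_raster_sketch(path, mask_and_scale=_UNSET):
    """Hypothetical stand-in for a reader whose default is about to change."""
    if mask_and_scale is _UNSET:
        # Only callers relying on the default are warned.
        warnings.warn(
            "The default of mask_and_scale will change from False to True "
            "in a future release; pass mask_and_scale explicitly to keep "
            "the current behavior.",
            FutureWarning,
            stacklevel=2,
        )
        mask_and_scale = False  # current default stays in effect for now
    # A real reader would open the file here; the sketch only reports
    # which setting it would have used.
    return {"path": path, "mask_and_scale": mask_and_scale}
```

Callers that pass mask_and_scale=False or mask_and_scale=True explicitly see no warning, so only code that depends on the default is affected.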

Related: #69 & #281

snowman2 avatar Apr 04 '22 13:04 snowman2

Thanks for the explanation! It looks like there are two contrasting needs here:

  1. ease of use for users transitioning from xarray.open_rasterio
  2. ease of use for users that are used to xarray.open_dataset

I guess the name of the function warrants giving priority to 1), and this is actually more of an xarray issue (the fact that open_rasterio and open_dataset have opposite defaults for mask_and_scale).

Probably a more logical way of working for people in category 2) (like myself) is to use xarray.open_dataset(..., engine='rasterio'). Is mask_and_scale=True by default in this case?

gcaria avatar Apr 06 '22 10:04 gcaria

Probably a more logical way of working for people in category 2) (like myself) is to use xarray.open_dataset(..., engine='rasterio'). Is mask_and_scale=True by default in this case?

Yes, it is.

snowman2 avatar Apr 06 '22 12:04 snowman2

... Not sure if this is the right place to mention this, but since I just stumbled upon this issue while trying to avoid memory overloads from unnecessary float conversions while reading large GeoTIFFs, I thought I'd add a comment here.

It seems that at the moment using mask_and_scale=True always converts the data to float (even if add_offset=0. and scale_factor=1.)

import xarray as xar
with xar.open_dataset("path_to_GeoTIFF.tif") as raster:
    print(raster.band_data.dtype)  
    # >>> dtype('float32')
    print(raster.band_data.encoding)
    # >>> {'dtype': 'int16',
    # >>>  'scale_factor': 1.0,
    # >>>  'add_offset': 0.0,
    # >>>  '_FillValue': -9999.0,
    # >>>  'grid_mapping': 'spatial_ref',
    # >>>  'rasterio_dtype': 'int16'}

Is this intentional? If a dataset is actually not encoded, I'd prefer to avoid unnecessary float-conversion as much as possible...

raphaelquast avatar Apr 13 '22 17:04 raphaelquast

It seems that at the moment using mask_and_scale=True always converts the data to float (even if add_offset=0. and scale_factor=1.)

Yes, this is required. It is due to the mask part: xarray uses NaN to represent nodata when masked, and NaN can only be stored in a float dtype.

snowman2 avatar Apr 13 '22 18:04 snowman2
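
The dtype constraint mentioned in the reply above can be seen with plain NumPy. This is a minimal sketch with made-up data, not the code path xarray or rioxarray actually runs; the helper name mask_nodata is hypothetical, and the -9999 nodata value is borrowed from the encoding shown earlier in the thread.

```python
import numpy as np


def mask_nodata(raw, nodata):
    """Hypothetical helper: replace nodata cells with NaN.

    NaN has no integer representation, so the array has to be
    converted to a float dtype before the NaN can be written.
    """
    out = raw.astype(np.float32)
    out[raw == nodata] = np.nan
    return out


# Made-up int16 raster with -9999 marking nodata.
raw = np.array([[1, 2], [-9999, 4]], dtype=np.int16)
masked = mask_nodata(raw, -9999)

print(raw.dtype, masked.dtype)  # int16 float32
```

Even with scale_factor=1.0 and add_offset=0.0, the masking step alone is enough to force the float conversion.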