pygmt icon indicating copy to clipboard operation
pygmt copied to clipboard

Unable to use dataarray if its a variable of a dataset

Open mdtanker opened this issue 3 years ago • 3 comments

Description of the problem

If a dataarray has been converted to a DataSet with xarray.DataArray.to_dataset(), calling the dataarray in PyGMT modules fails. See below that output of grid and dataset.height are equivalent.

Full code that generated the error

import pygmt
import xarray as xr

region = [-28, -10, 62, 68]

grid = pygmt.datasets.load_earth_relief(resolution="03m", region=region)

dataset = grid.to_dataset(name='height')

print(pygmt.grdinfo(grid))
print(pygmt.grdinfo(dataset.height))

Full error message

grdcut [NOTICE]: Remote data courtesy of GMT data server oceania [http://oceania.generic-mapping-tools.org]
grdcut [NOTICE]: SRTM15 Earth Relief at 20x20 arc minutes reduced by Gaussian Cartesian filtering (37.1 km fullwidth) [Tozer et al., 2019].
grdcut [NOTICE]:   -> Download grid file [831K]: earth_relief_20m_p.grd
: Title: 
: Command: 
: Remark: 
: Pixel node registration used [Geographic grid]
: Grid file format: nf = GMT netCDF format (32-bit float), CF-1.7
: x_min: -28 x_max: -10 x_inc: 0.333333333333 (20 min) name: x n_columns: 54
: y_min: 62 y_max: 68 y_inc: 0.333333333333 (20 min) name: y n_rows: 18
: v_min: -2240 v_max: 1664 name: z
: scale_factor: 1 add_offset: 0
: format: classic
: Default CPT: 

grdinfo [ERROR]: File /tmp/pygmt-yyhpy7cb.nc was not found
grdinfo [ERROR]: Cannot find file /tmp/pygmt-yyhpy7cb.nc
grdinfo [ERROR]: Must specify one or more input files
Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
GMTCLibError                              Traceback (most recent call last)
Untitled-1.ipynb Cell 1' in <cell line: 11>()
      8 dataset = grid.to_dataset(name='height')
     10 print(pygmt.grdinfo(grid))
---> 11 print(pygmt.grdinfo(dataset.height))

File /home/tankerma/miniconda/envs/grav_inv2/lib/python3.10/site-packages/pygmt/helpers/decorators.py:585, in use_alias.<locals>.alias_decorator.<locals>.new_module(*args, **kwargs)
    580         msg = (
    581             f"Short-form parameter ({short_param}) is not recommended. "
    582             f"Use long-form parameter '{long_alias}' instead."
    583         )
    584         warnings.warn(msg, category=SyntaxWarning, stacklevel=2)
--> 585 return module_func(*args, **kwargs)

File /home/tankerma/miniconda/envs/grav_inv2/lib/python3.10/site-packages/pygmt/helpers/decorators.py:725, in kwargs_to_strings.<locals>.converter.<locals>.new_module(*args, **kwargs)
    723             kwargs[arg] = separators[fmt].join(f"{item}" for item in value)
    724 # Execute the original function and return its output
--> 725 return module_func(*args, **kwargs)

File /home/tankerma/miniconda/envs/grav_inv2/lib/python3.10/site-packages/pygmt/src/grdinfo.py:116, in grdinfo(grid, **kwargs)
    114 with Session() as lib:
    115     file_context = lib.virtualfile_from_data(check_kind="raster", data=grid)
--> 116     with file_context as infile:
    117         lib.call_module(
...

GMTCLibError: Module 'grdinfo' failed with status code 72:
grdinfo [ERROR]: File /tmp/pygmt-yyhpy7cb.nc was not found
grdinfo [ERROR]: Cannot find file /tmp/pygmt-yyhpy7cb.nc
grdinfo [ERROR]: Must specify one or more input files

Output of running grid

xarray.DataArray'elevation'lat: 18lon: 54
-1.462e+03 -1.444e+03 -1.423e+03 ... -1.864e+03 -1.887e+03 -1.675e+03
Coordinates:
lon
(lon)
float64
-27.83 -27.5 ... -10.5 -10.17
lat
(lat)
float64
62.17 62.5 62.83 ... 67.5 67.83
Attributes:
long_name :
elevation relative to the geoid
cpt :
geo
units :
meters
vertical_datum :
EMG96
horizontal_datum :
WGS84

Output of running dataset.height

xarray.DataArray'height'lat: 18lon: 54
-1.462e+03 -1.444e+03 -1.423e+03 ... -1.864e+03 -1.887e+03 -1.675e+03
Coordinates:
lon
(lon)
float64
-27.83 -27.5 ... -10.5 -10.17
lat
(lat)
float64
62.17 62.5 62.83 ... 67.5 67.83
Attributes:
long_name :
elevation relative to the geoid
cpt :
geo
units :
meters
vertical_datum :
EMG96
horizontal_datum :
WGS84

System information

Please paste the output of python -c "import pygmt; pygmt.show_versions()":

PyGMT information:
  version: v0.6.1
System information:
  python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
  executable: /home/tankerma/miniconda/envs/grav_inv2/bin/python
  machine: Linux-5.4.0-96-generic-x86_64-with-glibc2.31
Dependency information:
  numpy: 1.21.6
  pandas: 1.4.2
  xarray: 2022.3.0
  netCDF4: 1.5.8
  packaging: 21.3
  geopandas: 0.10.2
  ghostscript: 9.54.0
  gmt: 6.4.0
GMT library information:
  binary dir: /home/tankerma/miniconda/envs/grav_inv2/bin
  cores: 112
  grid layout: rows
  library path: /home/tankerma/miniconda/envs/grav_inv2/lib/libgmt.so
  padding: 2
  plugin dir: /home/tankerma/miniconda/envs/grav_inv2/lib/gmt/plugins
  share dir: /home/tankerma/miniconda/envs/grav_inv2/share/gmt
  version: 6.4.0

mdtanker avatar Jun 29 '22 00:06 mdtanker

I can confirm the issue but need some time to see why it doesn't work.

seisman avatar Jul 05 '22 13:07 seisman

Okay great, thanks for looking into it :smile:

mdtanker avatar Jul 05 '22 20:07 mdtanker

The issue is related to https://github.com/GenericMappingTools/pygmt/issues/524.

seisman avatar Jul 16 '22 00:07 seisman

Here is another example to reproduce the issue after PR https://github.com/GenericMappingTools/pygmt/pull/2009 is merged:

from pygmt.datasets import load_earth_relief

grid = load_earth_relief(
    resolution="01d", region=[0, 5, -5, 5], registration="pixel"
)
# check the original grid
assert len(grid.encoding["source"]) > 0
assert grid.gmt.registration == 1
assert grid.gmt.gtype == 1

# generate a new dataset
dataset = grid.to_dataset(name="height")
# source file is given but not found
assert len(dataset.height.encoding["source"]) > 0
assert not Path(dataset.height.encoding["source"]).exists()
# fallback to default registration and gtype
assert dataset.height.gmt.registration == 0
assert dataset.height.gmt.gtype == 0

# manually set the registration and gtype
dataset.height.gmt.registration = 1
dataset.height.gmt.gtype = 1
# the registration and gtype still have default values.
# Quote from https://docs.xarray.dev/en/stable/internals/extending-xarray.html
# > New instances, like those created from arithmetic operations or when
# > accessing a DataArray from a Dataset (ex. ds[var_name]), will have
# > new accessors created.
assert dataset.height.gmt.registration == 0
assert dataset.height.gmt.gtype == 0

seisman avatar Feb 16 '23 13:02 seisman

Some discussions at https://github.com/pydata/xarray/issues/3205 and https://github.com/pydata/xarray/issues/3268.

seisman avatar Feb 17 '23 02:02 seisman

This issue should be fixed by https://github.com/GenericMappingTools/pygmt/pull/2009, but please note that the registration and gtype information will be lost after converting to Dataset. See https://github.com/GenericMappingTools/pygmt/issues/499 for context and https://github.com/GenericMappingTools/pygmt/pull/2375 for the known limitations.

seisman avatar Feb 22 '23 13:02 seisman

Great, thanks! I look forward to using it in v0.9 :)

mdtanker avatar Mar 22 '23 21:03 mdtanker