MetPy icon indicating copy to clipboard operation
MetPy copied to clipboard

Improve discussion of array types in "Working With Units" tutorial

Open jthielen opened this issue 4 years ago • 6 comments

What can be better?

The apparent/user-facing failure to attach units to an array by typically-shown "multiplication on the right" has been a long-standing issue (almost always with masked arrays) which frequently arises in support questions (e.g., https://mobile.twitter.com/Nolan_Meister/status/1455629079110209553) and off-handed gripes I've heard about MetPy. While the masked array issue in particular continues to await resolution of https://github.com/numpy/numpy/issues/15200, I think the crux of the issue is teaching users to know what kind of array they are working with, and then providing very clear direction on how to treat units in combination with that array type.

Here's what I'd have in mind (but please do suggest improvements/alternatives!):

  1. In "Getting Started," change "In general, using units is only a small step on top of using the numpy.ndarray object." to something like "It's important to know what kind of array you are using units with. Luckily, for each kind of array, using units is just a small step on top of using that object!"

  2. In the next section "Adding Units to Data," replace the first two snippets "The easiest way..." and "It is also possible..." with a table like the following:

How should I add units to my data? That depends on your data type!

Data Type Recommended Approach Example
Scalars (like floats and integers), numpy.ndarray, Sparse.COO, Dask.Array Construct Quantity object
or multiply by unit
units.Quantity(array, 'hPa') or array * units.hPa
xarray.DataArray with a 'units' attribute 1) If using only as function input, just use as-is
2) If doing operations outside of MetPy calculations, call .metpy.quantify() after subsetting
1) mpcalc.geostrophic_wind(ds['geopotential_height_isobaric'])
2) temperature = ds['temperature'].metpy.quantify()
xarray.DataArray without a 'units' attribute Multiply by unit temperature = ds['temperature'] * units.K
numpy.MaskedArray (like those from netcdf4-python) Construct Quantity object units.Quantity(array, 'hPa')

Curious about these differences? Take a look at "{INSERT TITLE HERE}" below to learn more.

  1. Change the "When using masked arrays..." in "Common Mistakes" to recommend the Quantity constructor and point to the table in (2).

  2. Add a comment in the "Common Mistakes" section to the effect of: "Do you still have a units problem that isn't addressed here? Reach out to support {...}. Unit arrays involve lots of different libraries interacting with each other, so we want to be able to help sort out any issues when those libraries aren't cooperating as expected."

  3. Add a brief explainer after "Common Mistakes" about why there are different ways to add units for those that are curious (e.g., a much less technical/more applied version of https://pint.readthedocs.io/en/stable/numpy.html#Technical-Commentary)

jthielen avatar Nov 02 '21 23:11 jthielen

Should there be a row for xarray.Dataset in point 2?

sgdecker avatar Nov 08 '21 15:11 sgdecker

Should there be a row for xarray.Dataset in point 2?

No, as an xarray.Dataset is a collection of arrays, rather than an array type itself. However, perhaps a comment to that effect (something along the lines of "an xarray.DataArray can be treated independently or as part of an xarray.Dataset") may help reduce confusion?

jthielen avatar Nov 08 '21 15:11 jthielen

Yes, a comment works. Anything that alerts the reader that metpy.quantify can be applied to an entire Dataset and not necessarily individual DataArrays one at a time (which is what the examples given might imply if interpreted a certain way).

sgdecker avatar Nov 08 '21 15:11 sgdecker

Not sure if this falls here or should be it's own issue, but https://github.com/Unidata/MetPy/discussions/2333 made me notice that the following error message in check_units is incorrect:

https://github.com/Unidata/MetPy/blob/94ef0e0895440b96853073004abf4b779e1b6178/src/metpy/units.py#L262-L264

This approach to assigning units will not (and is never intended to) work for xarray data types. Should we fix this error message to point to an updated docs page with the above information, or provide the conditional information itself in the error message?

jthielen avatar Feb 04 '22 21:02 jthielen

I think that is relevant here and probably worth its own issue and fix unless we only want to fix it by tying it to this docs update. IMO providing helpful (and valid) errors conditionally is a good direction for users here and in spirit of the feedback intended by this error. Still supported by the docs enhancements here though, of course.

dcamron avatar Feb 04 '22 21:02 dcamron

Perhaps a discussion of adding units to coordinate arrays a la: https://github.com/Unidata/MetPy/issues/2118#issuecomment-924165240 would be a useful addition as well.

sgdecker avatar Dec 07 '22 20:12 sgdecker