Improve discussion of array types in "Working With Units" tutorial
What can be better?
The apparent, user-facing failure to attach units to an array with the commonly shown "multiplication on the right" is a long-standing problem (almost always with masked arrays). It comes up frequently in support questions (e.g., https://mobile.twitter.com/Nolan_Meister/status/1455629079110209553) and in offhand gripes I've heard about MetPy. While the masked array issue in particular still awaits resolution of https://github.com/numpy/numpy/issues/15200, I think the crux is teaching users to know what kind of array they are working with, and then giving very clear direction on how to handle units with that array type.
Here's what I'd have in mind (but please do suggest improvements/alternatives!):
- In "Getting Started," change "In general, using units is only a small step on top of using the numpy.ndarray object." to something like "It's important to know what kind of array you are using units with. Luckily, for each kind of array, using units is just a small step on top of using that object!"
- In the next section "Adding Units to Data," replace the first two snippets "The easiest way..." and "It is also possible..." with a table like the following:
How should I add units to my data? That depends on your data type!
| Data Type | Recommended Approach | Example |
|---|---|---|
| Scalars (like floats and integers), numpy.ndarray, Sparse.COO, Dask.Array | Construct Quantity object or multiply by unit | `units.Quantity(array, 'hPa')` or `array * units.hPa` |
| xarray.DataArray with a 'units' attribute | 1) If using only as function input, just use as-is 2) If doing operations outside of MetPy calculations, call .metpy.quantify() after subsetting | 1) `mpcalc.geostrophic_wind(ds['geopotential_height_isobaric'])` 2) `temperature = ds['temperature'].metpy.quantify()` |
| xarray.DataArray without a 'units' attribute | Multiply by unit | `temperature = ds['temperature'] * units.K` |
| numpy.MaskedArray (like those from netcdf4-python) | Construct Quantity object | `units.Quantity(array, 'hPa')` |
Curious about these differences? Take a look at "{INSERT TITLE HERE}" below to learn more.
- Change the "When using masked arrays..." in "Common Mistakes" to recommend the `Quantity` constructor and point to the table in (2).
- Add a comment in the "Common Mistakes" section to the effect of: "Do you still have a units problem that isn't addressed here? Reach out to support {...}. Unit arrays involve lots of different libraries interacting with each other, so we want to be able to help sort out any issues when those libraries aren't cooperating as expected."
- Add a brief explainer after "Common Mistakes" about why there are different ways to add units for those that are curious (e.g., a much less technical/more applied version of https://pint.readthedocs.io/en/stable/numpy.html#Technical-Commentary)
Should there be a row for xarray.Dataset in point 2?
> Should there be a row for xarray.Dataset in point 2?
No, as an xarray.Dataset is a collection of arrays, rather than an array type itself. However, perhaps a comment to that effect (something along the lines of "an xarray.DataArray can be treated independently or as part of an xarray.Dataset") may help reduce confusion?
Yes, a comment works. Anything that alerts the reader that metpy.quantify can be applied to an entire Dataset and not necessarily individual DataArrays one at a time (which is what the examples given might imply if interpreted a certain way).
Not sure if this falls here or should be its own issue, but https://github.com/Unidata/MetPy/discussions/2333 made me notice that the following error message in check_units is incorrect:
https://github.com/Unidata/MetPy/blob/94ef0e0895440b96853073004abf4b779e1b6178/src/metpy/units.py#L262-L264
This approach to assigning units will not (and is never intended to) work for xarray data types. Should we fix this error message to point to an updated docs page with the above information, or provide the conditional information itself in the error message?
I think that is relevant here and probably worth its own issue and fix, unless we only want to fix it by tying it to this docs update. IMO providing helpful (and valid) errors conditionally is a good direction for users here and in the spirit of the feedback intended by this error. Still supported by the docs enhancements here though, of course.
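One possible shape for a conditional message is sketched below. `units_hint` is a hypothetical helper, not an existing MetPy function, and the message wording is only illustrative (it loosely follows the table proposed above). It inspects the type's module name so that xarray does not need to be imported.

```python
def units_hint(value):
    """Return a type-specific suggestion for attaching units.

    Hypothetical helper for illustration only; not part of MetPy.
    """
    cls = type(value)
    root_module = cls.__module__.split('.')[0]

    if root_module == 'xarray':
        # xarray objects: rely on the 'units' attribute and the accessor
        return ("For xarray data, make sure the variable has a 'units' "
                "attribute, or call .metpy.quantify().")
    if cls.__name__ == 'MaskedArray':
        # Masked arrays: multiplication on the right is unreliable
        return "For masked arrays, use units.Quantity(x, 'm/s')."
    # Scalars, plain ndarrays, and other array types
    return "Use units.Quantity(x, 'm/s') or x * units('m/s')."
```

A check like this could feed the tail of the existing error text, or the message could simply link to the updated docs page; which is better is a design question for the maintainers.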
Perhaps a discussion of adding units to coordinate arrays a la: https://github.com/Unidata/MetPy/issues/2118#issuecomment-924165240 would be a useful addition as well.