Add xarray field conversion / output handler
Original report (archived issue) by Keaton Burns (Bitbucket: kburns).
We should consider simple xarray conversion operators for fields, as well as xarray handlers that produce xarray objects and/or write to disk using xarray.
Original comment by Tomás Chor (Bitbucket: tomchor).
I’m new to Dedalus, but I can say that this feature would be very much appreciated. HDF5 isn’t the most intuitive format. I primarily use xarray, and it takes me real effort to convert Dedalus output to a format that xarray can understand (and I’m sure I’m doing a bad job at it!).
What would be the disadvantage of outputting NETCDF4, if I may ask? Wouldn’t any algorithm designed to read HDF5 also work if it’s formatted as NETCDF4?
Thanks!
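For readers in the same position, the manual conversion described above might look roughly like the sketch below. It is not the Dedalus API: the file layout (a `tasks` group whose datasets carry labelled HDF5 dimension scales), the helper names `write_demo_file` and `task_to_dataarray`, and the task name `u` are all assumptions made for illustration, so check them against your own output files. The first helper only writes a tiny stand-in file so that the example is self-contained.

```python
import os
import tempfile

import h5py
import numpy as np
import xarray as xr


def write_demo_file(path):
    """Write a tiny HDF5 file in an ASSUMED layout (tasks/<name> plus scales)."""
    t = np.array([0.0, 0.5, 1.0])
    x = np.linspace(0.0, 1.0, 4)
    with h5py.File(path, "w") as f:
        scales = f.create_group("scales")
        t_ds = scales.create_dataset("sim_time", data=t)
        x_ds = scales.create_dataset("x", data=x)
        # Mark the coordinate datasets as HDF5 dimension scales.
        t_ds.make_scale("sim_time")
        x_ds.make_scale("x")
        u = f.create_group("tasks").create_dataset("u", data=np.outer(t, x))
        # Label each axis and attach the matching coordinate to it.
        u.dims[0].label = "t"
        u.dims[1].label = "x"
        u.dims[0].attach_scale(t_ds)
        u.dims[1].attach_scale(x_ds)


def task_to_dataarray(path, task):
    """Load one task eagerly and wrap it in an xarray.DataArray."""
    with h5py.File(path, "r") as f:
        dset = f["tasks"][task]
        dims = [dim.label for dim in dset.dims]
        # dim[0] is the first scale attached to that axis.
        coords = {dim.label: dim[0][:] for dim in dset.dims}
        # dset[:] copies the data into memory before the file is closed.
        return xr.DataArray(dset[:], dims=dims, coords=coords, name=task)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.h5")
    write_demo_file(demo_path)
    u = task_to_dataarray(demo_path, "u")

# Label-based selection now works as usual in xarray.
print(u.sel(t=0.5))
```

This reads the whole task into memory, which is fine for small outputs but not for large ones.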
Original comment by J. S. Oishi (Bitbucket: jsoishi).
Hi Tomas,
Thanks for your comment; it’s really helpful to hear from users! The primary disadvantage of outputting NETCDF4 is that it is used mainly by one subset of Dedalus users (the geosciences), a field in which none of the primary developers are experts. We therefore focused on a file format that works well across the diverse applications we use Dedalus for, while keeping dependencies minimal (e.g. not requiring users to install a NETCDF library on top of HDF5, which they would still need). Given limited developer time, and the fact that our existing HDF5 standard seems to work (perhaps not ideally, as you have found!), we have not added NETCDF4.
As for your second question: no, code written to read Dedalus’s HDF5 files would not automatically work with NETCDF4 files. NETCDF4 is built on HDF5 but uses only a subset of it, with its own conventions. Dedalus’s output files are a different subset of HDF5, with different conventions and different ways of storing data.
Unfortunately, any choice of data file format will involve technical friction on someone’s part. Thanks again for your comment. We are more likely to add the xarray feature now that we know at least one user would directly benefit from it!
@jsoishi it looks like we may be able to implement an xarray backend that lazily reads Dedalus HDF5 files: https://xarray.pydata.org/en/stable/internals/how-to-add-new-backend.html#rst-lazy-loading
It might also be worth revisiting the idea of making our outputs netCDF compatible, but I'm really not sure what that would require: https://www.unidata.ucar.edu/software/netcdf/
Looks like netCDF does not support complex numbers, so that's out.
It's ugly, but I've seen people do complex numbers with netCDF by basically introducing an additional axis of length 2 that corresponds to the real/imaginary parts.
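As a rough illustration of that workaround, the sketch below packs a complex array into a real array with a trailing axis of length 2 and unpacks it again, using only NumPy. The function names are made up for this example, and putting the real/imaginary axis last is just one possible convention.

```python
import numpy as np


def complex_to_real_pair(z):
    """Return a real array with a trailing axis of length 2: (real, imag)."""
    z = np.asarray(z)
    return np.stack([z.real, z.imag], axis=-1)


def real_pair_to_complex(a):
    """Invert complex_to_real_pair: combine the trailing axis into complex values."""
    a = np.asarray(a)
    if a.shape[-1] != 2:
        raise ValueError("expected a trailing axis of length 2 (real, imag)")
    return a[..., 0] + 1j * a[..., 1]


z = np.array([[1 + 2j, 3 - 4j], [0.5j, -1.0 + 0j]])
packed = complex_to_real_pair(z)         # real-valued, shape (2, 2, 2)
restored = real_pair_to_complex(packed)  # complex again, shape (2, 2)
```

The packed array is purely real, so it can be stored in a format with no complex type, but every reader then has to know about the extra axis.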
I guess that means you would have to change things pretty dramatically for all the Dedalus users who don't use netCDF. Doesn't sound great.
A basic xarray interface has been added with #213. I'll close this for now, but would definitely be open to further discussions about enhancing this interface with lazy loading, etc., if anyone needs that and is interested in contributing.