Add options for HDF5 history and profile output
History and profiles are the basic output types for all MESA models, and HDF5 options for these would enable storing data far more efficiently. I imagine many users have existing workflows that would be disrupted, so we would likely want to continue to support the current plain text output files as default, at least initially. We could start by hiding an HDF5 option for this output behind an inlist flag that defaults to false.
use_hdf5_for_output_data = .false.
We would also likely want to distribute lightweight python tooling analogous to py_mesa_reader so that users can easily load their MESA output into python analysis scripts for postprocessing. The goal should be to make this seamless enough that the average user running models and making plots with python doesn't have to think at all about the underlying HDF5 format of the data.
Hi Evan,
I'm currently trying my hand at implementing this on my own fork. Using HDF5 would be convenient for quite a few science use cases, including mine. The inclusion of forum_m in r24.08.1 retains the high-level hdf5io wrapper, which is already used by other modules, so I will go ahead and use that tooling for writing history and profiles (i.e. no new dependencies should be necessary).
Also, as far as Python tooling goes: would it be worthwhile to encourage use of pandas vis-a-vis py_mesa_reader? Pandas has a built-in HDF5 reader (pandas.read_hdf) that returns DataFrames.
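For illustration, here is a minimal sketch of what loading a history file into a DataFrame could look like. It assumes a hypothetical layout in which each history column is stored as a 1-D dataset at the root of the file; the real layout would depend on how the writer ends up being implemented. The function name load_history, the file name, and the column values are made up for the example. It reads with h5py rather than pandas.read_hdf, because pandas.read_hdf depends on PyTables and is designed around files laid out the way pandas itself writes them, so it may not read a file produced by a Fortran wrapper.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd


def load_history(path):
    """Load every 1-D dataset in the file root into a DataFrame column.

    Assumes a hypothetical layout: one root-level dataset per history column.
    """
    with h5py.File(path, "r") as f:
        columns = {
            name: obj[()]
            for name, obj in f.items()
            if isinstance(obj, h5py.Dataset) and obj.ndim == 1
        }
    return pd.DataFrame(columns)


# Write a tiny stand-in file so the sketch runs end to end.
demo_path = os.path.join(tempfile.mkdtemp(), "history.h5")
with h5py.File(demo_path, "w") as f:
    f["model_number"] = np.array([1, 2, 3])
    f["star_age"] = np.array([1.0e3, 2.0e3, 4.0e3])

history = load_history(demo_path)
```

With something like this, a user would only ever see a DataFrame and would not need to touch the HDF5 layer directly.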
Excellent!
I do think it sounds like a good idea to incorporate some options to use the built-in pandas tooling to load the HDF5 outputs into dataframes. In fact, we could probably even add that functionality into py_mesa_reader itself as an additional set of new options. My only point of caution is that we wouldn't want to go overboard and force users into using pandas dataframes that they may not be very comfortable with, since it's good to keep the barrier to entry for this as low as we can.
@wmwolf what do you think about incorporating some options to read some MESA hdf5 output directly into pandas dataframes via py_mesa_reader?
I think it's a great idea. There is already some discussion over at the py_mesa_reader repo about using pandas even for plain text files, because it can speed up data ingestion relative to numpy's genfromtxt. I'm usually hesitant to add dependencies, but the gains are well worth it here. Since pandas Series and 1d numpy arrays are so similar, the end user could probably get away without even noticing.
I'll need to spend a little time experimenting with using pandas dataframes and series as the primary data types rather than numpy record arrays and 1d arrays to confirm that there aren't any nasty surprises, but we should plan on this feature in py_mesa_reader going forward.
It should be easy to get back to record arrays with pandas.DataFrame.to_records anyway.
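As a small sketch of that round trip, the snippet below builds a DataFrame from made-up values (the column names are only stand-ins for history columns) and converts it back with DataFrame.to_records. Passing index=False keeps the DataFrame index from becoming an extra field.

```python
import pandas as pd

# Made-up values standing in for a few history columns.
history = pd.DataFrame(
    {"model_number": [1, 2, 3], "star_age": [1.0e3, 2.0e3, 4.0e3]}
)

# index=False drops the DataFrame index so only data columns become fields.
records = history.to_records(index=False)

# Fields can be accessed by key or by attribute, as with a NumPy record array.
ages = records["star_age"]
last_model = records.model_number[-1]
```

The result is a numpy.recarray, so existing code that indexes record arrays by field name should keep working.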
I haven't looked at the HDF5 interfaces closely, but one concern for the history is how HDF5 handles incremental writes to a file. I'm pretty sure we currently write plain-text histories line by line. I think we want to avoid rewriting the whole HDF5 history at each time step, and I'm pretty sure we don't keep the full history in memory, so we wouldn't be able to anyway.
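For reference, HDF5 itself supports growing a dataset in place: a chunked dataset created with an unlimited dimension can be extended and only the new elements written. The sketch below shows the idea with h5py, using a hypothetical file name, a single made-up column, and an illustrative helper append_value. Whether the Fortran hdf5io wrapper exposes extendable datasets is not established here, so this illustrates the file-format capability rather than the MESA implementation.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "history.h5")

# Create an empty, chunked dataset whose first axis can grow without bound.
with h5py.File(path, "w") as f:
    f.create_dataset("star_age", shape=(0,), maxshape=(None,),
                     chunks=(1024,), dtype="f8")


def append_value(path, name, value):
    """Extend dataset `name` by one element and write `value` into it."""
    with h5py.File(path, "a") as f:
        dset = f[name]
        n = dset.shape[0]
        dset.resize((n + 1,))
        dset[n] = value


# One append per (hypothetical) time step; earlier rows are never rewritten.
for age in (1.0e3, 2.0e3, 4.0e3):
    append_value(path, "star_age", age)

with h5py.File(path, "r") as f:
    ages = f["star_age"][()]
```

Reopening the file on every step mirrors the current line-by-line behaviour and keeps the file closed between writes; a real writer might instead buffer several steps and extend the dataset in larger blocks.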