hector Enable CSV output stream from R package (or a helper to fetch all outputs)

I can imagine plenty of scenarios where one might want to do a bunch of Hector simulations (e.g. for a sensitivity or uncertainty analysis) without knowing the full list of desired output variables in advance. For these situations, the old Hector capability of dumping all of its output variables to a CSV file would be really useful. However, as far as I can tell, there isn't a way of turning on the CSV output stream from the R interface or INI file.

Off the top of my head, I see two possibilities:

(1) An R helper function (or set of helper functions) that returns all of the variables Hector knows how to provide (e.g. all_variables), and/or the ability to retrieve every variable at once, either through a separate function (e.g. fetchvars_all) or through a special argument to fetchvars (e.g. all). This should be pretty straightforward to implement, but it will only work for variables that the R interface knows about, which is currently not all of them. (That may be its own separate issue; I feel like the R interface should be able to retrieve any Hector variable.)

(2) The ability to enable/specify/configure the CSV output stream via sendMessage (which in turn would allow this to be modified by the INI file or the R interface). This solution is more difficult to implement, but is also more versatile (e.g. works with R or the INI file, and to the extent that pyhector uses sendMessage calls, could be easily integrated there as well).

Nov 13 '19 14:11 ashiklom

Regarding option 1, fetchvars already takes a vector of variables to return, so all we need is a function that returns a list of everything that's available. That's pretty easy to add and equally easy to use, so I don't think we need a separate fetch function. If we want to support the fetch-everything use case, this is the best way to do it.
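
To make the shape of that idea concrete, here is a minimal, self-contained sketch in Python. It is not Hector's R API: ToyCore, all_variables, the variable names, and the placeholder values are all hypothetical stand-ins. The only point is that "fetch everything" reduces to passing the full variable list to the existing fetch function.

```python
class ToyCore:
    """Hypothetical stand-in for a model core; not Hector's API."""

    def __init__(self):
        # Placeholder values keyed by (variable, year); not real model output.
        self._results = {
            ("co2_concentration", 2000): 1.0,
            ("co2_concentration", 2001): 2.0,
            ("global_temperature", 2000): 3.0,
            ("global_temperature", 2001): 4.0,
        }

    def variables(self):
        # Every variable name this toy core can report, in sorted order.
        return sorted({var for var, _ in self._results})

    def value(self, var, year):
        return self._results[(var, year)]


def all_variables(core):
    """Hypothetical helper: return the names of all available variables."""
    return core.variables()


def fetchvars(core, years, variables):
    """Return one row per (variable, year), as a long-format table."""
    return [
        {"year": year, "variable": var, "value": core.value(var, year)}
        for var in variables
        for year in years
    ]


core = ToyCore()
# No second fetch function is needed: just pass the full variable list.
rows = fetchvars(core, [2000, 2001], all_variables(core))
```

Because the helper only returns names, callers can also filter the list before fetching.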

Personally, I really don't like option 2, as it presses Hector's internal messaging mechanism into service to send messages to something (the CSV output visitor, presumably) that isn't a model component. In my opinion, it's better to keep the messaging system simple and clean, particularly since it already provides a way to retrieve output from the model.

Now, in the C++ API we do expose both the visitor classes and Core::addVisitor, so we could in principle create R bindings for those functions and classes; however, I don't think the visitor mechanism makes a whole lot of sense in the context of the API. The visitors were designed in the context of a stand-alone wrapper that only ever ran the model from start to finish and then exited. In the context of the API, which allows the model to be started, stopped, changed, restarted, and so on, it makes a lot more sense to let the API user decide how to organize output, not to mention having the option to store the results in memory, instead of always writing them to disk. It wouldn't cause any harm as such to expose the visitors in the R interface, but it would complicate the R interface and create an additional development, documentation, and maintenance burden, all for something that basically duplicates an existing capability (and does so with less flexibility to boot). I don't think the juice is worth the squeeze here.

> ... [this] will only work for variables that the R interface knows about, which is currently not all of them.

Exposing new variables is also easy to do. I've been adding new ones only as someone finds a need for them, partly out of laziness and partly to avoid organization problems in the documentation (I'm already getting complaints that some of the emissions documentation is hard to find). If someone wanted to allocate some time to exporting everything in the model through the R bindings, that would be a worthy endeavor. If we do that, we should add an index of capabilities to the documentation, with links to the individual documentation files, to make it easier for users to track down the more obscure capabilities.

Nov 13 '19 17:11 rplzzz

Thanks! I agree that the R route seems like the more straightforward path.

Seems like there are two distinct tasks here:

[ ] An R function that returns a list of all the Hector variables that are available
    - Possibly broken apart by type: exogenous inputs (e.g. emissions, pre-industrial concentrations), parameters (e.g. BETA, ECS), and output variables (e.g. concentrations, forcings).
[ ] Expose the rest of Hector's inputs and outputs, and add their corresponding R bindings.
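
As a rough illustration of the first task, the listing function could sit on top of a small registry grouped by type. The sketch below is Python rather than R, and everything in it (VARIABLE_REGISTRY, list_variables, the category labels, the variable names) is a hypothetical placeholder, not an existing Hector identifier.

```python
# Hypothetical registry: category -> variable names (illustrative only).
VARIABLE_REGISTRY = {
    "input": ["emissions", "preindustrial_concentration"],
    "parameter": ["beta", "ecs"],
    "output": ["concentration", "forcing"],
}


def list_variables(kind=None):
    """Return all variable names, or only those of one category."""
    if kind is None:
        # Flatten every category into one list, preserving registry order.
        return [name for names in VARIABLE_REGISTRY.values() for name in names]
    if kind not in VARIABLE_REGISTRY:
        valid = ", ".join(VARIABLE_REGISTRY)
        raise ValueError(f"unknown kind {kind!r}; expected one of: {valid}")
    # Return a copy so callers cannot modify the registry by accident.
    return list(VARIABLE_REGISTRY[kind])
```

Keeping the registry as plain data means the second task (exposing more variables) only has to add entries, and the listing function picks them up without changes.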

Both of these seem like good "first issue" tasks. @mnichol3 If you're interested, do you want to take a stab at this? I think this would be an effective, hands-on way to get a sense of what Hector can do and how it works. These tasks can also be tackled incrementally with a series of small pull requests that are unlikely to break things.

Nov 13 '19 18:11 ashiklom

@ashiklom that's a good idea for @mnichol3.

Nov 13 '19 18:11 cahartin

Sounds good to me

edit: I would assign myself to this issue but it appears that I don't have privileges

Nov 13 '19 18:11 mnichol3

Thanks @ashiklom!

Nov 13 '19 19:11 mnichol3

Note to self: this could be done using the exact same mechanism I used for the csv_tracking_visitor output (which returns a data frame to R via get_tracking_data()).

Aug 24 '21 15:08 bpbond

@dawnlwoodard This is the issue we discussed last week.

Sep 22 '21 00:09 bpbond