DEMO-1 quarto notebook (run workflow from pecan.xml)
Description
This PR introduces the initial version of a Quarto notebook as part of the Quarto PEcAn notebooks-based workflow GSoC project. The notebook lets users run a complete PEcAn workflow from a pre-generated pecan.xml. It replicates the web-based Demo 1 workflow starting after the configuration step, so users can execute model runs, perform analysis, and visualize results directly within the notebook interface.
Motivation and Context
This PR is part of the PEcAn GSoC project on running workflows from Quarto notebooks. We decided to keep two separate notebooks for replicating the web-based Demo 1 workflow:
- First notebook: interactively generates the pecan.xml file from user inputs (model, site, met selection, etc.).
- Second notebook: takes a pre-generated pecan.xml as input and handles the model run, analysis, and visualization, replicating the web interface flow after setup.
This particular PR targets the second notebook, which uses a pre-generated pecan.xml file (either configured via the PEcAn web interface or to be created by the first notebook, which will be introduced in a follow-up PR).
Directory Structure
A few notes on the proposed directory structure:
- I plan to keep the Quarto notebooks in pecan/base/inst/quarto-notebooks/_extensions/, split into two folders, demo1 and demo2.
- The demo1 folder will contain two notebooks: generate_xml.qmd and run_pecan.qmd.
- The demo2 folder will contain a single notebook that runs the ensemble and sensitivity analyses, replicating the Demo 2 workflow.
_extensions/
├── demo1/
│   ├── generate_xml.qmd    # Interactive XML configuration notebook
│   └── run_pecan.qmd       # Run PEcAn workflow using pre-generated XML
└── demo2/
    └── demo2_workflow.qmd  # Notebook for uncertainty & ensemble analysis
Any feedback on this directory structure is appreciated.
Knit report: PEcAn_workflow.pdf
Review Time Estimate
- [ ] Immediately
- [x] Within one week
- [ ] When possible
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
- [x] My change requires a change to the documentation.
- [ ] My name is in the list of CITATION.cff
- [x] I agree that PEcAn Project may distribute my contribution under any or all of
- the same license as the existing code,
- and/or the BSD 3-clause license.
- [x] I have updated the CHANGELOG.md.
- [ ] I have updated the documentation accordingly.
- [x] I have read the CONTRIBUTING document.
- [ ] I have added tests to cover my changes.
- [ ] All new and existing tests passed.
@mdietze @dlebauer There are some important considerations regarding the proposal to omit the database entirely from the workflow.
Both prepare.settings and write.configs currently rely on a valid database connection. Specifically:
- prepare.settings performs validation using the database.
- write.configs not only depends on the database but also generates the runs.txt file, which is essential for start_model_runs.
If we aim to support running a workflow without any database connection, the following major changes would be required:
- Refactor core PEcAn workflow functions to eliminate database dependencies.
- Introduce an alternative mechanism for storing and retrieving run metadata.
- Implement a separate parameter management system to replace the current DB-based approach.
Given the number of changes needed, I think making the Quarto-based workflow completely database-independent is beyond the scope of this project. That said, reducing database dependency in specific areas could still be helpful. I'm open to any suggestions or alternative approaches you might have!
In run.write.configs, the database connection is only required if you set write=TRUE (i.e. if you're trying to record the runs to the database). So clearly the demo should set write=FALSE. The database connection is also in no way required to write runs.txt.
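As a rough illustration of the write=FALSE point, a database-free run could look something like the sketch below. It assumes the PEcAn.settings and PEcAn.workflow packages are installed, that `run.write.configs()` returns the settings object, and that `start_model_runs()` takes a matching `write` argument. Argument lists differ between PEcAn versions, so other arguments may be required in yours. `run_without_db` is a hypothetical helper name, not something in this PR.

```r
# Hypothetical helper (not part of this PR): run the core workflow
# without recording anything in the database.
run_without_db <- function(xml_path = "pecan.xml") {
  # Fail early with a clear message if the settings file is missing.
  stopifnot(file.exists(xml_path))

  # Read the pre-generated settings; prepare.settings() is skipped on purpose.
  settings <- PEcAn.settings::read.settings(xml_path)

  # write = FALSE: write the model configs and runs.txt, but do not
  # record the runs in the database.
  # Assumption: no other arguments are required in your PEcAn version.
  settings <- PEcAn.workflow::run.write.configs(settings, write = FALSE)

  # Start the runs listed in runs.txt, again without database bookkeeping.
  PEcAn.workflow::start_model_runs(settings, write = FALSE)

  invisible(settings)
}
```

In the notebook this would be a single chunk, e.g. `settings <- run_without_db("pecan.xml")`, followed by the analysis and plotting chunks.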
As for prepare.settings, that function is technically optional so long as there are no errors in the settings. If you did want to try and include it and find that it's throwing errors, it should be fairly simple to refactor the parts that are trying to access the database to skip those checks. If I recall, the main thing prepare.settings was using the database for was to map various IDs to specific file paths (inputs, posteriors, model executable) all of which we can & should be setting directly in this demo.
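To make the "set the paths directly" idea concrete, here is a small base R sketch that checks a settings list for explicitly set file paths before the run, standing in for the ID-to-path lookups. The element names used (`model$binary`, `run$inputs$met$path`, `posterior.files` under each PFT) are my assumption about where these paths sit in pecan.xml, and `check_direct_paths` is a hypothetical helper.

```r
# Hypothetical helper: report which directly-set paths are missing or invalid.
check_direct_paths <- function(settings) {
  # Collect candidate paths; a NULL entry means the path was never set.
  paths <- list(
    model_binary = settings$model$binary,
    met_input    = settings$run$inputs$met$path
  )
  for (i in seq_along(settings$pfts)) {
    nm <- paste0("pft", i, "_posterior")
    # Assign via list() so that a NULL value is kept as an entry.
    paths[nm] <- list(settings$pfts[[i]]$posterior.files)
  }

  # A path is OK if it is a single string pointing at an existing file.
  ok <- vapply(paths, function(p) {
    is.character(p) && length(p) == 1 && file.exists(p)
  }, logical(1))

  # Return the names of the entries that still need attention.
  names(ok)[!ok]
}
```

An empty result means every path checked here is set and exists; anything else names the entries to fix in pecan.xml before running.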
Given that others have already established that it is possible to run the core of the workflow without the database, and done the essential bits of code refactoring already to make the db connection optional, I see no reason that the notebook version should require the database.
@mdietze Do you think it's a good idea to have a separate notebook on how to run the workflows with the database?
Or maybe the current one could have separate sections for running with and without the database connection. Depending on what you think is a good idea, we can ask @AritraDey-Dev to set up the notebook.
Also, from what I see now in the code there are places where error handling could be a good choice rather than just assuming that files exist or required packages are downloaded.
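For what it's worth, the kind of check I have in mind is small. A minimal base R sketch, assuming the notebook takes a path to pecan.xml and a list of required packages (`check_prerequisites` and the package names in the usage note are hypothetical):

```r
# Hypothetical helper: collect problems up front instead of failing mid-run.
check_prerequisites <- function(xml_path, packages) {
  problems <- character(0)

  # The settings file must exist before anything else can run.
  if (!file.exists(xml_path)) {
    problems <- c(problems, sprintf("Settings file not found: %s", xml_path))
  }

  # requireNamespace() returns FALSE (rather than erroring) for missing packages.
  installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)

  # sprintf() yields character(0) when nothing is missing, so no guard is needed.
  c(problems, sprintf("Package not installed: %s", packages[!installed]))
}
```

A first chunk could then call `problems <- check_prerequisites("pecan.xml", c("PEcAn.settings", "PEcAn.workflow"))` and stop with `stop(paste(problems, collapse = "\n"))` if the result is non-empty.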
I think it's worth discussing at what point in the learning process one should be made aware of how to bring in BETY if you want to use it, but I think that Demo 1 would be too complicated if the notebook attempted to explain a with vs without BETY dichotomy and had double options for every step that would be handled differently.
Slightly tangentially, I think it's also fair to discuss what still needs to be working before we pull in v1 of this new notebook workflow versus which requested changes could or should be handled in a later PR.
I agree with @mdietze's point: introducing BETY at this stage could overcomplicate Demo 1. I believe it would be more effective to introduce BETY-related functionality in a separate notebook, specifically geared toward trait and meta-analysis. This way, users who are ready to incorporate BETY can engage with it in a targeted context without adding complexity to the initial demo.
> Also, from what I see now in the code there are places where error handling could be a good choice rather than just assuming that files exist or required packages are downloaded.
For the Quarto-based workflow, I feel that explicit error-handling steps are largely unnecessary, as the underlying core functions already manage them. Additionally, the existing comments in each chunk do a good job of explaining what’s happening, which further reduces the need for separate error-handling logic within the notebook itself.
Yes! Agreed that adding BETY to Demo 1 would overcomplicate things.
@AritraDey-Dev regarding the error handling, your reasoning for leaving it out seems sound.
@mdietze @infotroph Have your concerns mentioned above been addressed? If so, we can probably merge this one.