feature: self-contained notebooks, API for example models
Heavily edited after more consideration
Is your feature request related to a problem? Please describe.
This repo contains example model data in examples/data/. A download link was recently added to example notebooks rendered on ReadTheDocs, but many are not immediately runnable after download because they rely on that example data. Running them first requires cloning the repo, downloading files from the GitHub web UI, etc. This adds friction for first-time use and (from a maintainer perspective) complicates the docs build.
Describe the solution you'd like
One option is a module (and maybe CLI) providing access to example models. Small models could be included in the package, larger ones downloaded/cached on demand. Projects like PyVista and scikit-image do this. Or model input files could be generated on demand. Models in the following repos could be included:
- modflowpy/flopy, in examples/data/
- MODFLOW-USGS/modflow6-examples
- MODFLOW-USGS/modflow6-testmodels
- MODFLOW-USGS/modflow6-largetestmodels
PyVista usage looks like:
from pyvista import examples
teapot = examples.download_teapot()
skimage looks like:
from skimage import data
coins = data.coins()
The latter seems mildly preferable for brevity and because the data may already be cached (if downloaded) or generated (no downloads).
In flopy's case, maybe e.g.
from flopy import examples
sim = examples.freyberg_mf6()
Alternatively, it could return a Path to the model/simulation directory instead of the Modflow/MFSimulation/etc itself. The model/simulation seems preferable as the path is retrievable from it anyway. To avoid polluting the cache with output files and to support the common case of loading then switching to a new workspace before rewriting/running, a workspace or sim_path option may be convenient, perhaps defaulting to a temporary directory.
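As a rough illustration of the workspace idea, here is a minimal stdlib-only sketch. The name `copy_to_workspace` and its arguments are hypothetical, not existing flopy API. It assumes the example's input files already sit in some cache directory, and copies them into a caller-supplied workspace or a fresh temporary directory so the cache is never written to.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def copy_to_workspace(cached_dir: Path, workspace: Optional[Path] = None) -> Path:
    """Copy a cached example model's input files into a separate workspace.

    Hypothetical helper, not part of flopy. If `workspace` is None, a new
    temporary directory is created and used instead.
    """
    cached_dir = Path(cached_dir)
    if not cached_dir.is_dir():
        raise FileNotFoundError(f"no cached example at {cached_dir}")
    if workspace is None:
        # The caller owns this directory and is responsible for cleanup.
        workspace = Path(tempfile.mkdtemp(prefix="flopy-example-"))
    else:
        workspace = Path(workspace)
    # dirs_exist_ok allows passing an existing directory (e.g. pytest's tmp_path).
    shutil.copytree(cached_dir, workspace, dirs_exist_ok=True)
    return workspace
```

A function like the proposed `examples.freyberg_mf6(workspace=...)` could call something like this and then load the simulation from the returned path (for MF6, presumably via `MFSimulation.load(sim_ws=...)`), so that output files land in the workspace rather than the cache.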
Notebooks and tests would then be able to use the example model interface. Removing implicit filesystem expectations leaves notebooks dependent only on a python/flopy env and modflow binaries.
PyVista uses Pooch for fetching and caching. skimage appears to vendor some of Pooch's source in its own implementation. If we generate model input files instead of downloading them, this would not be necessary.
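For reference, the core fetch/cache pattern is small. The sketch below is not Pooch's implementation or API, just a stdlib approximation of the idea: download a file once, verify it against a known SHA-256 hash, and reuse the cached copy afterwards. The function names, the `.part` suffix, and the cache layout are all assumptions made for illustration.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256sum(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url: str, cache_dir: Path, fname: str, sha256: str) -> Path:
    """Return a cached copy of `url`, downloading only if missing or stale."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / fname
    if target.is_file() and sha256sum(target) == sha256:
        return target  # cache hit: no network access needed
    # Download to a temporary name so a partial file never looks valid.
    tmp = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    if sha256sum(tmp) != sha256:
        tmp.unlink()
        raise ValueError(f"hash mismatch for {url}")
    tmp.replace(target)
    return target
```

Pooch adds a file registry, archive unpacking, and platform-appropriate cache locations on top of this, which is the main argument for depending on it rather than maintaining something homegrown.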
A few more considerations. Example models are currently defined directly as input files. Is this the right way for programmatic access to models? If so, which models to bundle and which to download?
The largest subdirs of examples/data, increasing by size, are
> du -sh * | sort -h
...
1.0M ssm_load_test
1.7M swr_test
2.5M mp6_examples
2.7M preserve_unitnums
2.9M mf2005_test
3.6M freyberg_usg
5.0M options
5.0M uzf_examples
5.6M swtv4_test
7.7M mp6
8.2M mt3d_example_sft_lkt_uzt
17M mfusg_test
23M mnw2_examples
23M pcgn_test
33M mf6
51M mt3d_test
54M zonbud_examples
62M secp
86M freyberg_multilayer_transient
An alternative may be to define the models in flopy code and generate/cache their input files on first request. I'm not sure how straightforward it is to convert existing input files to flopy code.
It seems like an examples module could offer the same API either way. Maybe it is worth experimenting to see if it is generally faster to pull big models over the wire or write them fresh (maybe the recent pandas speedup helps here).
For files distributed with flopy, the examples module could internally use importlib.resources.files.
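A minimal sketch of what that internal lookup might look like, assuming Python 3.9+. `bundled_file` is a hypothetical helper name, and the package layout passed to it is up to the caller. `as_file` is used so that a real filesystem path is available even if the package is installed in a zipped form.

```python
from contextlib import contextmanager
from importlib.resources import as_file, files
from pathlib import Path
from typing import Iterator


@contextmanager
def bundled_file(package: str, *parts: str) -> Iterator[Path]:
    """Yield a filesystem path to a file shipped inside `package`.

    Hypothetical helper: `parts` are path components relative to the
    package root, e.g. ("data", "model", "mfsim.nam").
    """
    resource = files(package)
    for part in parts:
        resource = resource / part
    if not resource.is_file():
        raise FileNotFoundError(f"{'/'.join(parts)} not found in {package}")
    # as_file extracts to a temporary file if the resource is not on disk.
    with as_file(resource) as path:
        yield path


# Usage with a stdlib package, since flopy does not bundle examples yet:
with bundled_file("json", "__init__.py") as path:
    print(path.name)
```

For flopy this would require the small models to be declared as package data, which they currently are not, since examples/data/ lives at the repo root rather than inside the package.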
An example models module would also simplify flopy and mf6 autotests, by removing the need for custom fixtures to fetch/prepare models for testing. There has been some effort to standardize the approach for this but it's still patchy.
Maybe devtools could provide programmatic access to examples, flopy could add a hard devtools dependency, and pass the same API through for convenience. This seems reasonable because
- devtools is dependency-free, so it would be a lightweight addition
- mf6.utils.generate_classes (perhaps wrongly) already depends on devtools; this seemed justifiable since class generation was up to now considered a developer task
- it would allow deduplicating HTTP client code in utils.get_modflow (maybe the get-modflow implementation could move to devtools too)