Clean Up The Driver (🧹🚗🧹)
Purpose
Thanks to recent work by @charleskawczynski, the driver is now unified, in that it is localized to driver.jl plus a small number of included files, and every simulation can be run with julia --project driver.jl <command-line args>. Although we can start adding features like EDMF and coupling now, there are several issues with the current setup that have been becoming increasingly problematic, and we should probably sort them out before adding major new features. These issues are:
- Experimenting with the simulation through the REPL (or a Jupyter notebook) requires awkwardly circumventing the command-line argument interface. All the parameters of the driver are set through
parsed_args, which is a dictionary that gets returned byArgParse.parse_args()whendriver.jlis run. Ifinclude("driver.jl")is called in the REPL, there are no command-line arguments available, so this dictionary does not contain all the information necessary to run the driver. The current workaround is to modifydriver.jl, replacing every use ofparsed_args[<driver parameter that must be set>]with the desired driver parameter value. This is rather tedious, and it would be better if the driver did not have to be modified unless new driver-related functionality is getting added. So, for example, in order to run a simulation that we use for CI from the REPL, and then rerun the simulation with an additional tendency term and compare the results, one should not need to modifydriver.jl, and one certainly should not need to think about the command-line argument interface. This issue was mentioned here. - The current chain of calls to
includemakes it difficult to trace where functions are defined, and, more broadly, makes it difficult to understand the code at a high level. In addition, since our code is a series of files that need to getincluded at runtime in a specific order, it cannot be moved into/srcand turned into a self-contained package. - After being extracted from
parsed_args, the parameters of the driver are stored in a smörgåsbord of different ways: global constants (e.g.,FTandsponge), global variables (e.g.,forcingandturbconv), functions that return constants (e.g.,energy_name()andupwinding_mode()), fields inSchurComplementW(e.g.,jacobian_flagsandtest_implicit_solver) and fields in the cache (e.g.,paramsandidealized_insolation). Many parameters are stored in duplicate places (e.g.,idealized_h2ois both a global variable and a field in the cache). This inconsistency in parameter storage results in the same issues as overusinginclude---it makes the code is harder to trace through and understand, and the use of globals and functions that are determined at runtime prevents the code from being moved into/src. This is mentioned in Issues #341 and #392.
This proposal describes a series of PRs which will fix these issues and hopefully simplify our upcoming integration efforts.
Note that this proposal is not The Great Modeling Interface Redesign; although it changes some of our top-level data structures, it does not introduce any major new functionality like automatic construction of the Jacobian and application of boundary conditions. That will come later.
Cost/benefits/risks
The downside of this proposal is that the driver will continue to get modified for another 1--2 weeks. However, the resulting interface will be better suited for interactive debugging, as the code will be easier to trace through and call from the REPL. Also, it will be easier to move our code into /src and turn it into a package that can be called from other codebases.
Producers
@dennisYatunin will implement this proposal.
Components
- A
DriverConfigstruct will be introduced. This will be amutable structthat contains all parameters of the driver. Every collection of related simulations will be represented by a function that returns aDriverConfig. For example, the cubed sphere baroclinic wave and Held-Suarez simulations will be represented bydycore_testing_config(kwargs...), while the single column radiative equilibrium (and, eventually, radiative-convective equilibrium) simulations will be represented bycolumn_equilibrium_config(kwargs...), and the vertical plane inertial gravity wave simulations will be represented byinertial_gravity_wave_config(kwargs...). So, to run a particular version of the Held-Suarez simulation from the REPL, one will calldycore_testing_config(kwargs...)for some appropriately chosenkwargs, and then callinclude("driver.jl"). To run the same simulation from the command line, one will executejulia --project driver.jl --dycore_testing_config <options...>. The main feature of this setup is that thekwargsandoptionsare exactly identical---ifkwargsis(; key1 = val1, key2 = val2, key3 = val3), thenoptionswill be--key1=val1 --key2=val2 --key3=val3. This means that any simulation used for CI can be easily run from the REPL by copying the corresponding line from the Buildkite pipeline and slightly reformatting it (i.e., adding some parentheses and replacing dashes with commas). Also, once aDriverConfigis available in the REPL, its mutable fields (e.g.,t_endordt) can be modified, and theninclude("driver.jl")can be called to run the modified simulation. - An
AbstractTendencytype will be introduced, with subtypes that includeHeldSuarezForcing,ZeroMomentMicrophysics, andRRTMGPRadiation. EachAbstractTendencywill store the parameters that modify how that tendency is evaluated (e.g.,idealized_h2oforRRTMGPRadiation). Every tendency will define three functions:get_cache(Y, model, tendency),get_callback(model, tendency), andapply_tendency!(Yₜ, Y, cache, t, model, tendency). - An
AbstractModeltype will be introduced, along with aStaggeredNonhydrostaticModel <: AbstractModel. This will store all the parameters that modify how the overall tendency is evaluated (e.g., the floating-point type, the upwinding scheme, and the set of non-implicit tendencies). It will define three functions for the implicit solver:get_implicit_cache(Y, model),implicit_tendency!(dY, Y, cache, t, model), andWfact!(W, Y, cache, dtγ, t, model). It may also defineget_explicit_cache(Y, model),get_callbacks(model), andexplicit_tendency!(Yₜ, Y, cache, t, model), but these will just be loops over the tuple ofAbstractTendencys, so they do not necessarily need to be separate functions. - An
AbstractInitialConditiontype will be introduced, with subtypes that includeDycoreTestingInitialConditionandColumnEquilibriumInitialCondition. This will store all the parameters that modify the initial condition (e.g., the choice of energy variable, whether moisture is included, and whether the flow is balanced). It will define the function(::AbstractInitialCondition)(space, model).
Each of these new components will reside in its own module, so that we can avoid cluttering the global namespace. Here is a sketch of how some of the data structures will look:
Base.@kwdef mutable struct DriverConfig
# Simulation parameters
model::AbstractModel = StaggeredNonhydrostaticModel()
initial_condition::AbstractInitialCondition
postprocessing::Function # use custom type?
# Domain parameters
horizontal_mesh::Meshes.AbstractMesh
quad::Spaces.Quadratures.QuadratureStyle
z_max::Real
z_elem::Int
z_stretch::Meshes.StretchingRule = Meshes.Uniform()
t_end::Real
# Solver parameters
ode_algorithm::OrdinaryDiffEqAlgorithm
dt::Real
dt_save_to_sol::Real = 0
dt_save_to_disk::Real = 0
max_newton_iters::Int = 10
additional_callbacks::NTuple{N, DECallback} where {N} = ()
additional_solver_kwargs::NamedTuple = (;)
show_progress_bar::Bool = isinteractive()
end
struct RRTMGPRadiation{
M <: AbstractRadiationMode,
I <: AbstractInterpolation,
B <: AbstractBottomExtrapolation,
FT,
} <: AbstractTendency
radiation_mode::M
interpolation::I
bottom_extrapolation::B
idealized_insolation::Bool
idealized_h2o::Bool
update_frequency::FT
end
Base.@kwdef struct StaggeredNonhydrostaticModel{
T <: Type{<:AbstractFloat},
P <: CLIMAParameters.AbstractEarthParameterSet,
U <: Union{Nothing, Operators.AdvectionOperator},
E <: NTuple{N, AbstractTendency} where {N},
J <: NamedTuple,
} <: AbstractModel
FT::T = Float32
params::P = DefaultEarthParameterSet()
explicit_tendencies::E = (DefaultExplicitTendency(),)
á¶ upwind_product::U = nothing
jacobian_flags::J = (;)
test_implicit_solver::Bool = false
end
struct DycoreTestingInitialCondition <: AbstractInitialCondition
energy_var::Symbol
moisture_mode::Symbol
is_balanced_flow::Bool
end
function dycore_testing_post_processing(sol, output_dir)
# make animations and plots
end
Inputs
N/A
Results and deliverables
Since this is an interface redesign, the main performance metric will be whether people find that it is easier to run simulations.
Task breakdown
- [ ] Implement every explicit tendency as an
AbstractTendency, and move all tendency-related parameters into these objects. - [x] Implement every initial condition as an
AbstractInitialCondition, and move all initial-condition-related parameters into these objects. - [x] Unify the post-processing ~into
dycore_testing_post_processingandcolumn_equilibrium_post_processing~. - [x] Implement the ~
StaggeredNonhydrostaticModel~AtmosModel, and move all model-related parameters into it. - [x] Implement the ~
DriverConfig~AtmosConfig, and move all remaining parameters into it. ~Implementdycore_testing_configandcolumn_equilibrium_config, and modify command-line argument parsing appropriately.~
Reviewers
@charleskawczynski @szy21 @jiahe23 @bischtob
Looks good to me!
I'm in favor.
This sounds great. I'd like to propose that we make these changes, incrementally since there's never really a great time for massive changes.
Regarding point 1, we could easily do something like this:
include("examples/cli_options.jl")
(s, parsed_args) = parse_commandline()
# edit parsed_args in the REPL
include("examples/driver.jl")
where, in the driver, we have
include("cli_options.jl")
if !(@isdefined parsed_args)
(s, parsed_args) = parse_commandline()
end
I'm completely on board with the other two points-- the include structure needs to be adjusted, and the model composition helper functions should be made more consistent.
The reason we probably want something more advanced is that different configurations will require different command line arguments. That is, --dycore_testing_config and --column_equilibrium_config are commands (rather than flags), which change how arguments are parsed afterward. So, yes, DriverConfig is essentially doing the job currently fulfilled by parsed_args, but generalizing things in this way will allow us to be more flexible in terms of which parameters can be modified for different configurations.
My current vision for what will be at the top of driver.jl is
if isinteractive()
@isdefined(truncate_stack_traces) || truncate_stack_traces = true
@isdefined(driver_config) || error("`driver_config` must be defined before `include(\"driver.jl\")`")
else
parsed_args = ArgParse.parse_args(ARGS, arg_parse_settings; as_symbols = true)
truncate_stack_traces = parsed_args[:truncate_stack_traces]
config_symbol = parsed_args[:_COMMAND_]
if config_symbol == :dycore_testing_config
driver_config = dycore_testing_config(; parsed_args[:dycore_testing_config]...)
elseif config_symbol == :column_equilibrium_config
driver_config = column_equilibrium_config(; parsed_args[:column_equilibrium_config]...)
else
error("unknown driver configuration $config_symbol")
end
end
Another reason we want something more advanced than parsed_args to store a driver configuration is that parsed_args can only contain simple objects like numbers or strings that can be entered through the command line, while a DriverConfig can contain anything. If I want to add a new tendency with a DriverConfig, I just define that tendency's struct and add an instance of it to driver_config.model.explicit_tendencies. I don't see any analogous way to do this using parsed_args. Of course, I could modify how parsed_args is generated and interpreted by modifying our source code, but the goal of this proposal is to allow us to develop in the REPL without needing to modify source code for every minor change.