ClimaAtmos.jl Clean Up The Driver (🧹🚗🧹)

Purpose

Thanks to recent work by @charleskawczynski, the driver is now unified, in that it is localized to driver.jl plus a small number of included files, and every simulation can be run with julia --project driver.jl <command-line args>. Although we can start adding features like EDMF and coupling now, there are several issues with the current setup that have been becoming increasingly problematic, and we should probably sort them out before adding major new features. These issues are:

Experimenting with the simulation through the REPL (or a Jupyter notebook) requires awkwardly circumventing the command-line argument interface. All the parameters of the driver are set through parsed_args, which is a dictionary that gets returned by ArgParse.parse_args() when driver.jl is run. If include("driver.jl") is called in the REPL, there are no command-line arguments available, so this dictionary does not contain all the information necessary to run the driver. The current workaround is to modify driver.jl, replacing every use of parsed_args[<driver parameter that must be set>] with the desired driver parameter value. This is rather tedious, and it would be better if the driver did not have to be modified unless new driver-related functionality is getting added. So, for example, in order to run a simulation that we use for CI from the REPL, and then rerun the simulation with an additional tendency term and compare the results, one should not need to modify driver.jl, and one certainly should not need to think about the command-line argument interface. This issue was mentioned here.
The current chain of calls to include makes it difficult to trace where functions are defined, and, more broadly, makes it difficult to understand the code at a high level. In addition, since our code is a series of files that need to get included at runtime in a specific order, it cannot be moved into /src and turned into a self-contained package.
After being extracted from parsed_args, the parameters of the driver are stored in a smörgåsbord of different ways: global constants (e.g., FT and sponge), global variables (e.g., forcing and turbconv), functions that return constants (e.g., energy_name() and upwinding_mode()), fields in SchurComplementW (e.g., jacobian_flags and test_implicit_solver) and fields in the cache (e.g., params and idealized_insolation). Many parameters are stored in duplicate places (e.g., idealized_h2o is both a global variable and a field in the cache). This inconsistency in parameter storage results in the same issues as overusing include---it makes the code is harder to trace through and understand, and the use of globals and functions that are determined at runtime prevents the code from being moved into /src. This is mentioned in Issues #341 and #392.

This proposal describes a series of PRs which will fix these issues and hopefully simplify our upcoming integration efforts.

Note that this proposal is not The Great Modeling Interface Redesign; although it changes some of our top-level data structures, it does not introduce any major new functionality like automatic construction of the Jacobian and application of boundary conditions. That will come later.

Cost/benefits/risks

The downside of this proposal is that the driver will continue to get modified for another 1--2 weeks. However, the resulting interface will be better suited for interactive debugging, as the code will be easier to trace through and call from the REPL. Also, it will be easier to move our code into /src and turn it into a package that can be called from other codebases.

Producers

@dennisYatunin will implement this proposal.

Components

A DriverConfig struct will be introduced. This will be a mutable struct that contains all parameters of the driver. Every collection of related simulations will be represented by a function that returns a DriverConfig. For example, the cubed sphere baroclinic wave and Held-Suarez simulations will be represented by dycore_testing_config(kwargs...), while the single column radiative equilibrium (and, eventually, radiative-convective equilibrium) simulations will be represented by column_equilibrium_config(kwargs...), and the vertical plane inertial gravity wave simulations will be represented by inertial_gravity_wave_config(kwargs...). So, to run a particular version of the Held-Suarez simulation from the REPL, one will call dycore_testing_config(kwargs...) for some appropriately chosen kwargs, and then call include("driver.jl"). To run the same simulation from the command line, one will execute julia --project driver.jl --dycore_testing_config <options...>. The main feature of this setup is that the kwargs and options are exactly identical---if kwargs is (; key1 = val1, key2 = val2, key3 = val3), then options will be --key1=val1 --key2=val2 --key3=val3. This means that any simulation used for CI can be easily run from the REPL by copying the corresponding line from the Buildkite pipeline and slightly reformatting it (i.e., adding some parentheses and replacing dashes with commas). Also, once a DriverConfig is available in the REPL, its mutable fields (e.g., t_end or dt) can be modified, and then include("driver.jl") can be called to run the modified simulation.
An AbstractTendency type will be introduced, with subtypes that include HeldSuarezForcing, ZeroMomentMicrophysics, and RRTMGPRadiation. Each AbstractTendency will store the parameters that modify how that tendency is evaluated (e.g., idealized_h2o for RRTMGPRadiation). Every tendency will define three functions: get_cache(Y, model, tendency), get_callback(model, tendency), and apply_tendency!(Yₜ, Y, cache, t, model, tendency).
An AbstractModel type will be introduced, along with a StaggeredNonhydrostaticModel <: AbstractModel. This will store all the parameters that modify how the overall tendency is evaluated (e.g., the floating-point type, the upwinding scheme, and the set of non-implicit tendencies). It will define three functions for the implicit solver: get_implicit_cache(Y, model), implicit_tendency!(dY, Y, cache, t, model), and Wfact!(W, Y, cache, dtγ, t, model). It may also define get_explicit_cache(Y, model), get_callbacks(model), and explicit_tendency!(Yₜ, Y, cache, t, model), but these will just be loops over the tuple of AbstractTendencys, so they do not necessarily need to be separate functions.
An AbstractInitialCondition type will be introduced, with subtypes that include DycoreTestingInitialCondition and ColumnEquilibriumInitialCondition. This will store all the parameters that modify the initial condition (e.g., the choice of energy variable, whether moisture is included, and whether the flow is balanced). It will define the function (::AbstractInitialCondition)(space, model).

Each of these new components will reside in its own module, so that we can avoid cluttering the global namespace. Here is a sketch of how some of the data structures will look:

Base.@kwdef mutable struct DriverConfig
    # Simulation parameters
    model::AbstractModel = StaggeredNonhydrostaticModel()
    initial_condition::AbstractInitialCondition
    postprocessing::Function # use custom type?

    # Domain parameters
    horizontal_mesh::Meshes.AbstractMesh
    quad::Spaces.Quadratures.QuadratureStyle
    z_max::Real
    z_elem::Int
    z_stretch::Meshes.StretchingRule = Meshes.Uniform()
    t_end::Real

    # Solver parameters
    ode_algorithm::OrdinaryDiffEqAlgorithm
    dt::Real
    dt_save_to_sol::Real = 0
    dt_save_to_disk::Real = 0
    max_newton_iters::Int = 10
    additional_callbacks::NTuple{N, DECallback} where {N} = ()
    additional_solver_kwargs::NamedTuple = (;)
    show_progress_bar::Bool = isinteractive()
end

struct RRTMGPRadiation{
    M <: AbstractRadiationMode,
    I <: AbstractInterpolation,
    B <: AbstractBottomExtrapolation,
    FT,
} <: AbstractTendency
    radiation_mode::M
    interpolation::I
    bottom_extrapolation::B
    idealized_insolation::Bool
    idealized_h2o::Bool
    update_frequency::FT
end

Base.@kwdef struct StaggeredNonhydrostaticModel{
    T <: Type{<:AbstractFloat},
    P <: CLIMAParameters.AbstractEarthParameterSet,
    U <: Union{Nothing, Operators.AdvectionOperator},
    E <: NTuple{N, AbstractTendency} where {N},
    J <: NamedTuple,
} <: AbstractModel
    FT::T = Float32
    params::P = DefaultEarthParameterSet()
    explicit_tendencies::E = (DefaultExplicitTendency(),)
    ᶠupwind_product::U = nothing
    jacobian_flags::J = (;)
    test_implicit_solver::Bool = false
end

struct DycoreTestingInitialCondition <: AbstractInitialCondition
    energy_var::Symbol
    moisture_mode::Symbol
    is_balanced_flow::Bool
end

function dycore_testing_post_processing(sol, output_dir)
    # make animations and plots
end

Inputs

N/A

Results and deliverables

Since this is an interface redesign, the main performance metric will be whether people find that it is easier to run simulations.

Task breakdown

[ ] Implement every explicit tendency as an AbstractTendency, and move all tendency-related parameters into these objects.
[x] Implement every initial condition as an AbstractInitialCondition, and move all initial-condition-related parameters into these objects.
[x] Unify the post-processing ~into dycore_testing_post_processing and column_equilibrium_post_processing~.
[x] Implement the ~StaggeredNonhydrostaticModel~AtmosModel, and move all model-related parameters into it.
[x] Implement the ~DriverConfig~AtmosConfig, and move all remaining parameters into it. ~Implement dycore_testing_config and column_equilibrium_config, and modify command-line argument parsing appropriately.~

Reviewers

@charleskawczynski @szy21 @jiahe23 @bischtob

May 09 '22 22:05 dennisYatunin

Looks good to me!

May 09 '22 23:05 szy21

I'm in favor.

May 09 '22 23:05 bischtob

This sounds great. I'd like to propose that we make these changes, incrementally since there's never really a great time for massive changes.

Regarding point 1, we could easily do something like this:

include("examples/cli_options.jl")
(s, parsed_args) = parse_commandline()
# edit parsed_args in the REPL

include("examples/driver.jl")

where, in the driver, we have

include("cli_options.jl")

if !(@isdefined parsed_args)
    (s, parsed_args) = parse_commandline()
end

I'm completely on board with the other two points-- the include structure needs to be adjusted, and the model composition helper functions should be made more consistent.

May 09 '22 23:05 charleskawczynski

The reason we probably want something more advanced is that different configurations will require different command line arguments. That is, --dycore_testing_config and --column_equilibrium_config are commands (rather than flags), which change how arguments are parsed afterward. So, yes, DriverConfig is essentially doing the job currently fulfilled by parsed_args, but generalizing things in this way will allow us to be more flexible in terms of which parameters can be modified for different configurations.

May 09 '22 23:05 dennisYatunin

My current vision for what will be at the top of driver.jl is

if isinteractive()
    @isdefined(truncate_stack_traces) || truncate_stack_traces = true
    @isdefined(driver_config) || error("`driver_config` must be defined before `include(\"driver.jl\")`")
else
    parsed_args = ArgParse.parse_args(ARGS, arg_parse_settings; as_symbols = true)
    truncate_stack_traces = parsed_args[:truncate_stack_traces]
    config_symbol = parsed_args[:_COMMAND_]
    if config_symbol == :dycore_testing_config
        driver_config = dycore_testing_config(; parsed_args[:dycore_testing_config]...)
    elseif config_symbol == :column_equilibrium_config
        driver_config = column_equilibrium_config(; parsed_args[:column_equilibrium_config]...)
    else
        error("unknown driver configuration $config_symbol")
    end
end

May 09 '22 23:05 dennisYatunin

Another reason we want something more advanced than parsed_args to store a driver configuration is that parsed_args can only contain simple objects like numbers or strings that can be entered through the command line, while a DriverConfig can contain anything. If I want to add a new tendency with a DriverConfig, I just define that tendency's struct and add an instance of it to driver_config.model.explicit_tendencies. I don't see any analogous way to do this using parsed_args. Of course, I could modify how parsed_args is generated and interpreted by modifying our source code, but the goal of this proposal is to allow us to develop in the REPL without needing to modify source code for every minor change.

May 09 '22 23:05 dennisYatunin