Duplicate code in Riemann solver
There is significant duplicate code between the HLL and HLLC Riemann solvers, as well as within HLLC itself (with minor duplication involving HLLD). Refactoring these components could substantially reduce the module's size.
@ChrisZYJ would you mind having a look at this! glad to have your thoughts and maybe a minor improvement PR or just a sketch of what we could do
@sbryngelson I remember you told me to add this issue when I worked on the MHD HLLD PR. This is a very hard problem. I’ve read through Riemann solvers again (Mohammad’s PRs are great!) and thought about it a lot. Below are just my personal opinions from how I learnt to code (they could be completely wrong or irrelevant).
1. Code repetition
1. (a) Within HLLC
Within HLLC, there are 4 large branches (4-Eqn/5-Eqn/6-Eqn/5-Eqn no bubble). We can combine the shared structures and use subroutines for the different core logic so they are easier to navigate:
subroutine s_hllc_riemann_solver(...)
call s_populate_riemann_states_variables_buffers(...)
call s_initialize_riemann_solver(flux_src_vf, norm_dir)
if (model_eqns == 3) then
call s_hllc_6eqns(...)
elseif (model_eqns == 4) then
call s_hllc_4eqns(...)
elseif (model_eqns == 2) then
if (bubbles_euler) then
call s_hllc_5eqns_bubbles(...)
else
call s_hllc_5eqns_no_bubbles(...)
end if
end if
if (viscous) call s_compute_viscous_source_flux(...)
if (surface_tension) call s_compute_capillary_source_flux(...)
call s_finalize_riemann_solver(...)
end subroutine
1. (b) Across HLL / HLLC / HLLD / LF
Refactoring here is arguably harder. Logically they should share:
- build left/right states
- apply chemistry / elasticity / bubbles / MHD
- compute average state + sound speeds
- compute wave speeds
- compute fluxes + geometric fluxes
But in reality, many physics are only implemented in one solver or another (elasticity only in HLL, bubbles only in HLLC, and HLLD only does MHD). The geometric fluxes treatment for HLL vs HLLC is also different. For this practical reason, I would recommend only factoring out the obvious across the Riemann solvers.
2. Long inline subroutines
Right now almost everything is inlined for performance, so HLL/HLLC are thousands of lines. Concerns are also interleaved in a non-obvious order. That makes them hard to read, so refactoring repetitive code is even harder.
Personally, I prefer that the top-level solvers encode logic, with similar-level abstractions grouped together. Something like this:
subroutine s_hll_riemann_solver(...)
call s_populate_riemann_buffers(...)
call s_build_mixture_states(...)
call s_build_riemann_side_states(...)
call s_compute_wave_speeds(...)
call s_build_fluxes(...)
call s_build_cylindrical_geometric_flux(...)
end subroutine
This also applies to the subroutines within. For instance, s_build_fluxes() can also be made clearer and contain just high-level subroutine calls. We probably need fypp inlining so the performance is unaffected.
3. Many scalars
We currently pass around a huge number of scalars:
- Left/right states: rho_L/R, pres_L/R, E_L/R, H_L/R, etc.
- Mixture/EOS: gamma_L/R, pi_inf_L/R, qv_L/R
- Chemistry: Ys_L/R, Xs_L/R, Cp_iL/R, Gamma_iL/R, etc.
- Elasticity: tau_e_L/R, G_L/R
- Bubbles: R0_, P0_, etc.
- Wave speeds: s_L, s_R, s_S, s_M, s_P, xi_
I think it’s cleaner if we group them into small derived types and pass those around instead of separate locals. For example:
type riemann_side_state
real(wp) :: rho
real(wp) :: pres
real(wp) :: E
real(wp) :: H
real(wp) :: gamma
real(wp) :: pi_inf
real(wp) :: qv
real(wp) :: vel_rms
real(wp), dimension(num_vels) :: vel
real(wp), dimension(num_fluids) :: alpha
real(wp), dimension(num_fluids) :: alpha_rho
end type riemann_side_state
type(riemann_side_state) :: L, R
call build_mixture_state(L, qL_prim_rs${XYZ}$_vf, j, k, l)
call build_mixture_state(R, qR_prim_rs${XYZ}$_vf, j+1, k, l)
If we keep the types simple (no allocatables), I think the CPU/GPU performance would be the same, but we'll need to profile.
4. Minor stuff
Locals
We can use locals for frequently reused expressions for clarity. Just one example:
real(wp) :: un_L, un_R
un_L = vel_L(dir_idx(1))
un_R = vel_R(dir_idx(1))
s_L = min(un_L - c_L, un_R - c_R)
s_R = max(un_R + c_R, un_L + c_L)
xi
Currently we use xi to remove branches when choosing wave region (SL*, SL, SR, SR*) we’re in. I think they are much more confusing and error-prone than using the usual formulas to find (UL*, UL, UR, SR*) and constructing the fluxes explicitly. However, for GPU there might be an inevitable penalty, so I’m inclined to keep this the way it is.
thanks @ChrisZYJ -- i agree with this approach, very nice!