Introduce GlobalReduction object
This functionality is required for global reduction built-ins in #2021 and #489, as we currently only have GlobalSum and LFRic child class.
There are two attempts to adding this functionality, PR #2045 and PR #2111, both linked to issue #2021, however it would be cleaner to have a separate issue not linked to built-ins. In PR #2045, the code and documentation changes were done together, so we agreed to have two separate PRs, one for code changes and one for the documentation changes. The original branch https://github.com/stfc/PSyclone/tree/2021_global_reduction from #2045 is kept as a template for code and documentation changes.
I will close PR #2111 as it is very out of date and proceed with new PRs.
Update: There are discussions on how/when to generate symbol for reproducible reductions in PR #2376, specifically comments https://github.com/stfc/PSyclone/pull/2376#discussion_r1377735119 and https://github.com/stfc/PSyclone/pull/2376#discussion_r1380066110. The implementation there does not seem optimal and will be addressed in this issue.
Language-level PSyIR has support for various intrinsics including MAXVAL and MINVAL. At the DSL level, as @TeranIvy says, we have psyclone.lfric.LFRicGlobalSum which subclasses psyclone.psyGen.GlobalSum. GlobalSum is not sub-classed by anything else (i.e. GOcean doesn't have support for global sums).
One possible design is to introduce a new psyclone.domain.common.psylayer.GlobalReduction class. GlobalSum would then be moved to psyclone.domain.common.psylayer.GlobalSum and inherit from the new class. Most of the important work in the current LFRicGlobalSum is done in the lower_to_language method which generates language-level PSyIR for the reduction. Presumably this would be relatively simple to parameterise such that it would generate appropriate code for the different types of reduction (min, max as well as sum).
Looking at:
https://github.com/stfc/PSyclone/blob/dc66e55c00740c39580c76b5b353ab9f0825ae4c/src/psyclone/lfric.py#L3945-L3964
it is probably as simple as having two class properties - one holding the name of the reduction ("min", "max", "sum") and one (e.g. "get_sum") the name of the method that must be called on the LFRic field object.
Excluding the tests, LFRicGlobalSum is only used in the constructor of lfric.LFRicInvoke:
https://github.com/stfc/PSyclone/blob/dc66e55c00740c39580c76b5b353ab9f0825ae4c/src/psyclone/domain/lfric/lfric_invoke.py#L182-L189
Here, we'd need to add the appropriate LFRic reduction node, depending on the access of the scalar argument to the kernel.
Actually, a simpler hierarchy might be:
psy_layer.GlobalReduction -> domain.lfric.GlobalReduction
| |
---- reduction: GlobalReduction.Reduction ---- method_name_map: Dict[Reduction, str]
|
---- Argument: DataNode(?)
where we have a new enum, GlobalReduction.Reduction to parameterise the different types of reduction.
This will be less code than having separate classes for each type of reduction that we want to support (min, max, sum).
What do @TeranIvy @mo-alistairp @hiker @sergisiso and @LonelyCat124 think?
I like the direction (I am not sure I understand all the details, like what is the mehtod_name_map, but maybe this is for the PR to discuss)
The current implementations are in field_mod.t90 in lfric core.
get_max is:
function get_max(self) result (answer)
use mpi_mod, only: mpi_type
implicit none
class(field_{{kind}}_proxy_type), intent(in) :: self
type(mpi_type) :: mpi
{{type}}({{kind}}) :: l_max
{{type}}({{kind}}) :: answer
integer(i_def) :: i
! Generate local max
l_max = self%data(1)
do i = 2, self%vspace%get_last_dof_owned()
if( self%data(i) > l_max ) l_max = self%data(i)
end do
mpi = self%get_mpi()
call mpi%global_max( l_max, answer )
end function get_max
and get_min is:
function get_min(self) result (answer)
use mpi_mod, only: mpi_type
implicit none
class(field_{{kind}}_proxy_type), intent(in) :: self
type(mpi_type) :: mpi
{{type}}({{kind}}) :: l_min
{{type}}({{kind}}) :: answer
integer(i_def) :: i
! Generate local min
l_min = self%data(1)
do i = 2, self%vspace%get_last_dof_owned()
if( self%data(i) < l_min ) l_min = self%data(i)
end do
mpi = self%get_mpi()
call mpi%global_min( l_min, answer )
end function get_min
Unfortunately, things are more complicated than I first thought. OMPParallelDirective.lower_to_language_level calls (for every Kernel that is a reduction) psyGen.zero_reduction_variables and psyGen.Kern.reduction_sum_loop to generate the necessary PSyIR (including the extra code if reproducible reductions are switched on). The question is, can we refactor this to make use of the recent support for reductions that was added in #3108?
As a first step, I could move those methods out of psyGen and into the relevant GlobalReduction class.
I'm trying a new hierarchy in 2381_no_domain_global_reduction.
Currently, the constructor for LFRicInvoke automatically adds LFRicGlobalSum nodes after every built-in that requires one. Therefore, there are two issues:
- handling the reduction in the local loop over dofs that the built-in corresponds to (if being run in parallel);
- performing the global reduction on the result of the local reduction.
Step 1 is further complicated by our support for performing reproducible reductions, perhaps that could be folded into #3108?
Currently, the LFRicGlobalSum constructor is passed the 'operand' that it will perform the reduction on. However, at this stage we are dealing with higher-level nodes and don't yet have a Symbol for the result of the local reduction, therefore it is the kernel argument that it is given.
The question is, how can the result of lowering the built-in be picked up when lowering the global reduction? Can the latter query the loop associated with the former for any reduction variables? Even then, what about reproducible reductions?
#3108 added functionality to OMPLoopTrans and ParallelLoopTrans so that, if passed a list of possible reduction operations, they would analyse the loop body and automatically add appropriate clauses to the Directive being introduced. However, this requires that the loop body be language-level PSyIR and, at the point we apply the transformation to the LFRic PSyIR, it isn't - the loop just contains a BuiltinKern.
Therefore, my new approach is, after the loop body has been lowered, to use reference_accesses and then the new ReductionInferenceTool to create new OMPReductionClause nodes and add them to the directive.
I'm currently doing this in OMPParallelDirective.lower_to_language_level() but the trouble is, this can include >1 loop, each with an OMPLoopDirective. Previously, we always put the reduction clause on the OMPLoopDirective and I think we'd want to keep that.
To do that, I could add a lower_to_language_level to OMPLoopDirective or it might be better to add it to DataSharingAttributeMixin?
I've realised that our current support for reductions is a hodge-podge of old and new. The old builds a string containing the reduction and appends it to the directive. The new creates an OMPReductionClause and adds it as a child of the Directive node. Obviously we should replace the former with the latter so I've decided to do that first. (Branch 514_use_omp_reduction_clause)