[nemo] Array-valued function needs to be inlined for OpenACC performance
For example: sbcblk.f90, cp_air() function.
Things that still need doing:
- [ ] handling the inlining of routines that call other routines (https://github.com/stfc/PSyclone/pull/2413#discussion_r1692776348);
- [ ] understand why profiling (in acc_kernels_trans) interacts with inlining and maybe drop it in favour of the
--profilecommand-line flag; - [ ] add support for inlining of routines called via a named interface;
- [ ] support for routines with named arguments;
e.g. if we have some code:
my_array(:,:) = cp_air(array1(:,:)) * array2(:,:) + array3(:,:)
where cp_air is an array-valued function that performs compute then this is hard to parallelise/ensure it remains GPU resident. (If we put ACC KERNELS inside cp_air then we can't have it around this assignment.) Ideally we need to refactor this:
my_tmp(:,:) = cp_air(array1(:,:))
my_array(:,:) = my_tmp(:,:)*array2(:,:) + array3(:,:)
That then allows us to have KERNELS inside cp_air() and, separately, around the assignment.
Better still would be to in-line cp_air but we have to walk before we can run.
If I remember correctly, some of the existing kernel transformations (that replace intrinsics) have logic similar to the above. Perhaps the splitting out of an intrinsic/function should have its own transformation which can then be shared.
Yes, I said as much to Chris when we were discussing this. Hopefully there's some reuse possible.
Obviously, a step 0 for this work is to be able to identify array-valued functions. Currently, all RoutineSymbols that represent Fortran functions are given DeferredType (#1294).
This is going quite well. I need to handle the case where there's a name clash for a ContainerSymbol as we can't simply rename it in that case.
Note to self: I need to identify when a routine accesses variables made available from some outer scoping region as this prevents inlining.
The first review and subsequent work on validate has thrown up a number of things that I'm adding TODOs for this time around:
- [ ]
validatewill be much nicer once we can query thedatatypeof anyReference(#1799) - [ ]
Rangeneeds extending to return the PSyIR for the number of elements it represents - [ ] Support needs to be added for routines with named arguments.
The initial implementation of InlineTransformation is now on master, but some remaining TODOs need to be fixed to close this issue.
This is also one of the major issues in OpenMP for GPU now, mostly in the sbcblk region. Some functions that will need to be inlined are: q_sat, sbc_dcy, gamma_moist, cd_neutral_10m, psi_h, psi_m
Although the basic inlining functionality is on master, it still requires that the routine to be inlined be in the same Container as the call site. That means we need module inlining working in native PSyIR (currently I think it only works for PSyKAl APIs).
Alternatively, now that we have the elemental attribute, we can a resolve the cp_air symbol (which could be imported) for:
my_array(:,:) = cp_air(array1(:,:)) * array2(:,:) + array3(:,:)
and then arrayrange2loop can decide how to convert it into loop-form depending if it is elemental or not.
(Ideally PSyclone should be able to do both, and the transformation script just chose how to approach it)
Currently a CodedKern has a get_kernel_schedule method. To generalise the KernelModuleInlineTrans transformation we need to support Call as well. Call has routine which gives the corresponding RoutineSymbol. In the work I'm doing now I propose to add a RoutineSymbol.get_schedule method which is very like the ContainerSymbol.container method. I could name it schedule but, since it potentially does a lot of searching, I prefer the get_ prefix. I'd argue that the container method could also do with a get_ prefix.
I agree with the get_ but maybe it could be Routine.Symbol.get_routine() instead of "schedule" to make it easier for people not familiar with the PSyIR concepts.
(PS: Now I would also rename Container to Module for the same reason, and I think I am to blame for picking the wrong name)
(PS: Now I would also rename Container to Module for the same reason, and I think I am to blame for picking the wrong name)
Well, we do aspire to be language-neutral so I think Container is fine. At least 'routine' is a generic concept :-)
Chatting with Martin and Hugo, it turns out that CROCO doesn't have modules but they would still like to be able to do inlining. This would mean we would need to insert routines into the FileContainer instead (or, we just extend InlineTrans to search for the Routine outside of the current Container?).