Nested version of ReduceMinLoc/ReduceMaxLoc would be nice
It would be nice if there were a nested/forallN version of RAJA::ReduceMinLoc/RAJA::ReduceMaxLoc. I have a calculation where I need to find the maximum element and its location in a 3-D array, E(t,y,x). That is, I want to find E_max = E(t_max,y_max,x_max) such that E_max >= E(t,y,x) for all t, y, and x. In the case of ties (multiple equivalent maxima), any one of the locations will do.
I think we could implement a version where the reducer was templated on both the type of the value and the type of the index - then you could use an "int3" or similar for the index type.
Would that work for your use case?
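To make the proposal above concrete, here is a minimal serial sketch of a max-with-location accumulator templated on both the value type and the index type. Everything in it (`Int3`, `MaxLoc`, `find_max`) is a hypothetical name for illustration and not RAJA's API; it also assumes E is stored densely with x varying fastest.

```cpp
#include <cassert>
#include <limits>

// Hypothetical 3-component index; stands in for the "int3" mentioned above.
struct Int3 {
  int t, y, x;
};

// Hypothetical serial accumulator templated on value type and index type.
// This is an illustration only, not a RAJA reducer.
template <typename T, typename IndexT>
class MaxLoc {
public:
  MaxLoc(T init_val, IndexT init_loc) : val_(init_val), loc_(init_loc) {}

  // Accept the candidate only if it is strictly larger, so the first
  // maximum encountered is kept when there are ties.
  void maxloc(T candidate, IndexT where) {
    if (candidate > val_) {
      val_ = candidate;
      loc_ = where;
    }
  }

  T get() const { return val_; }
  IndexT getLoc() const { return loc_; }

private:
  T val_;
  IndexT loc_;
};

// Scan a dense array E(t,y,x), assumed to be stored with x varying fastest.
inline MaxLoc<double, Int3> find_max(const double* E, int nt, int ny, int nx) {
  assert(E != nullptr && nt > 0 && ny > 0 && nx > 0);
  MaxLoc<double, Int3> r(std::numeric_limits<double>::lowest(),
                         Int3{-1, -1, -1});
  for (int t = 0; t < nt; ++t)
    for (int y = 0; y < ny; ++y)
      for (int x = 0; x < nx; ++x)
        r.maxloc(E[(t * ny + y) * nx + x], Int3{t, y, x});
  return r;
}
```

A parallel or device version would additionally need a way to combine per-thread partial results, which this sketch does not attempt.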
Our index types are all defined using RAJA_INDEX_VALUE(TimeInd, "Time Index"), RAJA_INDEX_VALUE(YInd, "Y Spatial Index"), etc. Underneath, they are all RAJA::Index_type. We could use something like an int3 or std::tuple.
It would be nice to know on what time scale a feature like this might appear (days, weeks, months, or years). If it's days or weeks, I would probably wait. Otherwise, I'll do a workaround.
Making this work robustly across all backends would take some work (std::tuple doesn't work on the device), so the timescale would be weeks/months.
You can work around this using the Layout classes: run the existing reduction over a single linear index, then call the toIndices method to convert that linear index back into its component pieces.
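The idea behind this workaround can be sketched in plain C++ without RAJA: reduce over a flattened index, then split the winning linear index into (t, y, x). The helpers below (`TYX`, `argmax_linear`, `unflatten`) are hypothetical stand-ins for the reduction and for the conversion that toIndices is described as doing; they assume a dense layout with x varying fastest, which may not match the layout in a real code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical holder for the three recovered indices.
struct TYX {
  std::ptrdiff_t t, y, x;
};

// Serial stand-in for a max-loc reduction over a flattened array of n values.
// Returns the linear index of the first maximum.
inline std::ptrdiff_t argmax_linear(const double* E, std::ptrdiff_t n) {
  assert(E != nullptr && n > 0);
  std::ptrdiff_t best = 0;
  for (std::ptrdiff_t i = 1; i < n; ++i) {
    if (E[i] > E[best]) best = i;
  }
  return best;
}

// Convert a linear index back into (t, y, x), assuming x varies fastest
// and t slowest.
inline TYX unflatten(std::ptrdiff_t lin, std::ptrdiff_t nt,
                     std::ptrdiff_t ny, std::ptrdiff_t nx) {
  assert(lin >= 0 && lin < nt * ny * nx);
  TYX r;
  r.x = lin % nx;
  r.y = (lin / nx) % ny;
  r.t = lin / (nx * ny);
  return r;
}
```

With extents (nt, ny, nx) = (2, 3, 4), linear index 21 maps back to t = 1, y = 2, x = 1.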
At our meeting yesterday, @trws mentioned the complexity of implementing a custom tuple class. However, I think @willkill07 mentioned that he was looking into an existing implementation that would work on a device. For this particular case, and potentially others, would it make sense to try something simpler, like an Index_type template (templated on the actual index type and int ndims)?
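A rough sketch of what such a simpler type might look like: a fixed-size aggregate templated on the index type and the number of dimensions. The names `MultiIndex` and `SKETCH_HOST_DEVICE` are hypothetical and not part of RAJA; the macro only adds the CUDA qualifiers when compiling with nvcc, and nothing here has been tested on a device.

```cpp
#include <cassert>

// Hypothetical macro: expands to CUDA qualifiers only under nvcc.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Hypothetical fixed-size multi-dimensional index. It is a plain aggregate
// with no std::tuple dependency, so it is trivially copyable.
template <typename IndexT, int NDims>
struct MultiIndex {
  static_assert(NDims > 0, "MultiIndex needs at least one dimension");

  IndexT v[NDims];

  SKETCH_HOST_DEVICE IndexT& operator[](int d) {
    assert(d >= 0 && d < NDims);
    return v[d];
  }
  SKETCH_HOST_DEVICE const IndexT& operator[](int d) const {
    assert(d >= 0 && d < NDims);
    return v[d];
  }
  SKETCH_HOST_DEVICE static constexpr int size() { return NDims; }
};

// Component-wise equality.
template <typename IndexT, int NDims>
SKETCH_HOST_DEVICE bool operator==(const MultiIndex<IndexT, NDims>& a,
                                   const MultiIndex<IndexT, NDims>& b) {
  for (int d = 0; d < NDims; ++d) {
    if (!(a[d] == b[d])) return false;
  }
  return true;
}
```

One limitation of this design is that every component shares a single type, so distinct strongly typed indices such as TimeInd and YInd would have to be stored as their underlying RAJA::Index_type.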
FYI, here is the CUDA-supported tuple from agency-library/agency:
https://github.com/agency-library/agency/blob/master/agency/detail/tuple.hpp
It's not currently stand-alone, but it is licensed under 3-clause BSD (the same license as RAJA).
There is a tuple in the feature/trws/foralln-reimagined branch, camp::tuple. With the relaxed constexpr flag on nvcc it works well on the device, so it may be worth giving that a try, @tepperly. That's also the branch where I'm developing what will eventually replace forallN, so you may want to take a look anyway.
I believe we can close this issue, since we have tests for this sort of thing in our test suite.