serac icon indicating copy to clipboard operation
serac copied to clipboard

[WIP] performance refactor: batched kernel implementations

Open samuelpmishLLNL opened this issue 3 years ago • 1 comments

The main point of this PR is to change the implementation of the core finite element kernels from (pseudocode, using heat equation as an example):

for (auto element : mesh) {

  // load the nodal temperatures for this element
  auto element_temperatures = temperatures[element];

  // integration loop
  for (auto quadrature_point : integration_rule) {

    // load the isoparametric coordinates, physical coordinates, and jacobian 
    // associated with this quadrature point
    auto [xi, x, J] = integration_point_data[quadrature_point];

    // compute the values of temperature, temperature gradient at this quadrature point
    auto [u, du_dx] = trial_space::interpolate(element_temperatures, xi, J);

    // evaluate the user's constitutive model at this quadrature point
    auto [source, flux] = qf(x, u, du_dx);

    // integrate the source and flux terms against shape functions and their derivatives (respectively)
    // to get the element residual contribution from this quadrature point
    element_residual += test_space::integrate(source, flux, xi, J);

  }

}

to

for (auto element : mesh) {

  // load the nodal temperatures for this element
  auto element_temperatures = temperatures[element];

  // load all the physical coordinates and jacobians
  // for the quadrature points associated with this element
  auto [x, J] = integration_point_data[element];

  // compute the values of temperature and temperature gradient at each quadrature point
  auto [u, du_dx] = trial_space::batch_interpolate(element_temperatures, xi, J);

  // evaluate the user's constitutive model at this quadrature point
  auto [sources, fluxes] = batch_apply(qf, x, u, du_dx);

  // integrate the source and flux terms against shape functions and their derivatives (respectively)
  // to get the complete element residual from all quadrature points
  element_residual += test_space::batch_integrate(sources, fluxes, xi, J);

}

that is, to implement functions that perform batched interpolation and integration (acting on all of the quadrature points at once, rather than one at a time).

The main reason for this change is that computing these quantities in a batched way admits a more efficient algorithm that exploits the tensor product structure of the shape functions and quadrature rules to reduce the total amount of computation performed. To be clear: this more efficient algorithm is not an innovation by serac -- our initial kernel implementation was naive, so its performance was

To make this change possible, this PR requires that specializations of the finite_element class template implement two functions:

  • finite_element< geometry, function_space >::interpolate(element_values, jacobians, quadrature_rule) that takes in the values associated with an element, and returns an array of the interpolated values and derivatives for each quadrature point. ("E->Q")

  • finite_element< geometry, function_space >::integrate(sources_and_fluxes, jacobians, quadrature_rule, element_residual) that takes in an array of the source and flux terms at each quadrature point, and numerically integrates them against the shape functions and shape function derivatives (respectively), adding the computed residuals to the last argument

samuelpmishLLNL avatar May 25 '22 17:05 samuelpmishLLNL

Codecov Report

Merging #706 (b6db428) into develop (d5dcd67) will increase coverage by 0.08%. The diff coverage is 98.52%.

@@             Coverage Diff             @@
##           develop     #706      +/-   ##
===========================================
+ Coverage    95.04%   95.12%   +0.08%     
===========================================
  Files          142      146       +4     
  Lines         9440     9933     +493     
===========================================
+ Hits          8972     9449     +477     
- Misses         468      484      +16     
Impacted Files Coverage Δ
src/serac/infrastructure/profiling.cpp 50.00% <ø> (ø)
...rac/numerics/functional/detail/metaprogramming.hpp 100.00% <ø> (ø)
...c/serac/numerics/functional/integral_utilities.hpp 0.00% <ø> (-100.00%) :arrow_down:
src/serac/numerics/functional/polynomials.hpp 100.00% <ø> (ø)
...rics/functional/tests/functional_boundary_test.cpp 80.00% <ø> (-4.54%) :arrow_down:
...erics/functional/tests/functional_multiphysics.cpp 100.00% <ø> (ø)
...numerics/functional/tests/functional_nonlinear.cpp 100.00% <ø> (ø)
...ac/numerics/functional/detail/quadrilateral_L2.inl 71.92% <71.92%> (-28.08%) :arrow_down:
...rc/serac/numerics/functional/boundary_integral.hpp 100.00% <100.00%> (ø)
.../numerics/functional/boundary_integral_kernels.hpp 100.00% <100.00%> (ø)
... and 41 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Oct 01 '22 05:10 codecov-commenter

https://github.com/LLNL/serac/blob/e9bcc81719b322201e85f451b87981c6de112687/src/serac/numerics/functional/boundary_integral.hpp#L31-L36

It seems that the doxygen block doesn't align with the class template parameters. I realize this is not code you modified in this PR, but I just caught it now.

btalamini avatar Nov 03 '22 20:11 btalamini

https://github.com/LLNL/serac/blob/e9bcc81719b322201e85f451b87981c6de112687/src/serac/numerics/functional/boundary_integral.hpp#L31-L36

It seems that the doxygen block doesn't align with the class template parameters. I realize this is not code you modified in this PR, but I just caught it now.

Interesting, usually doxygen warns if the descriptions don't agree with the argument names. I wonder if it doesn't check consistency of template parameters?

samuelpmishLLNL avatar Nov 03 '22 20:11 samuelpmishLLNL