
Performance issue with OpenMP deepcopies and/or ChunkSpan[]

Open blegouix opened this issue 2 years ago • 4 comments

There seems to be a performance issue with deepcopies and/or ChunkSpan[] with OpenMP, or with the way they interact with each other.

The following branch is dramatically faster (roughly 1000x) than Gysela/main when compiled with Kokkos_ENABLE_OPENMP=ON:

https://gitlab.maisondelasimulation.fr/gysela-developpers/voicexx/-/compare/main...debug_deepcopy_bracket

In fact, Gysela/main is extremely slow with OpenMP, even though all CPU threads are at 100% usage.

Note: for this demo I use the DDC branch from https://github.com/Maison-de-la-Simulation/ddc/pull/181 to make ChunkSpan() easy to use. The problematic lines are:

  • Bsl_advection_x.hpp : ddc::deepcopy(contiguous_slice, allfdistribu[ic][isp][iv]);
  • Bsl_advection_vx.hpp : ddc::deepcopy(contiguous_slice, allfdistribu[ic][isp][ix]);
  • Charge_calculator.hpp : ddc::deepcopy(f_vxvy_slice, allfdistribu[isp][ix][iy]);
  • sll/spline_builder_2d.hpp : ddc::deepcopy(vals1, vals[i]);

blegouix avatar Oct 08 '23 17:10 blegouix

We suspect that it only affects LayoutStride chunks.
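
To illustrate what a strided slice copy involves, here is a minimal standard-library sketch. It is not DDC or Kokkos code: `StridedSlice`, `slice_along_x`, `slice_along_v` and `copy_to_contiguous` are hypothetical names, and the row-major (x, v) storage is an assumption made only for this example. Slicing along x at fixed v gives a stride of `nv`, whereas slicing along v at fixed x gives a contiguous slice.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-1 strided slice (not a DDC or Kokkos type).
struct StridedSlice {
    const double* data;  // pointer to the first element of the slice
    std::size_t size;    // number of elements in the slice
    std::size_t stride;  // distance, in elements, between consecutive entries

    double operator()(std::size_t i) const { return data[i * stride]; }
};

// Assumption: f stores a 2D field in row-major order, f[ix * nv + iv].
// Slicing along x at fixed iv yields a strided slice (stride == nv).
inline StridedSlice slice_along_x(const std::vector<double>& f,
                                  std::size_t nx, std::size_t nv, std::size_t iv) {
    return StridedSlice{f.data() + iv, nx, nv};
}

// Slicing along v at fixed ix yields a contiguous slice (stride == 1).
inline StridedSlice slice_along_v(const std::vector<double>& f,
                                  std::size_t nv, std::size_t ix) {
    return StridedSlice{f.data() + ix * nv, nv, 1};
}

// Element-wise copy of a slice into a contiguous buffer.
inline void copy_to_contiguous(std::vector<double>& dst, const StridedSlice& src) {
    dst.resize(src.size);
    for (std::size_t i = 0; i < src.size; ++i) {
        dst[i] = src(i);
    }
}
```

Timing `copy_to_contiguous` on both kinds of slice, for the array sizes used in the application, would be one way to check whether the stride alone explains a slowdown.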

blegouix avatar Jan 18 '24 16:01 blegouix

In order to determine if it is a DDC issue I suggest we compare with the Kokkos equivalent code. If we notice the same behavior then we should close the issue.

tpadioleau avatar Mar 26 '24 11:03 tpadioleau

We no longer use this kind of combination of deep copies and [] anywhere in our codes, and performance is back to normal, so maybe I can close the issue? I think users just have to avoid it too.

blegouix avatar Apr 02 '24 15:04 blegouix

I don't see a good reason to avoid ddc::deepcopy/Kokkos::deep_copy. We have only shown that it was suboptimal in some particular cases. I still think we need to understand why: what the layout of the arrays was, their rank, and their sizes.

tpadioleau avatar Apr 03 '24 07:04 tpadioleau

Closing, feel free to reopen with a reproducer. My feeling is there is no strong evidence of a performance issue on the DDC side.

tpadioleau avatar Dec 24 '24 09:12 tpadioleau