
[WIP] Boost dash::sort with even more parallelism

Open rkowalewski opened this issue 7 years ago • 2 comments

We need more parallelism to exploit the power of many-core nodes. The underlying algorithm itself will be rewritten to eliminate barriers. This includes the following major changes:

  • Algorithmic improvements:

    • [x] reduce communication overhead (alltoall communications)
    • [x] overlap communication and the final merge step as efficiently as possible (@pascalj)
    • [ ] Provide an additional variant that relaxes perfect partitioning and does not require an in-place sort. Users can pass a larger output buffer with the same pattern, where each unit's additional local storage is bounded by a certain threshold. Example: dash::sort(first, last, out, sort_hash).
  • Minor changes:

    • Integrate Intel Parallel STL (based on Intel TBB) into DASH to exploit shared-memory parallelism more efficiently.
    • Allow DART_UNDEFINED_UNIT_ID as a valid unit for DART communication routines #617
  • Further impacts:

    • Thread support is now enabled by default in CI. Other tests fail; this needs investigation.

Note: This list will grow.
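
To make the relaxed-partitioning item above more concrete, here is a minimal single-process sketch of the bookkeeping it implies. It is not the DASH implementation: the function names `send_counts` and `fits_with_slack`, the `int` element type, and the idea of expressing the storage threshold as a per-unit `slack` are all assumptions made for illustration. Each unit counts how many of its locally sorted elements fall into each target partition (these are the counts an alltoall exchange would carry), and a candidate set of splitters is accepted if every target partition fits into its regular capacity plus the slack.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: for one unit, count how many of its locally sorted
// elements belong to each of the p target partitions defined by p-1
// ascending splitters. Partition i receives all elements <= splitters[i]
// that were not already assigned to a lower partition.
std::vector<std::size_t> send_counts(const std::vector<int>& sorted_local,
                                     const std::vector<int>& splitters) {
  std::vector<std::size_t> counts(splitters.size() + 1, 0);
  auto begin = sorted_local.begin();
  for (std::size_t i = 0; i < splitters.size(); ++i) {
    auto end = std::upper_bound(begin, sorted_local.end(), splitters[i]);
    counts[i] = static_cast<std::size_t>(end - begin);
    begin = end;
  }
  counts.back() = static_cast<std::size_t>(sorted_local.end() - begin);
  return counts;
}

// Hypothetical check: with relaxed partitioning, a set of splitters is
// acceptable if every target partition receives at most
// base_capacity + slack elements in total (summed over all sending units).
bool fits_with_slack(const std::vector<std::vector<std::size_t>>& counts_per_unit,
                     std::size_t base_capacity, std::size_t slack) {
  if (counts_per_unit.empty()) return true;
  const std::size_t num_targets = counts_per_unit.front().size();
  for (std::size_t target = 0; target < num_targets; ++target) {
    std::size_t incoming = 0;
    for (const auto& counts : counts_per_unit) incoming += counts[target];
    if (incoming > base_capacity + slack) return false;
  }
  return true;
}
```

With `slack == 0` the check demands perfect partitioning; a positive slack accepts splitters earlier at the cost of a larger output buffer per unit.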

rkowalewski avatar Nov 22 '18 15:11 rkowalewski

Codecov Report

Merging #611 into development will decrease coverage by 0.38%. The diff coverage is 81.99%.

@@               Coverage Diff               @@
##           development     #611      +/-   ##
===============================================
- Coverage        84.95%   84.57%   -0.39%     
===============================================
  Files              335      344       +9     
  Lines            24821    25028     +207     
  Branches         11497    11285     -212     
===============================================
+ Hits             21087    21167      +80     
- Misses            3733     3851     +118     
- Partials             1       10       +9

Impacted Files                                        Coverage Δ
dash/include/dash/iterator/internal/GlobPtrBase.h     91.2% <ø> (-0.1%) :arrow_down:
dash/include/dash/internal/Logging.h                  100% <ø> (ø) :arrow_up:
dash/include/cpp17/monotonic_buffer.h                 0% <0%> (ø)
dash/src/cpp17/monotonic_buffer.cc                    0% <0%> (ø)
dash/include/dash/algorithm/sort/Histogram.h          100% <100%> (ø)
dash/include/dash/algorithm/sort/Sampling.h           100% <100%> (ø)
dash/test/algorithm/SortTest.cc                       98.26% <100%> (+0.45%) :arrow_up:
dash/include/dash/algorithm/sort/Communication.h      100% <100%> (ø)
dash/include/dash/algorithm/sort/Types.h              100% <100%> (ø)
dash/include/dash/algorithm/sort/Sort-inl.h           100% <100%> (ø)
... and 19 more

codecov[bot] avatar Nov 22 '18 15:11 codecov[bot]

We definitely need a way to configure the number of threads from the outside at runtime, e.g., through an environment variable. That inevitably leads to the wider question of how we want to handle runtime configuration: config files, environment variables, or both? And who is in charge of the parsing? Right now DART and DASH each do their own thing, which is sub-optimal.

Currently we support three environment variables to configure multi-threading, which is built into our locality support. See the documentation of UnitLocality.num_domain_threads():

  • DASH_DISABLE_THREADS: If set, disables multi-threading at unit scope and this method returns 1
  • DASH_MAX_SMT: If set, virtual SMT CPUs (hyperthreads) instead of physical cores are used to determine the number of available threads.
  • DASH_MAX_UNIT_THREADS: Specifies the maximum number of threads available to a single unit.

I suppose this is also built into DART somehow since the locality interface is implemented down there.

rkowalewski avatar Dec 18 '18 08:12 rkowalewski