MPI GPU interface refactoring
Description
Changes proposed in this pull request:
- Add a virtual get_mpi_offload_support function to the base communicator; it defaults to false in nearly all cases
- Add logic to the get_mpi_offload_support implementation in mpi/communicator.h that checks the MPI library for the relevant symbol and determines whether Level Zero is supported
- Add a conditional in detail/communicator.cpp that uses the result of get_mpi_offload_support to decide whether to convert data to host (the previous default) or leave it as-is, which yields performance improvements when the MPI library supports GPU offload (see the first sketch below)
- Modify the sendrecv_replace arguments to include an optional additional buffer, accommodating an MPICH workaround that calls sendrecv with two GPU buffers (see the second sketch below)
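
For illustration, here is a minimal sketch of how the offload-support query and the host-conversion conditional could fit together. The names used here (`communicator_iface`, `mpi_communicator`, `prepare_send_buffer`, and the probed symbol name) are assumptions for this example, not the actual oneDAL interfaces:

```cpp
#include <algorithm>
#include <cstddef>
#include <dlfcn.h>

// Base communicator: GPU offload support is reported as false by default, so
// existing backends keep the previous convert-to-host behavior.
class communicator_iface {
public:
    virtual ~communicator_iface() = default;
    virtual bool get_mpi_offload_support() {
        return false;
    }
};

// MPI-backed communicator: probe the already-loaded MPI library for a
// GPU-support query symbol; its presence is treated here as "Level Zero
// offload is available".
class mpi_communicator : public communicator_iface {
public:
    bool get_mpi_offload_support() override {
        // Placeholder symbol name; the real check depends on which MPI
        // implementation and GPU-support query is being targeted.
        void* handle = dlopen(nullptr, RTLD_LAZY);
        return handle != nullptr && dlsym(handle, "MPIX_GPU_query_support") != nullptr;
    }
};

// Conditional in the spirit of detail/communicator.cpp: only stage device
// data through a host buffer when MPI cannot consume GPU pointers directly.
template <typename T>
const T* prepare_send_buffer(communicator_iface& comm,
                             const T* device_ptr,
                             T* host_staging,
                             std::size_t count) {
    if (comm.get_mpi_offload_support()) {
        return device_ptr; // GPU-aware MPI: pass the device pointer as-is
    }
    // Previous default: copy to host first (a device-to-host copy in the real
    // code; std::copy_n stands in for it here).
    std::copy_n(device_ptr, count, host_staging);
    return host_staging;
}
```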
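And a rough sketch of the sendrecv_replace change under the same caveat: the signature and the `scratch_buf` parameter are illustrative assumptions, not the actual oneDAL API. When the caller supplies the extra buffer (the MPICH workaround), the in-place exchange becomes a plain MPI_Sendrecv between the two buffers followed by a copy back:

```cpp
#include <mpi.h>
#include <cstddef>
#include <cstring>

// Hypothetical sendrecv_replace wrapper with an optional extra buffer.
int sendrecv_replace(void* buf,
                     int count,
                     MPI_Datatype dtype,
                     int destination,
                     int send_tag,
                     int source,
                     int recv_tag,
                     MPI_Comm comm,
                     void* scratch_buf = nullptr) {
    if (scratch_buf == nullptr) {
        // Default path: let MPI handle the in-place exchange.
        return MPI_Sendrecv_replace(buf, count, dtype, destination, send_tag,
                                    source, recv_tag, comm, MPI_STATUS_IGNORE);
    }
    // Workaround path: receive into the caller-provided buffer, then copy the
    // result back into buf. With GPU/USM buffers the copy back would be a
    // device copy (e.g. a SYCL queue memcpy); std::memcpy keeps this sketch
    // host-only.
    int err = MPI_Sendrecv(buf, count, dtype, destination, send_tag,
                           scratch_buf, count, dtype, source, recv_tag,
                           comm, MPI_STATUS_IGNORE);
    if (err != MPI_SUCCESS) {
        return err;
    }
    int type_size = 0;
    MPI_Type_size(dtype, &type_size);
    std::memcpy(buf, scratch_buf, static_cast<std::size_t>(count) * type_size);
    return MPI_SUCCESS;
}
```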
/intelci: run
Looks like a great opportunity to get more speedup across all algorithms.
Thanks! Yeah, it's pretty ugly right now; I'm working towards getting it functional first and will clean things up afterwards. But good points.
Nightly run combined with the infra branch: http://intel-ci.intel.com/eedcda75-51a4-f11e-8bab-a4bf010d0e2e
Final steps are to check MPICH scalability with the alternative approach, confirm the infra changes, and determine whether it's necessary to add any additional conditions for using offloading.