Ordering of shared indices in `IndexMap`
It could be helpful to group the (1) exclusively owned, (2) owned and shared and (3) ghost indices, and the case (2) and (3) order by rank. At the moment we just have group (3) at the end of the array.
|||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||| |||||||||||
(1) (2) (3)
Currently (1) and (2) are mixed, but it would be good to group so one could have a range for (2) with contiguous memory. That would allow for example :
// Create vector with USM allocated memory
la::Vector<double, DeviceAllocator> x(indexmap, bs);
std::array range = x.shared_range(); // (2) not available
std::array range = x.ghos_range(); // (3) Currently available via the indexmap
int size = range[1] -range[0];
double* ptr = x.data();
double* shared_buffer = AllocateSharedMemory(size);
double* ghosts_buffer = AllocateSharedMemory(ghost_size);
// Device to device memcopy (CUDA, SYCL, ...)
memcopy(ptr+range[0], shared_buffer , size);
memcopy(ptr+range_ghosts[0], ghosts_buffer, ghost_size);
// If a GPU-aware mpi implementation is available no copy from device to host.
// Otherwise only copy the shared and ghost buffers.
indexmap.scatter_fwd(shared_buffer, ghosts_buffer);
This might affect data-locality adversely. I think, for now, it would be worth making it an optional choice, so we can test out the performance implications both ways.
@IgorBaratta do you think we should worry about this?
I think this is possibly to restrictive. Creator can always control order for data locality when taking slices, but IndexMap doesn't need to know about this.