Pavan Balaji comments

Results: 9 comments by Pavan Balaji

coll: enhance allgatherv performance on large systems

Can you create a separate PR for new algorithm selections? That's orthogonal to this PR.

MPI_COMM_SET_INFO Is Not Collective (But It Should Be)

@hzhou Your interpretation of the standard is correct. But I think @wesbland's point is that the implementation should be correct even if some processes give a different value than others,...

MPI_COMM_SET_INFO Is Not Collective (But It Should Be)

I don't know how `allreduce(sum)` will help. We should probably add an internal op called `is_equal`.
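One possible shape for such an op, as a minimal sketch only: the comment does not say how `is_equal` would be defined, so the `eq_state` type, the `eq_combine` step, and the serial `all_ranks_equal` fold below are all hypothetical names invented for illustration. The fold stands in for an allreduce so the example runs without MPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction element: a value plus a flag recording whether
 * every contribution folded in so far carried that same value. */
typedef struct {
    int value;
    bool equal;
} eq_state;

/* Hypothetical combine step for an "is_equal" op. The resulting flag does
 * not depend on how the contributions are grouped, so the step could in
 * principle be applied along any reduction tree. */
static eq_state eq_combine(eq_state a, eq_state b)
{
    eq_state r;
    r.value = a.value;
    r.equal = a.equal && b.equal && (a.value == b.value);
    return r;
}

/* Serial stand-in for an allreduce: fold one contribution per rank. */
static bool all_ranks_equal(const int *per_rank, size_t nranks)
{
    assert(nranks > 0);
    eq_state acc = { per_rank[0], true };
    for (size_t i = 1; i < nranks; i++) {
        eq_state next = { per_rank[i], true };
        acc = eq_combine(acc, next);
    }
    return acc.equal;
}
```

Calling `all_ranks_equal` on an array holding one value per simulated rank returns `true` only when every entry matches the first.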

MPI_COMM_SET_INFO Is Not Collective (But It Should Be)

> > I don't know how `allreduce(sum)` will help. We should probably add an internal op called `is_equal`.
>
> ```c
> if (sum == comm.size)
>     is_equal = true;...
> ```

test/mpi: Failures in DTPools tests on 32-bit CentOS

Is this failing without GPU support?

test/mpi: Failures in DTPools tests on 32-bit CentOS

@hzhou Yaksa is 32-bit clean. Are you seeing some error on 32-bit systems?

test/mpi: Failures in DTPools tests on 32-bit CentOS

I don't think this is a Yaksa bug. The user of Yaksa (MPICH) is required to make sure the data buffer that's passed in is correctly aligned for the datatype...

test/mpi: Failures in DTPools tests on 32-bit CentOS

That is the user's (MPICH) responsibility. For example, if the user passes a buffer to Yaksa as a collection of integers, then that buffer must be aligned for integers. The...
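The alignment requirement described here can be checked on the caller's side before a buffer is handed over. The following is a sketch under assumptions: `is_aligned` and `is_aligned_for_int` are hypothetical helpers, not part of Yaksa or MPICH, and the pointer-to-integer cast relies on the conventional (implementation-defined) mapping used on mainstream platforms. It needs C11 for `alignof`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if ptr is a multiple of `align` bytes.
 * `align` is expected to be a nonzero alignment such as alignof(int). */
static bool is_aligned(const void *ptr, size_t align)
{
    assert(align > 0);
    return ((uintptr_t) ptr % align) == 0;
}

/* Hypothetical caller-side check: is this buffer safe to treat as a
 * collection of ints? */
static bool is_aligned_for_int(const void *ptr)
{
    return is_aligned(ptr, alignof(int));
}
```

A buffer declared as `int[]` passes this check at every element, while the same storage offset by one byte generally does not.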

Make fast-path hooks inline

@minsii The DMA access latencies for most GPUs are in microseconds. In comparison, a function pointer dereference is ~25 cycles. So perhaps an expected gain comparison is useful before doing...
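A rough version of that comparison can be written down as arithmetic. This is a back-of-the-envelope sketch: the comment gives only "~25 cycles" and "microseconds", so the clock rate and the exact DMA latency passed in are assumptions supplied by the caller, and `overhead_fraction` is a hypothetical helper.

```c
#include <assert.h>

/* Fraction of one DMA access that a fixed per-call overhead represents.
 *   cycles  overhead in CPU cycles (the comment estimates ~25)
 *   ghz     assumed CPU clock rate in GHz, i.e. cycles per nanosecond
 *   dma_us  assumed DMA access latency in microseconds */
static double overhead_fraction(double cycles, double ghz, double dma_us)
{
    assert(ghz > 0.0 && dma_us > 0.0);
    double overhead_ns = cycles / ghz;   /* cycles divided by cycles per ns */
    double dma_ns = dma_us * 1000.0;     /* microseconds to nanoseconds */
    return overhead_ns / dma_ns;
}
```

With an assumed 2.5 GHz clock and an assumed 1 microsecond DMA latency, 25 cycles is 10 ns, so `overhead_fraction(25.0, 2.5, 1.0)` comes out at about 0.01, or roughly 1% of a single access.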