Jiqun Tu issues

Results: 8 issues by Jiqun Tu

Fused DWF + NVSHMEM

~~This PR makes the DWF fused kernels run with NVSHMEM. Running on one node on Selene with 1x2x2x2 and Ls = 12, getting (performance numbers are in GFLOPS)~~ No there...

Creating a unified interface for creating/accepting preconditioners (for PCG)

As the title suggests, we should have a unified interface for creating/accepting preconditioning solvers. Currently, with #1061, the create part of the interface is located in `invert_preconditioner.h`.

feature

clean-up

More coverage/clean up for split grid

Improvements to split grid in the future:
- Add support for split grid + multi-shift. It should be straightforward.
- Add support for split grid when the number of...

feature

clean-up

Add an `instantiate` item for copy gauge field and copy gauge field offset, etc

Add an `instantiate` item for `copy_gauge_field` and `copy_gauge_field_offset` for the gauge orders, etc. One tricky thing is that, with the lists in `instantiate.h`, it becomes hard to know which file...

clean-up

Make sure trove uses the 3-d thread index when we update the local version of trove

Currently `trove` uses a 1-d thread index, i.e. it uses `threadIdx.x` instead of `(threadIdx.z * blockDim.y + threadIdx.y) * blockDim.x + threadIdx.x`. We should make sure the 3-d thread index is used...
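The flattening formula quoted above can be illustrated on the host. The following is a minimal sketch, not code from `trove` or QUDA: `Dim3`, `linear_thread_index`, and `indices_are_contiguous` are hypothetical names introduced here, with `Dim3` standing in for CUDA's built-in `threadIdx` and `blockDim`. It shows that the 3-d expression assigns every thread in a block a distinct linear index, whereas `threadIdx.x` alone would give the same value to threads that differ only in `y` or `z`.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's built-in threadIdx / blockDim
// (an assumption for this sketch; not a trove or QUDA type).
struct Dim3 {
  unsigned x, y, z;
};

// Flatten a 3-d thread index into a linear index within the block,
// with x running fastest, then y, then z. This is the expression
// quoted in the issue text.
constexpr unsigned linear_thread_index(Dim3 tid, Dim3 block)
{
  return (tid.z * block.y + tid.y) * block.x + tid.x;
}

// Visit every thread of a block in x-fastest order and check that the
// linear index counts 0, 1, 2, ... without gaps or repeats.
inline bool indices_are_contiguous(Dim3 block)
{
  unsigned expected = 0;
  for (unsigned z = 0; z < block.z; ++z) {
    for (unsigned y = 0; y < block.y; ++y) {
      for (unsigned x = 0; x < block.x; ++x) {
        if (linear_thread_index(Dim3 {x, y, z}, block) != expected) return false;
        ++expected;
      }
    }
  }
  return expected == block.x * block.y * block.z;
}
```

For a block with `blockDim.y == blockDim.z == 1` the expression reduces to `threadIdx.x`, so the 1-d index is only a problem for launches that use the `y` or `z` block dimensions.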

clean-up

Allow `copy_<to/from>_buffer` to copy to/from device buffer

The `copy_<to/from>_buffer` methods of the various field types assume the buffer is on the host - this forbids one from splitting and joining fields from device buffers when GPU...

optimization

Cache the collected gauge and clover fields when doing split grid

One could cache the collected gauge and clover fields and reuse the previously generated fields when doing split grid. The starting point should probably be
- https://github.com/lattice/quda/blob/ed21580eabd7dd8bfebee40a65ab813af1453f95/lib/interface_quda.cpp#L3180
- https://github.com/lattice/quda/blob/ed21580eabd7dd8bfebee40a65ab813af1453f95/lib/interface_quda.cpp#L3202

In...

optimization

MMA-izing the prolongator and restrictor kernels

MMA-izing the prolongator and restrictor kernels.