Thomas H. Gibson
# Description Some remaining issues to clear up for the batched GMRES solver: 1. Mask out converged columns; 2. Optimize global memory access; 4. Remove unnecessary movement of data - Can...
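To illustrate the first item, here is a minimal CPU-side sketch of masking out converged columns in a batched solve. It is not the solver's actual code: the helper names `active_columns` and `masked_axpy`, the column-major layout, and the convergence test (residual norm at or below a tolerance) are all assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: flag the columns of a batched solve that are still active.
// Assumption: a column is converged when its residual norm is <= tol.
std::vector<bool> active_columns(const std::vector<double>& residual_norms,
                                 double tol) {
    std::vector<bool> active(residual_norms.size());
    for (std::size_t j = 0; j < residual_norms.size(); ++j) {
        active[j] = residual_norms[j] > tol;
    }
    return active;
}

// Hypothetical helper: x[:, j] += alpha[j] * p[:, j], for active columns only.
// Assumption: column-major storage, entry (i, j) at index j * n_rows + i.
void masked_axpy(std::vector<double>& x, const std::vector<double>& p,
                 const std::vector<double>& alpha,
                 const std::vector<bool>& active, std::size_t n_rows) {
    for (std::size_t j = 0; j < active.size(); ++j) {
        if (!active[j]) continue;  // converged column: leave its entries untouched
        for (std::size_t i = 0; i < n_rows; ++i) {
            x[j * n_rows + i] += alpha[j] * p[j * n_rows + i];
        }
    }
}
```

With the mask in place, converged columns keep their current iterate while the remaining columns continue to be updated.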
To use PETSc on GPUs, we'll also need to wrap the necessary data types for CUDA and OpenCL, and provide the compiled binaries as well. See https://www.mcs.anl.gov/petsc/features/gpus.html
cc: @inducer
This PR modifies the HIP kernels and includes an optional compile-time flag to modify how many elements are processed per thread lane. A summary of modifications: - ~All kernels have...
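As a rough illustration of a compile-time elements-per-lane setting, the sketch below emulates the idea on the CPU in plain C++. The macro name `ELEMENTS_PER_LANE`, its default of 2, and the contiguous per-lane index mapping are assumptions for this example; the PR's actual flag and kernels are not shown in the description above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flag name; override at compile time, e.g. -DELEMENTS_PER_LANE=4.
#ifndef ELEMENTS_PER_LANE
#define ELEMENTS_PER_LANE 2
#endif

constexpr std::size_t kElementsPerLane = ELEMENTS_PER_LANE;

// Number of lanes needed to cover n elements (ceiling division).
constexpr std::size_t lanes_needed(std::size_t n) {
    return (n + kElementsPerLane - 1) / kElementsPerLane;
}

// CPU stand-in for the work of one lane: scale its chunk of the array by a.
inline void scale_lane(std::size_t lane, double a, std::vector<double>& data) {
    const std::size_t begin = lane * kElementsPerLane;
    for (std::size_t k = 0; k < kElementsPerLane; ++k) {
        const std::size_t i = begin + k;
        if (i < data.size()) data[i] *= a;  // guard the ragged tail
    }
}

// Emulate a launch by looping over all lanes sequentially.
void scale_all(double a, std::vector<double>& data) {
    const std::size_t lanes = lanes_needed(data.size());
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        scale_lane(lane, a, data);
    }
}
```

On a GPU the index mapping is often strided across lanes instead of contiguous, so that neighboring lanes touch neighboring addresses; the contiguous form is used here only to keep the sketch short.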
This PR modifies the current driver `main.cpp` and adds MPI support for launching the benchmark across multiple devices. The main takeaways here: - Each MPI rank is assigned a specific...
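One common way to spread ranks across devices is a round-robin mapping, sketched below in plain C++. This is an assumption, not the PR's confirmed scheme, since the description above is truncated; `device_for_rank` is a hypothetical helper. In a real driver the rank would typically come from `MPI_Comm_rank`, the device count from `hipGetDeviceCount`, and the result would be passed to `hipSetDevice`.

```cpp
#include <stdexcept>

// Hypothetical helper: choose a device index for an MPI rank by round-robin.
// Assumption: ranks on a node are numbered 0, 1, 2, ... and every device is usable.
int device_for_rank(int rank, int num_devices) {
    if (rank < 0 || num_devices <= 0) {
        throw std::invalid_argument("rank must be >= 0 and num_devices must be > 0");
    }
    return rank % num_devices;  // wraps around when ranks outnumber devices
}
```

When there are more ranks than devices, this mapping lets several ranks share one device; using a node-local rank instead of the global rank would be the natural refinement on multi-node runs.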