
Add section on Grid Groups and Grid sync

Open gmarkall opened this issue 4 years ago • 0 comments

Grid groups and grid sync were added in Numba 0.53.1. A short section on using these to implement a global barrier would be good, perhaps based around the example kernel from the documentation:

from numba import cuda, int32, void

@cuda.jit(void(int32[:,::1]))
def sequential_rows(M):
    col = cuda.grid(1)
    g = cuda.cg.this_grid()

    rows = M.shape[0]
    cols = M.shape[1]

    for row in range(1, rows):
        opposite = cols - col - 1
        # Each row's elements are one greater than the previous row
        M[row, col] = M[row - 1, opposite] + 1
        # Wait until all threads have written their column element,
        # and that the write is visible to all other threads
        g.sync()
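
For the tutorial section it might also help to have a host-side reference to check the kernel's output against. The following is only a sketch in plain NumPy: `sequential_rows_reference` is a hypothetical helper name, not part of Numba. It computes sequentially what the kernel is expected to produce, which also makes the dependency visible: each row needs the whole previous row, written by other threads, hence the grid-wide barrier.

```python
import numpy as np


def sequential_rows_reference(M):
    """Host-side reference (hypothetical helper, not a Numba API).

    Mirrors the kernel's update rule:
        M[row, col] = M[row - 1, cols - col - 1] + 1
    """
    out = M.copy()
    for row in range(1, out.shape[0]):
        # Reversing the previous row corresponds to reading the
        # "opposite" column in the kernel.
        out[row, :] = out[row - 1, ::-1] + 1
    return out


# With a zero-initialised input, every element of row r ends up equal to r.
expected = sequential_rows_reference(np.zeros((4, 8), dtype=np.int32))
```

After running the kernel on a device array, the copied-back result could be compared against this with `np.array_equal`. A mismatch would suggest that the barrier is not being respected, for example because not all blocks of the grid were resident at the same time.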

gmarkall, Apr 21 '21 07:04