CUDA
RNMS for n_boxes > 512
Have you tested RNMS for n_boxes > 512 (threadsPerBlock = sizeof(unsigned long long) * 8)? Shared memory is only visible to threads within the same block, so it looks as though an IoU comparison between boxes held in the shared memory of different blocks is not possible.
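For comparison, the widely copied Faster R-CNN-style NMS kernel avoids this limitation with a 2D grid of `col_blocks x col_blocks` blocks. Block `(x, y)` copies column tile `x` into its *own* shared memory and compares it against row tile `y`, so every pair of 64-box tiles is covered by some block, even when the two tiles belong to different block indices. Whether the RNMS kernel in this repo uses the same layout is an assumption. The sketch below is a CPU emulation in C++ of that launch, not the repo's code. It uses hypothetical axis-aligned boxes and an axis-aligned IoU as a stand-in for rotated boxes and rotated IoU. It also assumes the boxes are already sorted by descending score. The names `Box`, `nms_mask`, and `nms_keep` are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned box; RNMS would use rotated boxes and a rotated IoU.
struct Box { float x1, y1, x2, y2; };

// One 64-bit mask word per (box, column tile), hence 64 boxes per tile.
constexpr int kThreadsPerBlock = sizeof(unsigned long long) * 8;

inline int div_up(int a, int b) { return (a + b - 1) / b; }

inline float iou(const Box& a, const Box& b) {
  const float w = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
  const float h = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
  const float inter = w * h;
  const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
  return uni > 0.0f ? inter / uni : 0.0f;
}

// Emulates gridDim = (col_blocks, col_blocks), blockDim = kThreadsPerBlock.
// Bit j of mask[i * col_blocks + c] is set if box i suppresses box c * 64 + j.
std::vector<unsigned long long> nms_mask(const std::vector<Box>& boxes, float thresh) {
  const int n = static_cast<int>(boxes.size());
  const int col_blocks = div_up(n, kThreadsPerBlock);
  std::vector<unsigned long long> mask(static_cast<std::size_t>(n) * col_blocks, 0ULL);
  for (int row_block = 0; row_block < col_blocks; ++row_block) {    // blockIdx.y
    for (int col_block = 0; col_block < col_blocks; ++col_block) {  // blockIdx.x
      // Boxes are score-sorted, so a box only suppresses later boxes:
      // blocks below the diagonal exit early, as in the original kernel.
      if (row_block > col_block) continue;
      const int row_size = std::min(n - row_block * kThreadsPerBlock, kThreadsPerBlock);
      const int col_size = std::min(n - col_block * kThreadsPerBlock, kThreadsPerBlock);
      // Stand-in for __shared__ memory: this block loads column tile col_block
      // itself, whichever block index that tile corresponds to.
      Box tile[kThreadsPerBlock];
      for (int t = 0; t < col_size; ++t) tile[t] = boxes[col_block * kThreadsPerBlock + t];
      for (int t = 0; t < row_size; ++t) {  // threadIdx.x
        const int i = row_block * kThreadsPerBlock + t;
        // On the diagonal tile, compare only against boxes after box i.
        const int start = (row_block == col_block) ? t + 1 : 0;
        unsigned long long bits = 0ULL;
        for (int j = start; j < col_size; ++j)
          if (iou(boxes[i], tile[j]) > thresh) bits |= 1ULL << j;
        mask[static_cast<std::size_t>(i) * col_blocks + col_block] = bits;
      }
    }
  }
  return mask;
}

// Host-side pass: walk boxes in score order and accumulate suppression bits.
std::vector<int> nms_keep(const std::vector<Box>& boxes, float thresh) {
  const int n = static_cast<int>(boxes.size());
  const int col_blocks = div_up(n, kThreadsPerBlock);
  const std::vector<unsigned long long> mask = nms_mask(boxes, thresh);
  std::vector<unsigned long long> removed(col_blocks, 0ULL);
  std::vector<int> keep;
  for (int i = 0; i < n; ++i) {
    const int nblock = i / kThreadsPerBlock;
    const int inblock = i % kThreadsPerBlock;
    if (removed[nblock] & (1ULL << inblock)) continue;  // already suppressed
    keep.push_back(i);
    for (int c = nblock; c < col_blocks; ++c)
      removed[c] |= mask[static_cast<std::size_t>(i) * col_blocks + c];
  }
  return keep;
}
```

As a check, take 600 boxes where box `k + 300` duplicates box `k` for `k < 300`. Box 0 lands in tile 0 and its duplicate, box 300, lands in tile 4. The block at `(x = 4, y = 0)` still compares them, so `nms_keep(boxes, 0.5f)` keeps exactly indices 0 to 299. The cost of this design is a mask of `n * col_blocks` words and a serial host-side pass, in exchange for never needing cross-block shared memory.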