Jeremy Furtek
Results: 2 comments of Jeremy Furtek
1.) As currently written, gemm_bench will fail on Kepler GPUs with CUDA 8 and later. cublasGemmEx() is only supported on GPUs with compute capability (SM) 5.0 or greater (i.e., Maxwell and newer)....
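A rough sketch of the kind of guard the comment above points toward, written as plain C++ so it can be read without a CUDA toolchain. The helper names (`supports_gemm_ex`, `choose_gemm_path`, `describe`) and the choice of cublasSgemm as the fallback are assumptions for illustration only, not something taken from gemm_bench. In a real CUDA build the major version would come from cudaGetDeviceProperties() (the `major` field of cudaDeviceProp).

```cpp
#include <string>

// Which GEMM entry point a benchmark could use on a given device.
// LegacySgemm as the fallback is an assumption made for this sketch.
enum class GemmPath { GemmEx, LegacySgemm };

// cublasGemmEx() needs compute capability 5.0 or newer (Maxwell onward),
// so only the major version matters for this check.
inline bool supports_gemm_ex(int sm_major) {
    return sm_major >= 5;
}

// Hypothetical helper: pick a path from the device's major compute capability.
// Kepler reports major version 3, so it takes the fallback path.
inline GemmPath choose_gemm_path(int sm_major) {
    return supports_gemm_ex(sm_major) ? GemmPath::GemmEx
                                      : GemmPath::LegacySgemm;
}

// Human-readable name of the selected cuBLAS routine, useful for logging.
inline std::string describe(GemmPath path) {
    return path == GemmPath::GemmEx ? "cublasGemmEx" : "cublasSgemm";
}
```

With a check like this, a benchmark could skip or reroute the cublasGemmEx() call on Kepler instead of failing at run time.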
XLA has some code for a scheduler that is intended to improve the overlap of communication and compute. https://github.com/openxla/xla/search?p=1&q=latencyhidingscheduler&type=commits I can't speak to its status; perhaps the developers at Google have some...