Jeremy Furtek comments

Results: 2 comments of Jeremy Furtek

Problems with NVIDIA Benchmarks

1.) As currently written, gemm_bench will fail on Kepler GPUs with CUDA 8 and later. cublasGemmEx() is only supported on GPUs with SM 5.0 or greater (i.e., Maxwell and newer)....
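
To illustrate the constraint described in this comment, here is a minimal sketch of a compute-capability guard. It is not taken from gemm_bench: the struct `ComputeCapability`, its fields `sm_major` and `sm_minor`, and the function `supports_gemm_ex` are hypothetical names introduced for this example. The only threshold used is the one stated above (SM 5.0 or greater). In a real program the two numbers would presumably come from the `major` and `minor` fields of `cudaDeviceProp`, as filled in by `cudaGetDeviceProperties()`.

```cpp
// Hypothetical helper, not part of gemm_bench: decides whether a
// cublasGemmEx() code path should be attempted on a given device.
struct ComputeCapability {
    int sm_major;  // assumed to mirror cudaDeviceProp::major
    int sm_minor;  // assumed to mirror cudaDeviceProp::minor
};

// Minimum compute capability taken from the comment above: SM 5.0 (Maxwell).
constexpr ComputeCapability kMinGemmEx{5, 0};

// Returns true when cc is at least kMinGemmEx, comparing the major
// version first and the minor version only on a tie.
constexpr bool supports_gemm_ex(ComputeCapability cc) {
    return cc.sm_major > kMinGemmEx.sm_major ||
           (cc.sm_major == kMinGemmEx.sm_major &&
            cc.sm_minor >= kMinGemmEx.sm_minor);
}

// Compile-time checks: Kepler (SM 3.5) is rejected, Maxwell (SM 5.0) passes.
static_assert(!supports_gemm_ex({3, 5}), "Kepler should be rejected");
static_assert(supports_gemm_ex({5, 0}), "Maxwell should be accepted");
```

A benchmark could call a guard like this before choosing between a `cublasGemmEx()` path and a fallback, so that an unsupported GPU is skipped or routed elsewhere instead of failing at run time.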

Support overlapping NCCL collective communication with compute on GPU

XLA has some code for a scheduler intended to improve the overlap of communication and compute: https://github.com/openxla/xla/search?p=1&q=latencyhidingscheduler&type=commits I can't speak to its status - perhaps the developers at Google have some...