Sparse tensor and Sparse CudaTensor
Sparse tensor support is important in machine learning, especially for storing matrices of one-hot encoded vectors.
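For illustration, a batch of one-hot vectors stored densely wastes memory proportional to the number of classes, while a sparse format stores only one entry per sample. A minimal sketch with `scipy.sparse` (shown only as an example of the storage saving, not as this project's API):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical example: 4 samples, 10 classes.
labels = np.array([2, 0, 7, 7])

# Dense one-hot: a 4 x 10 matrix holding only 4 non-zeros.
dense = np.eye(10)[labels]

# CSR stores just the non-zero pattern: one (row, col, value) triple per sample.
sparse = csr_matrix((np.ones(len(labels)),      # values
                     (np.arange(len(labels)),   # row indices
                      labels)),                 # column indices
                    shape=(4, 10))

assert sparse.nnz == 4
assert (sparse.toarray() == dense).all()
```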
For the CUDA backend, the CUDA toolkit provides a sparse API (cuSPARSE). For the CPU, further investigation is needed to find a suitable Sparse BLAS backend. See this link for potential libraries.
Progress on the research:
The main challenge is finding an up-to-date Sparse BLAS library.
Survey of the field:
- http://www.netlib.org/utk/people/JackDongarra/la-sw.html
Updated:
- MKL
- SciPy (`scipy.sparse`), but not parallel.
Seems state of the art:
- ViennaCL: via its OpenMP backend it is much faster than MKL (see its SpMM benchmark). Works on CSR matrices. C++.
- librsb and its Julia wrapper. Custom matrix format (Recursive Sparse Blocks). C/Fortran.
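The SpMM operation these libraries benchmark (sparse matrix times dense matrix) can be sketched with `scipy.sparse`, which the list above notes as the non-parallel CPU baseline:

```python
import numpy as np
from scipy.sparse import random as sparse_random

rng = np.random.default_rng(0)

# A 1000 x 1000 CSR matrix with ~1% density, times a dense 1000 x 64 matrix.
A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
B = rng.standard_normal((1000, 64))

C = A @ B  # SpMM: only the ~10k stored entries of A participate
assert C.shape == (1000, 64)
np.testing.assert_allclose(C, A.toarray() @ B)
```

Parallel backends such as ViennaCL or MKL speed up exactly this operation by distributing the CSR rows across threads.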
MPI / Distributed:
Full sparse tensor library:
Seems stalled:
- PSBLAS (Parallel SparseBlas)
- Exascience
- Apple Accelerate (Accessible on iPhones)
Paper:
"BlockSparse" optimized GPU kernels by OpenAI:
- Announcement
- Paper
- https://github.com/openai/blocksparse
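As a rough illustration of the block-sparse idea (a NumPy sketch under assumed names like `block_sparse_matmul` and `layout`, not OpenAI's actual kernels): non-zeros are stored as dense tiles addressed by a block-level sparsity pattern, so each stored tile contributes a small dense GEMM:

```python
import numpy as np

def block_sparse_matmul(blocks, layout, block, n, x):
    """Multiply a block-sparse (n x n) matrix by a dense matrix x.

    blocks: array of shape (num_blocks, block, block) holding dense tiles.
    layout: dict mapping (block_row, block_col) -> index into `blocks`.
    """
    out = np.zeros((n, x.shape[1]))
    for (br, bc), idx in layout.items():
        r, c = br * block, bc * block
        # Each stored tile is a small dense multiply on a contiguous slice.
        out[r:r + block] += blocks[idx] @ x[c:c + block]
    return out

# Hypothetical 4 x 4 matrix with 2 x 2 tiles; only 2 of the 4 tiles are non-zero.
block = 2
layout = {(0, 0): 0, (1, 1): 1}
blocks = np.stack([np.full((2, 2), 1.0), np.full((2, 2), 2.0)])
x = np.ones((4, 3))

y = block_sparse_matmul(blocks, layout, block, 4, x)

# Dense reference for checking the result.
dense = np.zeros((4, 4))
dense[0:2, 0:2] = 1.0
dense[2:4, 2:4] = 2.0
np.testing.assert_allclose(y, dense @ x)
```

The point of the block structure is that GPUs (and dense BLAS) run far faster on contiguous tiles than on scattered individual non-zeros.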
The TACO tensor compiler generates efficient dense, sparse, and block-sparse kernels for a variety of formats. It is worth checking out.