Add specification for the `__binsparse__` protocol
This pull request adds the specification for the binsparse protocol (closes #840).
@willow-ahrens @BenBrock from the binsparse team.
@mtsokol @ivirshup for scipy.sparse
@leofang for cupyx.sparse
@pearu for torch.sparse
@jakevdp for JAX/TensorFlow
Introduction
The binsparse protocol is meant to be a specification for on-disk storage of N-dimensional (ND) sparse arrays. It requires just two things from a back-end implementing it:
a. A way to store and exchange dense 1D and 2D arrays (we have this via DLPack)
b. A way to parse and interpret JSON (we have this via the json module)
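To make the two requirements concrete, here is a minimal sketch using only the standard `json` module and NumPy's `np.from_dlpack`. The descriptor is a hand-written 2D COO example modeled on the binsparse JSON descriptor; the exact field names (such as `number_of_stored_values`, `data_types`, `indices_0`) are assumptions here, not normative text from this PR.

```python
import json

import numpy as np

# (b) The descriptor is plain JSON-compatible metadata. Field names are
# modeled on the binsparse descriptor and should be treated as assumptions.
descriptor = {
    "binsparse": {
        "version": "0.1",
        "format": "COO",
        "shape": [3, 4],
        "number_of_stored_values": 2,
        "data_types": {
            "indices_0": "int64",
            "indices_1": "int64",
            "values": "float64",
        },
    }
}
# Serializing and parsing with the standard json module round-trips it.
parsed = json.loads(json.dumps(descriptor))

# (a) Each constituent array is an ordinary dense 1D array, so any
# DLPack-aware library can import it (NumPy acts as the consumer here).
values = np.array([1.5, -2.0])
imported = np.from_dlpack(values)
```

Nothing sparse-specific is needed on the consumer side beyond these two primitives; interpreting the descriptor is what turns the dense pieces back into a sparse array.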
Pseudocode implementation
Here's a pseudocode example using two libraries, `xp1` and `xp2`, both of which support sparse arrays:
```python
# In library code:
xp2_sparray = xp2.from_binsparse(xp1_sparray, ...)
# Or
xp2_sparray = xp2.asarray(xp1_sparray, ...)

# This pseudocode implementation is common between `xp1` and `xp2`.
def from_binsparse(x: object, /, *, device: device | None = None, copy: bool | None = None) -> array:
    # Look up both halves of the protocol on the producer object.
    binsparse_descr = getattr(x, "__binsparse_descriptor__", None)
    binsparse_impl = getattr(x, "__binsparse__", None)
    if binsparse_impl is None or binsparse_descr is None:
        raise TypeError(...)

    # JSON-compatible metadata describing the sparse format.
    binsparse_descriptor = binsparse_descr()
    # Will raise an error if the format/descriptor is unsupported.
    sparse_type = _type_from_binsparse_descriptor(binsparse_descriptor)

    # Dense constituent arrays, keyed by name, each imported via DLPack.
    constituent_arrays = binsparse_impl()
    my_constituent_arrays = {
        k: from_dlpack(arr, device=device, copy=copy) for k, arr in constituent_arrays.items()
    }
    return sparse_type.from_strided_arrays(my_constituent_arrays, shape=...)
```
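For a runnable end-to-end illustration, the sketch below pairs a toy producer with a toy consumer, using NumPy for DLPack. `ToyCSR` and `dense_from_binsparse` are hypothetical names invented for this example, the consumer returns a dense NumPy array instead of a library-specific sparse type, and the CSR key names (`pointers_to_1`, `indices_1`, `values`) are assumed from the binsparse descriptor layout.

```python
import numpy as np


class ToyCSR:
    """Hypothetical producer: a minimal CSR container exposing the protocol."""

    def __init__(self, indptr, indices, values, shape):
        self.indptr = np.asarray(indptr, dtype=np.int64)
        self.indices = np.asarray(indices, dtype=np.int64)
        self.values = np.asarray(values, dtype=np.float64)
        self.shape = tuple(shape)

    def __binsparse_descriptor__(self) -> dict:
        # JSON-compatible metadata; key names assumed from the binsparse spec.
        return {
            "binsparse": {
                "version": "0.1",
                "format": "CSR",
                "shape": list(self.shape),
                "number_of_stored_values": int(self.values.size),
                "data_types": {
                    "pointers_to_1": str(self.indptr.dtype),
                    "indices_1": str(self.indices.dtype),
                    "values": str(self.values.dtype),
                },
            }
        }

    def __binsparse__(self) -> dict:
        # Constituent arrays keyed by name; NumPy arrays support __dlpack__.
        return {
            "pointers_to_1": self.indptr,
            "indices_1": self.indices,
            "values": self.values,
        }


def dense_from_binsparse(x) -> np.ndarray:
    """Hypothetical consumer: imports a CSR binsparse object as a dense array."""
    descr_fn = getattr(x, "__binsparse_descriptor__", None)
    impl_fn = getattr(x, "__binsparse__", None)
    if descr_fn is None or impl_fn is None:
        raise TypeError(f"{type(x).__name__} does not implement the binsparse protocol")
    descr = descr_fn()["binsparse"]
    if descr["format"] != "CSR":
        # Reject formats this consumer does not understand.
        raise ValueError(f"unsupported binsparse format: {descr['format']!r}")
    # Import every constituent array through DLPack.
    arrays = {k: np.from_dlpack(v) for k, v in impl_fn().items()}
    indptr, indices, values = arrays["pointers_to_1"], arrays["indices_1"], arrays["values"]
    n_rows, n_cols = descr["shape"]
    out = np.zeros((n_rows, n_cols), dtype=values.dtype)
    for row in range(n_rows):
        # Row `row` owns the stored entries in [indptr[row], indptr[row + 1]).
        start, stop = indptr[row], indptr[row + 1]
        out[row, indices[start:stop]] = values[start:stop]
    return out


m = ToyCSR(indptr=[0, 1, 3], indices=[2, 0, 1], values=[1.0, 2.0, 3.0], shape=(2, 3))
dense = dense_from_binsparse(m)
# dense.tolist() == [[0.0, 0.0, 1.0], [2.0, 3.0, 0.0]]
```

The consumer only ever touches the two dunder methods, so any producer exposing the same descriptor and arrays would work unchanged.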
Compare this to the following example converting SciPy COO arrays to PyData/Sparse:
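```python
import numpy as np
import scipy.sparse as sps
import sparse

# Example input: any 2D SciPy COO array (here a 3x3 identity).
sps_array = sps.coo_array(np.eye(3))
# Manual conversion: requires knowing SciPy's COO attribute names.
sparse_array = sparse.COO(np.stack(sps_array.coords), sps_array.data, shape=sps_array.shape)
```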
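The manual conversion above hard-codes SciPy's attribute names on the consumer side. As a contrast, here is a sketch of a hypothetical producer-side adapter, `COOBinsparseView`, that exposes the same three pieces (coordinates, data, shape) of a 2D COO array through the protocol, plus a hypothetical consumer that needs no knowledge of the producer's type. Plain NumPy arrays stand in for the SciPy object, and the key names `indices_0`, `indices_1`, `values` are assumed from the binsparse COO layout.

```python
import numpy as np


class COOBinsparseView:
    """Hypothetical adapter exposing 2D COO (row, col, data, shape) via the protocol."""

    def __init__(self, row, col, data, shape):
        self.row = np.asarray(row, dtype=np.int64)
        self.col = np.asarray(col, dtype=np.int64)
        self.data = np.asarray(data, dtype=np.float64)
        self.shape = tuple(shape)

    def __binsparse_descriptor__(self) -> dict:
        # Key names assumed from the binsparse COO layout.
        return {
            "binsparse": {
                "version": "0.1",
                "format": "COO",
                "shape": list(self.shape),
                "number_of_stored_values": int(self.data.size),
                "data_types": {
                    "indices_0": str(self.row.dtype),
                    "indices_1": str(self.col.dtype),
                    "values": str(self.data.dtype),
                },
            }
        }

    def __binsparse__(self) -> dict:
        return {"indices_0": self.row, "indices_1": self.col, "values": self.data}


def dense_from_coo_binsparse(x) -> np.ndarray:
    """Hypothetical consumer: relies only on the protocol, not the producer type."""
    descr = x.__binsparse_descriptor__()["binsparse"]
    if descr["format"] != "COO":
        raise ValueError(f"unsupported binsparse format: {descr['format']!r}")
    arrays = {k: np.from_dlpack(v) for k, v in x.__binsparse__().items()}
    out = np.zeros(tuple(descr["shape"]), dtype=arrays["values"].dtype)
    # Scatter stored values; assumes no duplicate coordinates.
    out[arrays["indices_0"], arrays["indices_1"]] = arrays["values"]
    return out


view = COOBinsparseView(row=[0, 2], col=[1, 0], data=[5.0, 7.0], shape=(3, 2))
dense = dense_from_coo_binsparse(view)
# dense.tolist() == [[0.0, 5.0], [0.0, 0.0], [7.0, 0.0]]
```

With the protocol, the knowledge of attribute names moves into the producer, which is written once per library instead of once per pair of libraries.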
Parallel implementation in sparse: https://github.com/pydata/sparse/pull/764
Parallel implementation in SciPy: https://github.com/scipy/scipy/pull/22553