Add specification for the `__binsparse__` protocol

Open hameerabbasi opened this issue 11 months ago • 0 comments

This pull request adds the specification for the binsparse protocol (closes #840).

@willow-ahrens @BenBrock from the binsparse team. @mtsokol @ivirshup for scipy.sparse @leofang for cupyx.sparse @pearu for torch.sparse @jakevdp for JAX/TensorFlow

Introduction

The binsparse protocol is meant to be a specification for on-disk storage of ND sparse arrays. It requires just two things from a back-end implementing it:

a. A way to store 1D and 2D (dense) arrays (we have this via DLPack)
b. A way to parse and interpret JSON (we have this via the json module)
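As a rough illustration of requirement (b), the sketch below parses a JSON descriptor with the standard `json` module and reads out the format, the shape, and the names of the constituent arrays. The descriptor contents are an assumption modeled on the COO layout in the binsparse specification, and `summarize_descriptor` is a hypothetical helper, not part of any library.

```python
import json

# Illustrative descriptor only. The field names are assumed to follow
# the binsparse specification's COO layout; check the spec before relying on them.
DESCRIPTOR_JSON = """
{
  "binsparse": {
    "version": "0.1",
    "format": "COO",
    "shape": [3, 4],
    "number_of_stored_values": 2,
    "data_types": {
      "indices_0": "uint64",
      "indices_1": "uint64",
      "values": "float64"
    }
  }
}
"""


def summarize_descriptor(text):
    """Hypothetical helper: pull out what a consumer needs to pick a sparse type."""
    body = json.loads(text)["binsparse"]
    return {
        "format": body["format"],
        "shape": tuple(body["shape"]),
        # One dense constituent array is expected per key in `data_types`.
        "arrays": sorted(body["data_types"]),
    }


summary = summarize_descriptor(DESCRIPTOR_JSON)
print(summary)
```

Each name listed under `data_types` would then correspond to one dense array exchanged via DLPack, which covers requirement (a).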

Pseudocode implementation

Here's a pseudocode example using two libraries, xp1 and xp2, both supporting sparse arrays:

# In library code:
xp2_sparray = xp2.from_binsparse(xp1_sparray, ...)

# Or
xp2_sparray = xp2.asarray(xp1_sparray, ...)

# This pseudocode implementation is common to `xp1` and `xp2`
def from_binsparse(x: object, /, *, device: device | None = None, copy: bool | None = None) -> array:
    binsparse_descr = getattr(x, "__binsparse_descriptor__", None)
    binsparse_impl = getattr(x, "__binsparse__", None)
    if binsparse_impl is None or binsparse_descr is None:
        raise TypeError(...)
    
    binsparse_descriptor = binsparse_descr()
    # Will raise an error if the format/descriptor is unsupported.
    sparse_type = _type_from_binsparse_descriptor(binsparse_descriptor)
    constituent_arrays = binsparse_impl()
    my_constituent_arrays = {
        k: from_dlpack(arr, device=device, copy=copy) for k, arr in constituent_arrays.items()
    }
    return sparse_type.from_strided_arrays(my_constituent_arrays, shape=...)
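To make the flow above concrete, here is a minimal runnable sketch with NumPy standing in for both libraries. `ToyCOO` is a hypothetical container invented for this illustration, the descriptor layout is an assumption modeled on the binsparse COO format, and the `device` and `copy` arguments are left out for brevity. It is not the implementation from either linked pull request.

```python
import numpy as np


class ToyCOO:
    """Hypothetical 2-D COO container, used only to illustrate the protocol."""

    def __init__(self, indices_0, indices_1, values, shape):
        self.indices_0 = np.asarray(indices_0)
        self.indices_1 = np.asarray(indices_1)
        self.values = np.asarray(values)
        self.shape = tuple(shape)

    def __binsparse_descriptor__(self):
        # Assumed descriptor layout (modeled on the binsparse COO format).
        arrays = self.__binsparse__()
        return {
            "binsparse": {
                "version": "0.1",
                "format": "COO",
                "shape": list(self.shape),
                "number_of_stored_values": int(self.values.size),
                "data_types": {k: str(a.dtype) for k, a in arrays.items()},
            }
        }

    def __binsparse__(self):
        # Constituent dense arrays; each one supports `__dlpack__`.
        return {
            "indices_0": self.indices_0,
            "indices_1": self.indices_1,
            "values": self.values,
        }

    @classmethod
    def from_strided_arrays(cls, arrays, shape):
        return cls(arrays["indices_0"], arrays["indices_1"], arrays["values"], shape)


def from_binsparse(x):
    """Consumer side: read the descriptor, then import each array via DLPack."""
    descr_fn = getattr(x, "__binsparse_descriptor__", None)
    impl_fn = getattr(x, "__binsparse__", None)
    if descr_fn is None or impl_fn is None:
        raise TypeError(f"{type(x).__name__} does not implement the binsparse protocol")

    body = descr_fn()["binsparse"]
    if body["format"] != "COO":
        raise NotImplementedError(f"unsupported format: {body['format']}")

    arrays = {k: np.from_dlpack(arr) for k, arr in impl_fn().items()}
    return ToyCOO.from_strided_arrays(arrays, shape=body["shape"])


src = ToyCOO([0, 2], [1, 3], [1.5, -2.0], shape=(3, 4))
dst = from_binsparse(src)
print(dst.shape, dst.indices_0, dst.indices_1, dst.values)
```

The consumer never touches `ToyCOO` attributes directly: it relies only on the two protocol methods and on DLPack, which is what lets the same function serve any producer.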

Compare this to the following example converting SciPy COO arrays to PyData/Sparse:

import sparse
import scipy.sparse as sps
import numpy as np

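# `sps_array` stands for any existing SciPy COO array; a small one is built here for illustration.
sps_array = sps.coo_array(np.eye(3))

# Manual conversion: stack the coordinate arrays and pass data and shape explicitly.
sparse_array = sparse.COO(np.stack(sps_array.coords), sps_array.data, shape=sps_array.shape)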

Parallel implementation in sparse: https://github.com/pydata/sparse/pull/764
Parallel implementation in SciPy: https://github.com/scipy/scipy/pull/22553

hameerabbasi, Mar 10 '25 10:03