Error using db.from_delayed with sparse arrays

Open joshua-gould opened this issue 1 year ago • 2 comments

File "python3.11/site-packages/dask/bag/core.py", line 1881, in reify
    if len(seq) and isinstance(seq[0], Iterator):
      ^^^^^^^^^^^^^^^^^
  File "python3.11/site-packages/scipy/sparse/_base.py", line 425, in __len__
    raise TypeError("sparse array length is ambiguous; use getnnz()"
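
The `TypeError` can be reproduced without dask, which may help narrow the problem down. A minimal sketch, assuming a SciPy version that provides `csr_array`: calling `len()` on a sparse array raises, while `shape[0]` and a dense NumPy array behave the way the `len(seq)` check in `reify` expects. The variable names here are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_array

dense = np.eye(3)
sparse = csr_array(dense)

# len() is defined for the dense array (number of rows) ...
print(len(dense))

# ... but sparse arrays refuse to define a length.
try:
    len(sparse)
    len_error = None
except TypeError as exc:
    len_error = exc

print(type(len_error).__name__, len_error)

# The row count is still available through the shape.
print(sparse.shape[0])
```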

Minimal Complete Verifiable Example:

import dask.bag as db
import numpy as np
from dask import delayed
from scipy.sparse import csr_array


def add(x, y):
    return x + y


@delayed
def create_sparse_array_delayed():
    return csr_array(np.random.random((10, 10)))


@delayed
def create_array_delayed():
    return np.random.random((10, 10))


# Works: sparse arrays when the bag is created from a sequence
db.from_sequence(
    [csr_array(np.random.random((10, 10))), csr_array(np.random.random((10, 10)))]
).fold(add).compute()

# Works: NumPy arrays from delayed objects
db.from_delayed([create_array_delayed(), create_array_delayed()]).fold(add).compute()

# Fails: sparse arrays from delayed objects
db.from_delayed(
    [create_sparse_array_delayed(), create_sparse_array_delayed()]
).fold(add).compute()
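
A possible workaround for the failing case, offered as a sketch rather than a fix: the `from_delayed` documentation says each delayed object should evaluate to a list or other concrete sequence, because it becomes one partition of the bag. Wrapping the sparse array in a one-element list makes the array a bag element instead of the partition itself, so `reify` only calls `len()` on the list. The name `create_sparse_partition`, its `fill` parameter, and the 3x3 shape are illustrative and not part of the original report; `scheduler="synchronous"` is used only to keep the example in a single process.

```python
import dask.bag as db
import numpy as np
from dask import delayed
from scipy.sparse import csr_array


def add(x, y):
    return x + y


@delayed
def create_sparse_partition(fill):
    # Hypothetical helper: return a one-element list so that the list,
    # not the sparse array, is the partition dask inspects with len().
    return [csr_array(np.full((3, 3), fill))]


bag = db.from_delayed([create_sparse_partition(1.0), create_sparse_partition(2.0)])
result = bag.fold(add).compute(scheduler="synchronous")

print(result.shape)
print(result.toarray())
```

With partitions shaped this way, each sparse array is treated as a single bag element, which matches how the `from_sequence` call treats them.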

Environment:

  • Dask version: 2024.12.0
  • Python version: 3.11
  • Operating System: macOS
  • Install: pip

joshua-gould, Jan 07 '25 21:01

Thanks for your report. Any advice on how we can make this work without adding scipy as a dependency for bags?

phofl, Jan 08 '25 21:01

FYI I've created a fix for this bug here: https://github.com/dask/dask/pull/12103

cc: @phofl @joshua-gould

batcity, Oct 17 '25 20:10