rapids_singlecell

[BUG] dask_cudf fails due to dask_expr separation from dask

Open imessien opened this issue 3 months ago • 0 comments

Describe the bug

When using rapids_singlecell functions that depend on cugraph (e.g., rsc.tl.leiden), the following import chain fails:

rapids_singlecell → cugraph → dask_cudf → expects dask.dataframe.dask_expr

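The issue is that in this environment dask_expr is installed as a separate package (dask-expr 1.1.19) and is not available inside the installed dask (2024.11.2), while dask_cudf 25.10.0 imports it as dask.dataframe.dask_expr. The import therefore fails with a ModuleNotFoundError.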

Steps/Code to reproduce bug

Traceback (most recent call last):
  File "/cis/home/iessien1/Documents/pain/test.py", line 303, in <module>
    rsc.tl.leiden(rna_combined, resolution=0.5, key_added="leiden")
  File "/cis/home/iessien1/Documents/pain/.venv/lib/python3.12/site-packages/rapids_singlecell/tools/_clustering.py", line 147, in leiden
    from cugraph import leiden as culeiden
  File "/cis/home/iessien1/Documents/pain/.venv/lib/python3.12/site-packages/cugraph/__init__.py", line 25, in <module>
    from cugraph.structure import (
  File "/cis/home/iessien1/Documents/pain/.venv/lib/python3.12/site-packages/cugraph/structure/__init__.py", line 14, in <module>
    from cugraph.structure.graph_classes import (
  File "/cis/home/iessien1/Documents/pain/.venv/lib/python3.12/site-packages/cugraph/structure/graph_classes.py", line 15, in <module>
    from .graph_implementation import (
  File "/cis/home/iessien1/Documents/pain/.venv/lib/python3.12/site-packages/cugraph/structure/graph_implementation/__init__.py", line 14, in <module>
    from .simpleGraph import simpleGraphImpl
  File "/cis/home/iessien1/Documents/pain/.venv/lib/python3.12/site-packages/cugraph/structure/graph_implementation/simpleGraph.py", line 14, in <module>
    from cugraph.structure import graph_primtypes_wrapper
  File "cugraph/structure/graph_primtypes_wrapper.pyx", line 25, in init cugraph.structure.graph_primtypes_wrapper
  File "/cis/home/iessien1/Documents/pain/.venv/lib/python3.12/site-packages/dask_cudf/__init__.py", line 9, in <module>
    from . import backends, io  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cis/home/iessien1/Documents/pain/.venv/lib/python3.12/site-packages/dask_cudf/io/__init__.py", line 3, in <module>
    from dask_cudf.core import _deprecated_api
  File "/cis/home/iessien1/Documents/pain/.venv/lib/python3.12/site-packages/dask_cudf/core.py", line 13, in <module>
    from dask_cudf._expr.collection import (
  File "/cis/home/iessien1/Documents/pain/.venv/lib/python3.12/site-packages/dask_cudf/_expr/__init__.py", line 8, in <module>
    import dask.dataframe.dask_expr._shuffle as _shuffle_module
ModuleNotFoundError: No module named 'dask.dataframe.dask_expr'

Expected behavior

rsc.tl.leiden() and other functions that use cugraph should work without workarounds or monkeypatching.

Currently, any rapids_singlecell function that imports cugraph fails, including:

  • rsc.tl.leiden() - Leiden clustering
  • Potentially other graph-based operations that depend on cugraph

Environment details (please complete the following information):

  • Environment location: Bare-metal
  • Linux Distro/Architecture: Linux 6.2.0-26-generic
  • GPU Model/Driver: NVIDIA RTX A5500 (4 GPUs, 24 GB each)
  • CUDA: 11.8
  • Method of RAPIDS install: pip

Core RAPIDS Packages

  • rapids_singlecell: 0.13.4
  • cugraph-cu12: 25.10.1
  • cuml-cu12: 25.10.0
  • cupy-cuda12x: 13.6.0
  • libcugraph-cu12: 25.10.1
  • libcuml-cu12: 25.10.0

Dask Ecosystem

  • dask: 2024.11.2
  • dask-cudf-cu12: 25.10.0
  • dask-expr: 1.1.19
  • dask-cuda: 25.10.0
  • dask-image: 2025.11.0

Single-cell Analysis

  • scanpy: 1.11.5
  • anndata: 0.12.6
  • muon: 0.1.7
  • mofax: 0.3.7
  • phate: 2.0.0

Data Science Stack

  • numpy: 2.3.4
  • pandas: 2.3.3
  • scikit-learn: 1.7.2
  • matplotlib: 3.10.7
  • seaborn: 0.13.2

Installation Method

  • Package Manager: uv (Python package manager)
  • Environment: Virtual environment (.venv)
  • CUDA: Libraries configured via LD_LIBRARY_PATH and PATH in activation script

pip show rapids-singlecell cugraph dask dask-cudf dask-expr cupy cuml
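The same version information can also be read from inside the interpreter, which may be convenient in a uv-managed environment. This is a minimal sketch using the standard library's importlib.metadata; the helper name `installed_versions` is hypothetical, and the distribution names are taken from the lists above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as listed in this report (assumed to match what is installed).
PACKAGES = [
    "rapids-singlecell",
    "cugraph-cu12",
    "cuml-cu12",
    "cupy-cuda12x",
    "dask",
    "dask-cudf-cu12",
    "dask-expr",
    "dask-cuda",
]

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```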

Additional context

A monkeypatch can be used as a temporary workaround, but this is not ideal:

import dask.dataframe
import dask_expr
import types
import sys

# Monkeypatch dask.dataframe.dask_expr
if not hasattr(dask.dataframe, "dask_expr"):
    dask_expr_submodule = types.ModuleType("dask_expr")
    # ... (complex monkeypatch code)
    dask.dataframe.dask_expr = dask_expr_submodule
    sys.modules["dask.dataframe.dask_expr"] = dask_expr_submodule

# Also need to create dask._expr stub for dask_cuda compatibility
if "dask._expr" not in sys.modules:
    dask_expr_stub = types.ModuleType("_expr")
    # ... (stub classes)
    sys.modules["dask._expr"] = dask_expr_stub
    dask._expr = dask_expr_stub

import rapids_singlecell as rsc

This issue was encountered while processing single-cell RNA-seq and ATAC-seq datasets with ~18,000 cells, where GPU-accelerated clustering is beneficial for performance.

imessien, Nov 15 '25 09:11