[BUG] dask_cudf fails due to dask_expr separation from dask
Describe the bug
When using rapids_singlecell functions that depend on cugraph (e.g., rsc.tl.leiden), the following import chain fails:
rapids_singlecell → cugraph → dask_cudf → expects dask.dataframe.dask_expr
In this environment dask_expr is installed as a separate package (dask-expr 1.1.19) rather than as part of dask, but dask_cudf expects it to be importable as dask.dataframe.dask_expr, which raises a ModuleNotFoundError.
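To confirm which of the two import paths exists in a given environment, a small check along these lines may help. It is only a diagnostic sketch: the helper names `module_available` and `dask_expr_locations` are made up for illustration, and it uses nothing beyond the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located, False otherwise."""
    try:
        return importlib.util.find_spec(name) is not None
    except Exception:
        # find_spec imports parent packages for dotted names; a missing or
        # broken parent raises instead of returning None.
        return False


def dask_expr_locations():
    """Report which of the two candidate module paths can be found."""
    candidates = ("dask.dataframe.dask_expr", "dask_expr")
    return {name: module_available(name) for name in candidates}


if __name__ == "__main__":
    for name, found in dask_expr_locations().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If `dask.dataframe.dask_expr` is reported missing while `dask_expr` is found, the installed dask and the installed dask_cudf likely disagree about where dask_expr lives, which points to a version mismatch between the two rather than a problem in rapids_singlecell itself.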
Steps/Code to reproduce bug
Traceback (most recent call last):
File "/cis/home/iessien1/Documents/pain/test.py", line 303, in
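To narrow down which link of the import chain fails first, the modules can be imported one at a time, innermost dependency first. This is a sketch, not the original test.py: `first_failing_import` is a hypothetical helper, and the module order is taken from the chain described in this report.

```python
import importlib


def first_failing_import(chain):
    """Import each module in order.

    Returns (module_name, exception) for the first import that fails,
    or None if every module in the chain imports cleanly.
    """
    for name in chain:
        try:
            importlib.import_module(name)
        except Exception as exc:
            return name, exc
    return None


if __name__ == "__main__":
    # Innermost dependency first, following the chain in this report.
    chain = ["dask_cudf", "cugraph", "rapids_singlecell"]
    failure = first_failing_import(chain)
    if failure is None:
        print("all imports succeeded")
    else:
        name, exc = failure
        print(f"{name} failed: {type(exc).__name__}: {exc}")
```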
Expected behavior
Any rapids_singlecell function that imports cugraph will fail, including:
- rsc.tl.leiden(): Leiden clustering
- Potentially other graph-based operations that depend on cugraph
rsc.tl.leiden() and other functions using cugraph should work without requiring workarounds or monkeypatching.
Environment details (please complete the following information):
- Environment location: Bare-metal
- Linux Distro/Architecture: Linux 6.2.0-26-generic
- GPU Model/Driver: NVIDIA RTX A5500 (4 GPUs, 24GB each)
- CUDA: 11.8
- Method of Rapids install: pip
Core RAPIDS Packages
- rapids_singlecell: 0.13.4
- cugraph-cu12: 25.10.1
- cuml-cu12: 25.10.0
- cupy-cuda12x: 13.6.0
- libcugraph-cu12: 25.10.1
- libcuml-cu12: 25.10.0
Dask Ecosystem
- dask: 2024.11.2
- dask-cudf-cu12: 25.10.0
- dask-expr: 1.1.19
- dask-cuda: 25.10.0
- dask-image: 2025.11.0
Single-cell Analysis
- scanpy: 1.11.5
- anndata: 0.12.6
- muon: 0.1.7
- mofax: 0.3.7
- phate: 2.0.0
Data Science Stack
- numpy: 2.3.4
- pandas: 2.3.3
- scikit-learn: 1.7.2
- matplotlib: 3.10.7
- seaborn: 0.13.2
Installation Method
- Package Manager: uv (Python package manager)
- Environment: Virtual environment (.venv)
- CUDA: Libraries configured via LD_LIBRARY_PATH and PATH in activation script
pip show rapids-singlecell cugraph dask dask-cudf dask-expr cupy cuml
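The same version information can also be collected from inside the interpreter that actually fails, which avoids querying a different environment by accident. The sketch below uses the standard-library `importlib.metadata`. The distribution names are copied from the lists above and are assumed to match what is installed; `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Distribution names as listed in this report (assumed to match the install).
PACKAGES = [
    "rapids-singlecell",
    "cugraph-cu12",
    "cuml-cu12",
    "cupy-cuda12x",
    "dask",
    "dask-cudf-cu12",
    "dask-expr",
    "dask-cuda",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```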
Additional context
A monkeypatch can be used as a temporary workaround, but this is not ideal:
import dask.dataframe
import dask_expr
import types
import sys
# Monkeypatch dask.dataframe.dask_expr
if not hasattr(dask.dataframe, "dask_expr"):
    dask_expr_submodule = types.ModuleType("dask_expr")
    # ... (complex monkeypatch code)
    dask.dataframe.dask_expr = dask_expr_submodule
    sys.modules["dask.dataframe.dask_expr"] = dask_expr_submodule
# Also need to create dask._expr stub for dask_cuda compatibility
if "dask._expr" not in sys.modules:
    dask_expr_stub = types.ModuleType("_expr")
    # ... (stub classes)
    sys.modules["dask._expr"] = dask_expr_stub
    dask._expr = dask_expr_stub
import rapids_singlecell as rsc
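For readers unfamiliar with the mechanism the workaround relies on, the following sketch shows the general technique of registering an existing module under a second dotted name. It is demonstrated with standard-library modules only. `alias_module` is a hypothetical helper, it does not reproduce the elided monkeypatch code above, and whether a plain alias of the standalone dask_expr package would satisfy dask_cudf has not been verified.

```python
import importlib
import sys


def alias_module(real_name, alias_name):
    """Make the module `real_name` importable under `alias_name` as well."""
    module = importlib.import_module(real_name)
    # The import system consults sys.modules first, so this entry is what
    # makes `import <alias_name>` succeed.
    sys.modules[alias_name] = module
    parent_name, _, attr = alias_name.rpartition(".")
    if parent_name:
        # `from parent import attr` resolves through the parent's attributes.
        parent = importlib.import_module(parent_name)
        setattr(parent, attr, module)
    return module


if __name__ == "__main__":
    # Stand-in demonstration: expose `json` as `email.json_alias`.
    alias_module("json", "email.json_alias")
    from email import json_alias
    print(json_alias.dumps({"ok": True}))
```

An alias like this only changes where a module can be found. It does nothing to make the aliased module's contents match what the importing library expects, which is one reason a monkeypatch is fragile as a long-term fix.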
This issue was encountered while processing single-cell RNA-seq and ATAC-seq datasets with ~18,000 cells, where GPU-accelerated clustering is beneficial for performance.