If the original dataframe is constructed with dask.dataframe, create_report throws an error saying it found a Delayed object. It looks like the code doesn't call compute for Dask DataFrames.
Need Dask DataFrame support for create_report: need to materialize computes
When the input dataframe is constructed from dask.dataframe, create_report(df) throws this error:
"Missing Cells": float(ncells - npresent_cells),
TypeError: float() argument must be a string or a number, not 'Delayed'
To Reproduce
Steps to reproduce the behavior:
- Construct a DataFrame using Dask DataFrame (dd.read_csv(...), dd.compute())
- Call create_report(df)
- See error
Expected behavior
create_report runs successfully.
Screenshots
Formating Overview: 11%|███████▎ | 4/35 [00:35<04:36, 8.93s/it]Traceback (most recent call last):
File "
Desktop (please complete the following information):
- OS: Ubuntu
- Browser: Chrome
- Platform: JupyterLab
- DataPrep version: 0.2.2
Additional context
Also, Dask DataFrames work fine with plot.
Hi @ssenathi, our last released version only supports pandas dataframes. create_report on the develop branch supports Dask dataframes, and will be available in the next release. You could try using the develop branch with:
pip install git+https://github.com/sfu-db/dataprep.git@develop
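If installing from develop is not an option, one possible workaround on the released version is to materialize the Dask DataFrame into pandas before calling create_report. This is only a minimal sketch: it assumes the data fits in memory, and `materialize` is a hypothetical helper, not part of dataprep or Dask.

```python
def materialize(df):
    """Return an in-memory object.

    Assumption: lazy, Dask-like objects expose a callable .compute();
    anything else (e.g. a pandas DataFrame) is returned unchanged.
    """
    compute = getattr(df, "compute", None)
    return compute() if callable(compute) else df


# Usage sketch (needs dask and dataprep installed; "data.csv" is a placeholder path):
# import dask.dataframe as dd
# from dataprep.eda import create_report
# ddf = dd.read_csv("data.csv")
# report = create_report(materialize(ddf))
```

Because the helper passes non-lazy inputs through untouched, the same call should keep working once a release with native Dask support is installed.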