
If the original dataframe is constructed using dask.dataframe, create_report throws an error (saying it found Delayed). It looks like the code doesn't call compute for Dask DataFrames.

Open ssenathi opened this issue 5 years ago • 2 comments

Need Dask DataFrame support for create_report; the delayed computations need to be materialized. When the input dataframe is constructed from dask.dataframe, create_report(df) throws this error:

"Missing Cells": float(ncells - npresent_cells),
TypeError: float() argument must be a string or a number, not 'Delayed'

To Reproduce

Steps to reproduce the behavior:

  1. Construct a DataFrame using Dask (dd.read_csv(...), dd.compute())
  2. Call create_report(df)
  3. See error

Expected behavior: create_report runs successfully.

Screenshots

Formating Overview: 11%|███████▎ | 4/35 [00:35<04:36, 8.93s/it]
Traceback (most recent call last):
  File "", line 1, in
  File "/home/ssenathi/.local/lib/python3.6/site-packages/dataprep/eda/create_report/__init__.py", line 51, in create_report
    "components": format_report(df, mode),
  File "/home/ssenathi/.local/lib/python3.6/site-packages/dataprep/eda/create_report/formatter.py", line 49, in format_report
    comps = format_basic(df, comps)
  File "/home/ssenathi/.local/lib/python3.6/site-packages/dataprep/eda/create_report/formatter.py", line 136, in format_basic
    comps["overview"] = _format_stats(stats, "overview")
  File "/home/ssenathi/.local/lib/python3.6/site-packages/dataprep/eda/create_report/formatter.py", line 297, in _format_stats
    "Missing Cells": float(ncells - npresent_cells),
TypeError: float() argument must be a string or a number, not 'Delayed'

Desktop (please complete the following information):

  • OS: Ubuntu
  • Browser: Chrome
  • Platform: Jupyter Lab
  • Dataprep version: 0.2.2


ssenathi, Sep 13 '20 08:09

Also, Dask DataFrames work fine with plot.

ssenathi, Sep 13 '20 08:09

Hi @ssenathi, our latest released version only supports pandas DataFrames. create_report on the develop branch supports Dask DataFrames, and that support will be available in the next release. In the meantime, you could try the develop branch with: pip install git+https://github.com/sfu-db/dataprep.git@develop

brandonlockhart, Sep 13 '20 17:09
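
For readers who need to stay on the released version (0.2.2 in the report), one possible workaround follows from the reply above: since that release only supports pandas DataFrames, the Dask DataFrame can be materialized before it is passed to create_report. The sketch below is only an illustration and assumes the data fits in memory. The helper name `to_pandas` is hypothetical and not part of dataprep; it relies only on the fact that Dask collections expose a `.compute()` method. The Dask and dataprep calls are left as comments because they need those packages installed, and "data.csv" is a placeholder path.

```python
def to_pandas(df):
    """Return an in-memory object, materializing lazy collections.

    Hypothetical helper (not part of dataprep). A Dask DataFrame exposes a
    .compute() method that returns a pandas DataFrame; an object without
    such a method (for example a pandas DataFrame) is returned unchanged.
    """
    compute = getattr(df, "compute", None)
    return compute() if callable(compute) else df


# Usage sketch (assumes dask and dataprep are installed;
# "data.csv" is a placeholder path):
#
#   import dask.dataframe as dd
#   from dataprep.eda import create_report
#
#   ddf = dd.read_csv("data.csv")            # lazy Dask DataFrame
#   report = create_report(to_pandas(ddf))   # hand over a pandas DataFrame
```

Calling `.compute()` gives up Dask's out-of-core processing, so for datasets larger than memory the develop branch mentioned above is probably the better route.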