Using percentages instead of counts to compare the distributions of two tables

Open borisRa opened this issue 4 years ago • 2 comments

Hi,

How can I compare the train and test distributions? I am using this code:

plot_diff([train_df[train_df.columns[~train_df.columns.isin(['Survived'])]], test_df], config={"diff.label": ["train_df", "test_df"]})

This plots raw counts, but I would like to compare percentages instead, similar to this plot of the Age distribution: [image]

Thanks!
Boris

borisRa, Feb 22 '22 15:02

Hi @borisRa, thanks for raising this issue. Would setting diff.density=True work for you? (related: https://github.com/sfu-db/dataprep/pull/698)

jinglinpeng, Feb 24 '22 03:02

No. The result should be similar to the plot above, so that it compares distributions (percentages) rather than counts.

borisRa, Feb 24 '22 09:02
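
For readers with the same question, here is a minimal sketch of the percentage-based comparison requested in this thread. It is computed outside dataprep with NumPy only, so it is not the plot_diff API and says nothing about what diff.density does. The helper name percent_histogram and the sample Age values are hypothetical, chosen only for illustration. The idea is to bin both columns on shared edges and divide each table's counts by its own total, so that tables of different sizes become comparable.

```python
import numpy as np


def percent_histogram(train_values, test_values, bins=10):
    """Return shared bin edges and per-bin percentages for two samples.

    Hypothetical helper for illustration; not part of dataprep.
    """
    train = np.asarray(train_values, dtype=float)
    test = np.asarray(test_values, dtype=float)

    # Drop missing values so they do not distort the bin range.
    train = train[~np.isnan(train)]
    test = test[~np.isnan(test)]

    # Use one set of bin edges for both samples so the bars line up.
    edges = np.histogram_bin_edges(np.concatenate([train, test]), bins=bins)

    train_counts, _ = np.histogram(train, bins=edges)
    test_counts, _ = np.histogram(test, bins=edges)

    # Normalize each sample by its own size to get percentages.
    train_pct = 100.0 * train_counts / train_counts.sum()
    test_pct = 100.0 * test_counts / test_counts.sum()
    return edges, train_pct, test_pct


# Made-up Age values standing in for train_df["Age"] and test_df["Age"].
train_age = [22, 38, 26, 35, 35, 54, 2, 27]
test_age = [34, 47, 62, 27]

edges, train_pct, test_pct = percent_histogram(train_age, test_age, bins=4)
for left, right, tr, te in zip(edges[:-1], edges[1:], train_pct, test_pct):
    print(f"[{left:5.1f}, {right:5.1f}]  train {tr:5.1f}%  test {te:5.1f}%")
```

The two percentage arrays can then be drawn as side-by-side bars with any plotting library. Each array sums to 100, so the comparison no longer depends on how many rows each table has.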