
Wrong values of % target

Open sebastien-foulle opened this issue 4 years ago • 2 comments

Hello,

The HTML report produced by the following script shows % target < 90% when bill_length_mm <= 35, and % target > 105% (!) when 35 <= bill_length_mm <= 37.5.

image

import pandas as pd
from palmerpenguins import load_penguins
import sweetviz as sv
penguins = load_penguins()
penguins["target"] = penguins.species == 'Adelie'
penguins = penguins[["species", "bill_length_mm", "target"]]
penguins.head()

my_report = sv.analyze(penguins, target_feat = "target")
my_report.show_html()

But in fact, if bill_length_mm <= 40, % target should always be 100%: there are only Adelie penguins in this range.

# Adelie    100
penguins.query('bill_length_mm <= 40').species.value_counts()

Maybe it's a rounding problem.
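
For anyone who wants to cross-check the report independently: the per-bin % target is just the share of True values among the rows in that bin, so it can never leave the 0 to 100 range. Below is a minimal, dependency-free sketch of that calculation. The helper name `target_rate_by_bin`, the bin edges, and the toy rows are all hypothetical, chosen only for illustration; they are not sweetviz's actual binning and not real penguin measurements.

```python
from bisect import bisect_left


def target_rate_by_bin(values, targets, edges):
    """Return {(low, high): percent of True targets} for each non-empty bin.

    Bins are right-closed intervals (low, high]. Rows whose value is missing
    or falls outside the edges are skipped.
    """
    counts = {}
    for value, target in zip(values, targets):
        if value is None or not (edges[0] < value <= edges[-1]):
            continue
        upper = bisect_left(edges, value)  # index of the bin's upper edge
        key = (edges[upper - 1], edges[upper])
        hits, total = counts.get(key, (0, 0))
        counts[key] = (hits + bool(target), total + 1)
    return {key: 100.0 * hits / total
            for key, (hits, total) in sorted(counts.items())}


# Toy rows (made up for illustration, not taken from the penguins dataset).
values = [33.0, 34.5, 36.2, 37.5, 39.9, 41.0, 44.0, None]
targets = [True, True, True, True, True, True, False, True]
edges = [30.0, 35.0, 40.0, 45.0]  # assumed bin edges

rates = target_rate_by_bin(values, targets, edges)
for (low, high), rate in rates.items():
    print(f"({low}, {high}]: {rate:.1f}% target")
```

Running the same calculation on the real columns with the report's own bin edges should give 100% for every bin at or below 40 mm. Any value above 100% in the report therefore points at how the chart is computed or drawn, not at the data.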

sebastien-foulle avatar Apr 22 '22 18:04 sebastien-foulle

@sebastien-foulle thank you for reporting this, I will take a look!

fbdesignpro avatar May 04 '22 20:05 fbdesignpro

I am experiencing the same issue. How are the investigation and the fix progressing?

makotu1208 avatar May 25 '22 23:05 makotu1208

I have a similar issue! The example_data.pkl file is attached as example_data.pkl.zip.

The code to reproduce the result:

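import pandas as pd
import sweetviz as sv

# Assumes the attached example_data.pkl has been unzipped into the working directory
example_data = pd.read_pickle('example_data.pkl')

# Treat numerical_var as categorical
feature_config = sv.FeatureConfig(force_cat=['numerical_var'])
correct_report = sv.analyze([example_data, 'Train'],
                            target_feat='outcome',
                            feat_cfg=feature_config,
                            pairwise_analysis='off')
correct_report.show_html('correct_report.html')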

feature_config = sv.FeatureConfig(force_num=['numerical_var'])
wrong_report = sv.analyze([example_data, 'Train'],
                           target_feat='outcome', 
                           feat_cfg=feature_config,
                           pairwise_analysis='off')
wrong_report.show_html('wrong_report.html')

When numerical_var is forced to categorical with force_cat, the report shows the correct distribution of the outcome:

correct_need_to_force_cat

When numerical_var is forced to numerical with force_num, the outcome distribution is completely off:

wrong_as_numerical

cwzkevin avatar Dec 08 '22 17:12 cwzkevin

Fixed by 2ec0848!

fbdesignpro avatar Nov 14 '23 23:11 fbdesignpro