
handling of dtype object

Open rmminusrslash opened this issue 4 years ago • 5 comments

I am getting an exception due to columns having the type object.

After converting to type "category", the exception is gone.
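A minimal sketch of that workaround, assuming a toy DataFrame (the column names `city` and `clicks` are made up for illustration and are not from the reporter's data). It casts every object-typed column to pandas' `category` dtype before the frame is handed to the report:

```python
import pandas as pd
from pandas.api.types import is_object_dtype

# Toy frame standing in for the real data; column names are hypothetical.
df = pd.DataFrame({
    "city": pd.Series(["Berlin", "Paris", None, "Berlin"], dtype=object),
    "clicks": [3, 1, 4, 1],  # numeric column, left untouched
})

# Collect the columns stored with dtype object ...
object_cols = [name for name in df.columns if is_object_dtype(df[name])]

# ... and cast only those to the categorical dtype.
df = df.astype({name: "category" for name in object_cols})

print(df.dtypes)
```

The same cast would have to be applied to both the reference and the current DataFrame so that their dtypes match.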

Expected behavior:

  • either convert such columns under the hood, or check the dtype and raise a clear error message

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in
      1 report = Dashboard(tabs=[DataDriftTab])
      2 report.calculate(df_feb[MODEL_COLS].sample(10000), march_df[MODEL_COLS].sample(10000),
----> 3 column_mapping = None)

redacted/evidently/dashboard/dashboard.py in calculate(self, reference_data, current_data, column_mapping)
    140 current_data: pandas.DataFrame,
    141 column_mapping: dict = None):
--> 142 self.execute(reference_data, current_data, column_mapping)
    143 for tab in self.tabsData:
    144 tab.calculate(reference_data, current_data, column_mapping, self.analyzers_results)

redacted/evidently/pipeline/pipeline.py in execute(self, reference_data, current_data, column_mapping)
     16 column_mapping: dict = None):
     17 for analyzer in self.get_analyzers():
---> 18 self.analyzers_results[analyzer] = analyzer().calculate(reference_data, current_data, column_mapping)

redacted/evidently/analyzers/data_drift_analyzer.py in calculate(self, reference_data, current_data, column_mapping)
     81
     82 for feature_name in cat_feature_names:
---> 83 ref_feature_vc = reference_data[feature_name][np.isfinite(reference_data[feature_name])].value_counts()
     84 current_feature_vc = current_data[feature_name][np.isfinite(current_data[feature_name])].value_counts()
     85

redacted/pandas/core/series.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    724
    725 inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
--> 726 result = getattr(ufunc, method)(*inputs, **kwargs)
    727
    728 name = names[0] if len(set(names)) == 1 else None

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
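The failing call can be reproduced outside the library. The sketch below assumes a small object-typed Series and shows that `np.isfinite` raises the same `TypeError`, while a mask built with `Series.notna()` works for any dtype. This illustrates the cause only; it is not the fix the maintainers applied.

```python
import numpy as np
import pandas as pd

# An object-typed categorical feature with one missing value (toy data).
feature = pd.Series(["a", "b", None, "a"], dtype=object)

# np.isfinite has no implementation for object arrays, so this raises.
try:
    np.isfinite(feature)
    isfinite_failed = False
except TypeError as exc:
    isfinite_failed = True
    print(f"np.isfinite failed: {exc}")

# A dtype-agnostic way to drop missing values before counting.
counts = feature[feature.notna()].value_counts()
print(counts)
```

Unlike `np.isfinite`, `notna()` does not filter out infinities, which matters only for numeric columns.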

rmminusrslash avatar Aug 11 '21 17:08 rmminusrslash

Hi @rmminusrslash Thanks for reporting - that is indeed important! We will definitely add a more actionable error message. Added this to the near term actions.

We are also considering a more automatic way of handling columns with object type later on, for example by detecting the feature type automatically and applying different processing logic depending on the type.

emeli-dral avatar Aug 19 '21 12:08 emeli-dral

Hi @emeli-dral,

I'm running into issues with this on version 0.1.30.dev0. In particular, I have a feature of type boolean and am using column_mapping=None. Would it be possible to - at least - ignore any features for which you cannot determine/handle the data type? This way, one could still obtain a report for the other features.

BeLitz avatar Nov 12 '21 23:11 BeLitz

Hi @BeLitz! Thanks for sharing, and apologies for the delay in response.

We decided against silently excluding the undetermined features in this case for the following reason: if that happens and you get no alert, you might not notice that something is wrong. But we aim to fix it 🙂 Could you share the exact type of the boolean feature you worked with (Python type: int "0/1", boolean "true/false", string "yes/no")? We will make sure it is fixed in the next release.

emeli-dral avatar Dec 01 '21 18:12 emeli-dral

Thanks @emeli-dral. The datatype was boolean "true/false". Right, silently excluding would not work well, but you could add a field to the json response, which mentions any problems or omissions.
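One way such a field could look, as a rough sketch only: the helper `summarize_columns`, the `skipped` key, and the set of supported dtypes are all hypothetical and are not part of evidently's API. Columns with a dtype the helper cannot handle are listed explicitly instead of being dropped silently:

```python
import json

import pandas as pd
from pandas.api.types import is_numeric_dtype, is_object_dtype


def summarize_columns(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count values per column and record skipped columns."""
    result = {"value_counts": {}, "skipped": []}
    for name in df.columns:
        col = df[name]
        supported = (
            is_numeric_dtype(col)  # also true for bool columns
            or is_object_dtype(col)
            or isinstance(col.dtype, pd.CategoricalDtype)
        )
        if not supported:
            # Record the omission so the caller can see what was left out.
            result["skipped"].append(
                {"column": name, "dtype": str(col.dtype), "reason": "unsupported dtype"}
            )
            continue
        vc = col.dropna().value_counts()
        # Stringify keys and cast counts so the result is JSON-serializable.
        result["value_counts"][name] = {str(k): int(v) for k, v in vc.items()}
    return result


df = pd.DataFrame({
    "flag": [True, False, True],
    "when": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
})
report = summarize_columns(df)
payload = json.dumps(report, indent=2)
print(payload)
```

Here the boolean column is counted and the datetime column shows up under `skipped`, so the omission is visible in the output.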

BeLitz avatar Dec 21 '21 18:12 BeLitz

@BeLitz, that makes sense. We are adding alert functionality, as well as boolean data processing, in one of the next few releases. We have a specific statistical test for binary data, and boolean data will be covered by this test as well.

emeli-dral avatar Dec 31 '21 15:12 emeli-dral

Reports should now work fine with the boolean data type.

emeli-dral avatar Sep 21 '23 13:09 emeli-dral