handling of dtype object
I am getting an exception because some of my columns have dtype object.
After converting those columns to type "category", the exception goes away.
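For concreteness, the workaround looks roughly like this (toy frames below stand in for my real data; the idea is just to cast every object column to category before building the dashboard):

```python
import pandas as pd

# Toy frames standing in for my real data.
reference_data = pd.DataFrame({"color": ["red", "blue", "red"], "value": [1.0, 2.0, 3.0]})
current_data = pd.DataFrame({"color": ["blue", "green", "red"], "value": [1.5, 2.5, 3.5]})

# Every column pandas loaded as dtype "object" triggers the exception below.
object_cols = reference_data.select_dtypes(include="object").columns

# Casting those columns to "category" in both frames makes the exception go away.
reference_data[object_cols] = reference_data[object_cols].astype("category")
current_data[object_cols] = current_data[object_cols].astype("category")
```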
Expected behavior:
- either do the conversion under the hood, or do a dtype check and raise a clear error message (a sketch of such a check is below)
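Something along these lines would already be enough on the error-message side (a sketch only; check_feature_dtypes is a hypothetical helper, not part of evidently):

```python
import pandas as pd

def check_feature_dtypes(df: pd.DataFrame, feature_names) -> None:
    """Fail early with an actionable message instead of deep inside np.isfinite."""
    bad = [name for name in feature_names if df[name].dtype == object]
    if bad:
        raise TypeError(
            f"Columns {bad} have dtype 'object'. "
            "Convert them to 'category' (or a numeric dtype) before calculating the report."
        )
```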
"---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
redacted/evidently/dashboard/dashboard.py in calculate(self, reference_data, current_data, column_mapping) 140 current_data: pandas.DataFrame, 141 column_mapping: dict = None): --> 142 self.execute(reference_data, current_data, column_mapping) 143 for tab in self.tabsData: 144 tab.calculate(reference_data, current_data, column_mapping, self.analyzers_results) redacted/evidently/pipeline/pipeline.py in execute(self, reference_data, current_data, column_mapping) 16 column_mapping: dict = None): 17 for analyzer in self.get_analyzers(): ---> 18 self.analyzers_results[analyzer] = analyzer().calculate(reference_data, current_data, column_mapping)
redacted/evidently/analyzers/data_drift_analyzer.py in calculate(self, reference_data, current_data, column_mapping) 81 82 for feature_name in cat_feature_names: ---> 83 ref_feature_vc = reference_data[feature_name][np.isfinite(reference_data[feature_name])].value_counts() 84 current_feature_vc = current_data[feature_name][np.isfinite(current_data[feature_name])].value_counts() 85
redacted/pandas/core/series.py in array_ufunc(self, ufunc, method, *inputs, **kwargs) 724 725 inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs) --> 726 result = getattr(ufunc, method)(*inputs, **kwargs) 727 728 name = names[0] if len(set(names)) == 1 else None
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
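The failure also reproduces outside evidently with a plain object-dtype Series, so it is really the np.isfinite filter that breaks; dropna() would be a dtype-agnostic alternative (minimal reproduction, not evidently code):

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", None], dtype=object)

# Mirrors the failing filter in data_drift_analyzer.py for categorical features.
try:
    s[np.isfinite(s)].value_counts()
except TypeError as e:
    print(e)  # ufunc 'isfinite' not supported for the input types ...

# Dropping missing values without isfinite works for object columns as well.
print(s.dropna().value_counts())
```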
Hi @rmminusrslash, thanks for reporting, that is indeed important! We will definitely add a more actionable error message. Added this to the near-term actions.
We are also considering a more automatic way of handling columns with object type later on (for example, detecting the feature type automatically and applying different processing logic depending on the type).
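Roughly, the detection would look at the pandas dtype of each column, something like the sketch below (illustration only; the final implementation may look different):

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

def infer_feature_type(column: pd.Series) -> str:
    """Very rough feature-type detection based on the pandas dtype (sketch only)."""
    if is_bool_dtype(column):      # bool must be checked before the numeric branch
        return "binary"
    if is_numeric_dtype(column):
        return "numerical"
    return "categorical"           # object, category, string, ... fall back here
```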
Hi @emeli-dral,
I'm running into issues with this on version 0.1.30.dev0. In particular, I have a feature of type boolean and am using column_mapping=None. Would it be possible to at least ignore any features whose data type you cannot determine or handle? This way, one could still obtain a report for the other features.
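A possible stopgap until then (the column name below is a placeholder, and this is only a sketch) is to cast the boolean column to 0/1 before passing the frames in:

```python
import pandas as pd

df = pd.DataFrame({"is_active": [True, False, True], "value": [0.1, 0.2, 0.3]})

# Represent the boolean feature as 0/1 so it is handled as a numeric column.
df["is_active"] = df["is_active"].astype(int)
```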
Hi @BeLitz! Thanks for sharing, and apologies for the delay in response.
We decided against silently excluding the undetermined features in this case for the following reason: if this happens and you get no alert, you might not notice that something is wrong. But we aim to fix it 🙂 Could you share the exact type of the boolean feature you worked with (Python type: int "0/1", boolean "true/false", string "yes/no")? We will make sure it is fixed in the next release.
Thanks @emeli-dral. The data type was boolean "true/false". Right, silently excluding would not work well, but you could add a field to the JSON response that mentions any problems or omissions.
@BeLitz, that makes sense. We are adding alert functionality in one of the next few releases, as well as boolean data processing. We have a specific statistical test for binary data, and boolean data will be covered by this test as well.
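For context, the kind of test meant here compares the share of positive values between the reference and current data; a two-proportion z-test is one standard example (sketch only, not necessarily the exact test that ships in evidently):

```python
import numpy as np
from scipy.stats import norm

def two_proportion_z_test(ref: np.ndarray, cur: np.ndarray) -> float:
    """Two-sided z-test for a difference in the rate of True values (illustrative only)."""
    n_ref, n_cur = len(ref), len(cur)
    p_ref, p_cur = ref.mean(), cur.mean()
    p_pool = (ref.sum() + cur.sum()) / (n_ref + n_cur)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_ref + 1 / n_cur))
    z = (p_ref - p_cur) / se
    return 2 * (1 - norm.cdf(abs(z)))  # two-sided p-value

reference = np.array([True, False, True, True, False, True])
current = np.array([False, False, True, False, False, True])
print(two_proportion_z_test(reference, current))
```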
Reports should be working fine with boolean data type as of now.