handling of dtype object
I am getting an exception because some of my columns have dtype object.
After converting those columns to type "category", the exception goes away.
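For concreteness, the workaround looks roughly like this (toy frames below stand in for my real data; the idea is just to cast every object column to category before building the dashboard):

```python
import pandas as pd

# Toy frames standing in for my real data.
reference_data = pd.DataFrame({"color": ["red", "blue", "red"], "value": [1.0, 2.0, 3.0]})
current_data = pd.DataFrame({"color": ["blue", "green", "red"], "value": [1.5, 2.5, 3.5]})

# Every column pandas loaded as dtype "object" triggers the exception below.
object_cols = reference_data.select_dtypes(include="object").columns

# Casting those columns to "category" in both frames makes the exception go away.
reference_data[object_cols] = reference_data[object_cols].astype("category")
current_data[object_cols] = current_data[object_cols].astype("category")
```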
Expected behavior:
- either do the conversion under the hood, or do a dtype check and raise a clear error message (a sketch of such a check is below)
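Something along these lines would already be enough on the error-message side (a sketch only; check_feature_dtypes is a hypothetical helper, not part of evidently):

```python
import pandas as pd

def check_feature_dtypes(df: pd.DataFrame, feature_names) -> None:
    """Fail early with an actionable message instead of deep inside np.isfinite."""
    bad = [name for name in feature_names if df[name].dtype == object]
    if bad:
        raise TypeError(
            f"Columns {bad} have dtype 'object'. "
            "Convert them to 'category' (or a numeric dtype) before calculating the report."
        )
```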
"---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
redacted/evidently/dashboard/dashboard.py in calculate(self, reference_data, current_data, column_mapping) 140 current_data: pandas.DataFrame, 141 column_mapping: dict = None): --> 142 self.execute(reference_data, current_data, column_mapping) 143 for tab in self.tabsData: 144 tab.calculate(reference_data, current_data, column_mapping, self.analyzers_results) redacted/evidently/pipeline/pipeline.py in execute(self, reference_data, current_data, column_mapping) 16 column_mapping: dict = None): 17 for analyzer in self.get_analyzers(): ---> 18 self.analyzers_results[analyzer] = analyzer().calculate(reference_data, current_data, column_mapping)
redacted/evidently/analyzers/data_drift_analyzer.py in calculate(self, reference_data, current_data, column_mapping) 81 82 for feature_name in cat_feature_names: ---> 83 ref_feature_vc = reference_data[feature_name][np.isfinite(reference_data[feature_name])].value_counts() 84 current_feature_vc = current_data[feature_name][np.isfinite(current_data[feature_name])].value_counts() 85
redacted/pandas/core/series.py in array_ufunc(self, ufunc, method, *inputs, **kwargs) 724 725 inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs) --> 726 result = getattr(ufunc, method)(*inputs, **kwargs) 727 728 name = names[0] if len(set(names)) == 1 else None
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
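The failure also reproduces outside evidently with a plain object-dtype Series, so it is really the np.isfinite filter that breaks; dropna() would be a dtype-agnostic alternative (minimal reproduction, not evidently code):

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", None], dtype=object)

# Mirrors the failing filter in data_drift_analyzer.py for categorical features.
try:
    s[np.isfinite(s)].value_counts()
except TypeError as e:
    print(e)  # ufunc 'isfinite' not supported for the input types ...

# Dropping missing values without isfinite works for object columns as well.
print(s.dropna().value_counts())
```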
Hi @rmminusrslash, thanks for reporting, that is indeed important! We will definitely add a more actionable error message. Added this to the near-term actions.
We are also considering a more automatic way of handling columns with object type later on (for example, detecting the feature type automatically and applying different processing logic depending on the type).
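Roughly, the detection would look at the pandas dtype of each column, something like the sketch below (illustration only; the final implementation may look different):

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

def infer_feature_type(column: pd.Series) -> str:
    """Very rough feature-type detection based on the pandas dtype (sketch only)."""
    if is_bool_dtype(column):      # bool must be checked before the numeric branch
        return "binary"
    if is_numeric_dtype(column):
        return "numerical"
    return "categorical"           # object, category, string, ... fall back here
```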
Hi @emeli-dral,
I'm running into issues with this on version 0.1.30.dev0. In particular, I have a feature of type boolean and am using column_mapping=None. Would it be possible to at least ignore any features whose data type you cannot determine or handle? This way, one could still obtain a report for the other features.
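A possible stopgap until then (the column name below is a placeholder, and this is only a sketch) is to cast the boolean column to 0/1 before passing the frames in:

```python
import pandas as pd

df = pd.DataFrame({"is_active": [True, False, True], "value": [0.1, 0.2, 0.3]})

# Represent the boolean feature as 0/1 so it is handled as a numeric column.
df["is_active"] = df["is_active"].astype(int)
```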
Hi @BeLitz! Thanks for sharing, and apologies for the delay in response.
We decided against silently excluding the undetermined features in this case for the following reason: if this happens and you get no alert, you might not notice that something is wrong. But we aim to fix it 🙂 Could you share the exact type of the boolean feature you worked with (Python type: int "0/1", boolean "true/false", string "yes/no")? We will make sure it is fixed in the next release.
Thanks @emeli-dral. The data type was boolean "true/false". Right, silently excluding would not work well, but you could add a field to the JSON response that mentions any problems or omissions.
@BeLitz, that makes sense. We are adding alert functionality in one of the next few releases, as well as boolean data processing. We have a specific statistical test for binary data, and boolean data will be covered by this test as well.
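For context, the kind of test meant here compares the share of positive values between the reference and current data; a two-proportion z-test is one standard example (sketch only, not necessarily the exact test that ships in evidently):

```python
import numpy as np
from scipy.stats import norm

def two_proportion_z_test(ref: np.ndarray, cur: np.ndarray) -> float:
    """Two-sided z-test for a difference in the rate of True values (illustrative only)."""
    n_ref, n_cur = len(ref), len(cur)
    p_ref, p_cur = ref.mean(), cur.mean()
    p_pool = (ref.sum() + cur.sum()) / (n_ref + n_cur)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_ref + 1 / n_cur))
    z = (p_ref - p_cur) / se
    return 2 * (1 - norm.cdf(abs(z)))  # two-sided p-value

reference = np.array([True, False, True, True, False, True])
current = np.array([False, False, True, False, False, True])
print(two_proportion_z_test(reference, current))
```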
Reports should be working fine with boolean data type as of now.