dataprep

Data type detection: integer column with small distinct values as categorical

Open jinglinpeng opened this issue 5 years ago • 2 comments

Is your feature request related to a problem? Please describe.
Currently, eda.plot detects a column's type from its pandas dtype, which is not always ideal. For example, a dataset's gender column may contain only two values: 0 for male and 1 for female. Such a column is detected as numerical, even though treating it as categorical makes more sense.

Describe the solution you'd like
As a starting point, we could handle some simple cases. For example, when a column's pandas dtype is integer and its number of distinct values is below a threshold (the default number of displayed bars could serve as that threshold), we detect it as a categorical column.
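A minimal sketch of the proposed rule, written against plain pandas rather than dataprep's internals. The function name `detect_column_type` and the default threshold of 10 are hypothetical: the issue only says the default number of displayed bars could be reused, and the actual value in dataprep's configuration may differ. Non-integer numeric columns stay numerical, and non-numeric columns fall back to categorical.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype

# Hypothetical threshold; stands in for "the default number of displayed bars".
DEFAULT_CATEGORICAL_THRESHOLD = 10


def detect_column_type(col: pd.Series, threshold: int = DEFAULT_CATEGORICAL_THRESHOLD) -> str:
    """Classify a column as "categorical" or "numerical" (illustrative helper)."""
    # Integer dtype with few distinct values (e.g. 0/1 gender codes): categorical.
    if is_integer_dtype(col) and col.nunique(dropna=True) < threshold:
        return "categorical"
    # Any other numeric dtype keeps the current dtype-based behaviour.
    if is_numeric_dtype(col):
        return "numerical"
    # Strings and other non-numeric dtypes are treated as categorical.
    return "categorical"


df = pd.DataFrame({"gender": [0, 1] * 15, "age": list(range(20, 50))})
print(detect_column_type(df["gender"]))  # categorical (2 distinct values)
print(detect_column_type(df["age"]))     # numerical (30 distinct values)
```

Counting distinct values with `nunique` keeps the check cheap. A fixed cutoff will still misclassify some columns (for example, an integer ID column in a very small dataset), which is why the issue frames this only as a starting point.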

jinglinpeng · Jul 09 '20 16:07

related to #99

brandonlockhart · Jul 11 '20 21:07

  • [ ] Add code to detect ordinal types in dtypes.py.
  • [ ] Handle ordinal plotting through the EDA module.

dovahcrow · Jul 13 '20 22:07