use LabelEncoder on y values

Open ClimbsRocks opened this issue 9 years ago • 3 comments

right now we can only take in y values that are numbers (or strings that can be converted to numbers). we cannot take in arbitrary strings.

LabelEncoder would let us take in strings like "Soccer Dad", "Corporate Mom", etc., as the categories we are trying to classify.

ClimbsRocks, Mar 19 '16 19:03

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html

ClimbsRocks, Mar 19 '16 19:03

steps for this:

  • [x] create a boolean flag for whether we need to LabelEncode or not
  • [x] run y column through LabelEncoder if necessary
  • [x] save the mapping from LabelEncoder
  • [x] in fileNames, pass along both the boolean flag labelEncoded and the mapping
  • [ ] we'll probably want to print that out with some useful messages for the user
  • [ ] when we write the final results to a file, check if labelEncoded is True
  • [ ] if it is, just do the reverse mapping before writing to file
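
The checklist above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `needs_label_encoding` and `write_predictions`, the `labelMapping` key, and the dict layout of `fileNames` are all assumptions; only the `labelEncoded` flag name comes from the list.

```python
import csv
import io


def needs_label_encoding(y_values):
    """Return True if any y value cannot be converted to a number."""
    for value in y_values:
        try:
            float(value)
        except (TypeError, ValueError):
            return True
    return False


def write_predictions(predictions, file_names, out_file):
    """Write one prediction per row, reversing the label mapping if one was used."""
    if file_names.get("labelEncoded"):
        # Hypothetical key: maps each integer code back to its original string.
        reverse_mapping = file_names["labelMapping"]
        predictions = [reverse_mapping[p] for p in predictions]
    writer = csv.writer(out_file)
    for prediction in predictions:
        writer.writerow([prediction])


y = ["Soccer Dad", "Corporate Mom", "Soccer Dad"]

# Carry the flag (and, if needed, the mapping) alongside the other file info.
file_names = {"labelEncoded": needs_label_encoding(y)}
if file_names["labelEncoded"]:
    # code -> label, with codes assigned in sorted label order (an assumption).
    file_names["labelMapping"] = dict(enumerate(sorted(set(y))))

# Pretend the model predicted codes 1 and 0; write them back out as strings.
buffer = io.StringIO()
write_predictions([1, 0], file_names, buffer)
```

Keeping the flag and the mapping together means the final write step needs no knowledge of how the encoding was produced.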

ClimbsRocks, Mar 19 '16 19:03

nvm, LabelEncoder doesn't let you introspect its process, so doing it manually.
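
A manual mapping could look something like the sketch below. The function names are hypothetical, and assigning codes in sorted label order is an assumption made for reproducibility, not something stated in this thread.

```python
def build_label_mapping(y_values):
    """Assign an integer code to each distinct label, in sorted order."""
    labels = sorted(set(y_values))
    return {label: code for code, label in enumerate(labels)}


def encode_labels(y_values, mapping):
    """Replace each label with its integer code."""
    return [mapping[value] for value in y_values]


def decode_labels(codes, mapping):
    """Reverse the mapping: turn integer codes back into the original labels."""
    reverse = {code: label for label, code in mapping.items()}
    return [reverse[code] for code in codes]


y = ["Soccer Dad", "Corporate Mom", "Soccer Dad"]
mapping = build_label_mapping(y)
encoded = encode_labels(y, mapping)
decoded = decode_labels(encoded, mapping)
```

Because the mapping is a plain dict, it can be printed for the user or passed along and inverted later without any extra machinery.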

ClimbsRocks, Mar 19 '16 21:03