data-formatter
use LabelEncoder on y values
Right now we can only take in y values that are numbers (or strings that can be converted to numbers). We cannot take in arbitrary strings.
LabelEncoder would let us take in strings like "Soccer Dad", "Corporate Mom", etc., as the categories we are trying to classify.
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
steps for this:
- [x] create a boolean flag for whether we need to LabelEncode or not
- [x] run y column through LabelEncoder if necessary
- [x] save the mapping from LabelEncoder
- [x] in fileNames, pass along both the boolean flag labelEncoded and the mapping
- [ ] we'll probably want to print that out with some useful messages for the user
- [ ] when we write the final results to a file, check if labelEncoded is True
- [ ] if it is, just do the reverse mapping before writing to file
nvm, LabelEncoder doesn't let you introspect its process, so doing it manually.
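Rough sketch of what the manual version could look like, covering the flag, the mapping, and the reverse mapping before writing results. This is not the actual data-formatter code: the function names, the sorted-label ordering, and the `labelMapping` key are all assumptions made up for illustration. Only `labelEncoded` and `fileNames` come from the checklist above.

```python
def needs_label_encoding(y_values):
    """Return True if any y value cannot be converted to a number."""
    for value in y_values:
        try:
            float(value)
        except (TypeError, ValueError):
            return True
    return False


def encode_labels(y_values):
    """Map each distinct label to an integer (assumed ordering: sorted labels)."""
    label_to_int = {label: i for i, label in enumerate(sorted(set(y_values)))}
    encoded = [label_to_int[label] for label in y_values]
    return encoded, label_to_int


def decode_labels(encoded_values, label_to_int):
    """Reverse the mapping so results can be written with the original labels."""
    int_to_label = {i: label for label, i in label_to_int.items()}
    return [int_to_label[i] for i in encoded_values]


y = ["Soccer Dad", "Corporate Mom", "Soccer Dad"]

label_encoded = needs_label_encoding(y)
if label_encoded:
    y_encoded, label_mapping = encode_labels(y)
else:
    y_encoded, label_mapping = y, None

# Hypothetical shape for passing the flag and the mapping along in fileNames.
file_names = {"labelEncoded": label_encoded, "labelMapping": label_mapping}

# Before writing final results, undo the encoding if it was applied.
predictions = y_encoded  # stand-in for real model output
if file_names["labelEncoded"]:
    predictions = decode_labels(predictions, file_names["labelMapping"])
```

Keeping the mapping as a plain dict means it can be printed for the user as-is and serialized alongside the other fileNames entries.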