
Translate Data to ARFF Format

Open mbernste opened this issue 12 years ago • 1 comments

Create a script to translate data sets to ARFF format, binning continuous attributes and handling missing values (either imputed using expectation-maximization or simply discarded).

mbernste avatar Dec 08 '13 21:12 mbernste

I finished the script. I put it in data/src/data/arff/. Things I did that are open for discussion:

  • If an instance has a missing value, that instance is discarded (if we want to impute, I think this has to be done after the net is created)
  • Bins are only created if a feature has 15 or more unique, numeric values
  • 4 to 5 bins are created, depending on the number of unique values for the feature
  • Bins are named 'X_Y' where X and Y are the range values of the bin
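
For discussion, here is a rough Python sketch of the rules listed above. It is not the script in data/src/data/arff/, just an illustration under several assumptions: the missing-value markers in `MISSING`, the equal-width binning, the cutoff of 30 unique values for choosing 5 bins over 4, and treating unbinned features as nominal are all guesses, and the function names (`drop_incomplete`, `bin_edges`, `bin_label`, `to_arff`) are hypothetical.

```python
# Assumed markers for a missing value; the real script may use others.
MISSING = {"", "?", "NA"}


def is_number(text):
    """True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def drop_incomplete(rows):
    """Discard every instance that has at least one missing value."""
    return [r for r in rows if not any(v.strip() in MISSING for v in r)]


def bin_edges(values, min_unique=15):
    """Return equal-width bin edges, or None if the feature has too few unique values."""
    unique = sorted(set(values))
    if len(unique) < min_unique:
        return None
    # Assumption: 5 bins for 30 or more unique values, otherwise 4.
    n_bins = 5 if len(unique) >= 30 else 4
    lo, hi = unique[0], unique[-1]
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]


def bin_label(x, edges):
    """Name the bin containing x as 'X_Y', where X and Y are the bin's range values."""
    for left, right in zip(edges, edges[1:]):
        if x <= right:
            return f"{left:g}_{right:g}"
    return f"{edges[-2]:g}_{edges[-1]:g}"


def to_arff(relation, names, rows):
    """Build ARFF text with nominal attributes only (values are not quoted)."""
    rows = drop_incomplete(rows)
    columns = [list(col) for col in zip(*rows)]
    domains = []
    for j, col in enumerate(columns):
        edges = None
        if all(is_number(v) for v in col):
            nums = [float(v) for v in col]
            edges = bin_edges(nums)
        if edges is not None:
            columns[j] = [bin_label(x, edges) for x in nums]
            domains.append([f"{l:g}_{r:g}" for l, r in zip(edges, edges[1:])])
        else:
            # Assumption: unbinned features are kept as nominal values.
            domains.append(sorted(set(col)))
    lines = [f"@relation {relation}", ""]
    for name, domain in zip(names, domains):
        lines.append(f"@attribute {name} {{{','.join(domain)}}}")
    lines += ["", "@data"]
    lines += [",".join(values) for values in zip(*columns)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [[str(i), "yes" if i % 2 else "no"] for i in range(20)] + [["?", "yes"]]
    print(to_arff("demo", ["x", "cls"], demo))
```

Equal-width bins are only one option here; equal-frequency bins would keep the same 'X_Y' naming but spread instances more evenly across bins.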

schulzca avatar Dec 09 '13 04:12 schulzca