NaiveBayesClassifier
Clarification on a method
Hi, I know that this project was a demo and that it's not maintained anymore, but maybe you are up for a little clarification :)
My question is about this loop in FeatureExtraction:
for(Map.Entry<String, Integer> entry : doc.tokens.entrySet()) {
    feature = entry.getKey();
    //get the counts of the feature in the categories
    featureCategoryCounts = stats.featureCategoryJointCount.get(feature);
    if(featureCategoryCounts==null) {
        //initialize it if it does not exist
        stats.featureCategoryJointCount.put(feature, new HashMap<String, Integer>());
    }
    featureCategoryCount=stats.featureCategoryJointCount.get(feature).get(category);
    if(featureCategoryCount==null) {
        featureCategoryCount=0;
    }
    //increase the number of occurrences of the feature in the category
    stats.featureCategoryJointCount.get(feature).put(category, ++featureCategoryCount);
}
I don't fully understand why the count is incremented by only 1 here. doc.tokens is already a Map<String, Integer>, yet we iterate over its entrySet only to read the key. Why don't we need the values? If a token appears 10 times in a document, why isn't that taken into consideration?
Shouldn't it be featureCategoryCount += entry.getValue()?
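
To make the question concrete, here is a minimal sketch of the two counting variants I have in mind. The class and method names (CountingSketch, countDocuments, countOccurrences) are my own, hypothetical and not from the project, and the token counts are made up. I am assuming the quoted loop behaves like the first variant.

```java
import java.util.HashMap;
import java.util.Map;

class CountingSketch {
    // Hypothetical helper: adds `increment` to counts[feature][category],
    // creating the inner map and the entry when they do not exist yet.
    static void add(Map<String, Map<String, Integer>> counts,
                    String feature, String category, int increment) {
        counts.computeIfAbsent(feature, k -> new HashMap<>())
              .merge(category, increment, Integer::sum);
    }

    // Variant A (my reading of the quoted loop): +1 per document that
    // contains the feature, no matter how often it occurs in that document.
    static Map<String, Map<String, Integer>> countDocuments(
            Map<String, Integer> tokens, String category) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String feature : tokens.keySet()) {
            add(counts, feature, category, 1);
        }
        return counts;
    }

    // Variant B (what I expected): add the number of occurrences
    // stored as the value in the tokens map.
    static Map<String, Map<String, Integer>> countOccurrences(
            Map<String, Integer> tokens, String category) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Map.Entry<String, Integer> entry : tokens.entrySet()) {
            add(counts, entry.getKey(), category, entry.getValue());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up document: "free" appears 10 times, "offer" once.
        Map<String, Integer> tokens = new HashMap<>();
        tokens.put("free", 10);
        tokens.put("offer", 1);

        System.out.println(countDocuments(tokens, "spam").get("free").get("spam"));   // prints 1
        System.out.println(countOccurrences(tokens, "spam").get("free").get("spam")); // prints 10
    }
}
```

With variant A the map ends up holding the number of documents in a category that contain the feature; with variant B it holds the total number of occurrences. Which of the two is intended here?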