NaiveBayesClassifier

Clarification on a method

Open ramarro123 opened this issue 2 years ago • 0 comments

Hi, I know that this project was a demo and is no longer maintained, but maybe you are up for a little clarification :)

My question is about this loop in FeatureExtraction:

            for(Map.Entry<String, Integer> entry : doc.tokens.entrySet()) {
                feature = entry.getKey();
                
                //get the counts of the feature in the categories
                featureCategoryCounts = stats.featureCategoryJointCount.get(feature);
                if(featureCategoryCounts==null) { 
                    //initialize it if it does not exist
                    stats.featureCategoryJointCount.put(feature, new HashMap<String, Integer>());
                }
                
                featureCategoryCount=stats.featureCategoryJointCount.get(feature).get(category);
                if(featureCategoryCount==null) {
                    featureCategoryCount=0;
                }
                
                //increase the number of occurrences of the feature in the category
                stats.featureCategoryJointCount.get(feature).put(category, ++featureCategoryCount);
            }

I don't fully understand why the count is incremented by only one here. doc.tokens is already a Map<String, Integer>, yet we iterate over its entry set just to get the key. Why don't we need the values? If a string is present 10 times in a document, why isn't that taken into consideration?

Shouldn't it be featureCategoryCount += entry.getValue()?
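
To make the difference concrete, here is a minimal sketch of the two counting schemes for a single category. The class and method names are hypothetical and not part of the project, and it assumes Java 9+ only for the `List.of`/`Map.of` calls in the usage note. The first method mirrors the quoted loop and ends up counting how many documents contain each feature; the second is the `+= entry.getValue()` variant, which counts total occurrences. Which of the two the author intended is not confirmed by the snippet itself.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from the project.
class JointCountSketch {

    // Mirrors the quoted loop: each document adds 1 per distinct token,
    // so the result is the number of documents containing the token.
    static Map<String, Integer> documentCounts(List<Map<String, Integer>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, Integer> tokens : docs) {
            for (String feature : tokens.keySet()) {
                counts.merge(feature, 1, Integer::sum);
            }
        }
        return counts;
    }

    // The proposed variant: each document adds the token's own count,
    // so the result is the total number of occurrences of the token.
    static Map<String, Integer> occurrenceCounts(List<Map<String, Integer>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, Integer> tokens : docs) {
            for (Map.Entry<String, Integer> entry : tokens.entrySet()) {
                counts.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return counts;
    }
}
```

For two documents `List.of(Map.of("spam", 10, "offer", 1), Map.of("spam", 2))`, `documentCounts` gives 2 for "spam" while `occurrenceCounts` gives 12. If the joint count is later used as a per-document statistic (for example, how many documents of a category contain the feature), the `++` form would be the consistent one; if it is meant as a term frequency, the `+=` form would be.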

ramarro123, Oct 07 '23 12:10