Java-Naive-Bayes-Classifier

Probability Returning Infinity for most Categories

Open Nath5 opened this issue 11 years ago • 4 comments

Hello,

I know you haven't worked on this in a while, but I was wondering if you had any idea why I keep seeing this issue. I have added about 25 categories to the model, with lots of data in each category. No matter what chunk of text I classify, most of the categories return a probability of Infinity.

Example:

Classification[ category=friends_gatherings, probability=Infinity, featureset=[ after, school, soccerabout, this, ... -- ] ]

Nath5, Jan 12 '15 21:01

I literally encountered the same issue LOL. I think it may be because he didn't apply any smoothing technique.
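For reference, add-one (Laplace) smoothing is one common smoothing technique for Naive Bayes. The sketch below is a standalone illustration, not code from this library: the class name, method name, and parameters are all hypothetical, and it assumes the counts are plain per-category feature counts.

```java
// Hypothetical helper for illustration; not part of Java-Naive-Bayes-Classifier.
final class LaplaceSmoothingSketch {

    /**
     * Add-one (Laplace) estimate of P(feature | category).
     *
     * featureCount   = times the feature was seen in the category
     * categoryTotal  = total feature occurrences seen in the category
     * vocabularySize = number of distinct features across all categories
     */
    static double smoothedProbability(int featureCount, int categoryTotal, int vocabularySize) {
        if (featureCount < 0 || categoryTotal < 0 || vocabularySize <= 0) {
            throw new IllegalArgumentException("counts must be non-negative and vocabulary non-empty");
        }
        // Adding 1 to every count keeps unseen features from producing a zero estimate.
        return (featureCount + 1.0) / (categoryTotal + (double) vocabularySize);
    }

    public static void main(String[] args) {
        // An unseen feature still gets a small, non-zero probability.
        System.out.println(smoothedProbability(0, 10, 5));
        System.out.println(smoothedProbability(3, 10, 5));
    }
}
```

With this estimate every value stays strictly between 0 and 1, so a single unseen feature no longer forces a whole category score to zero.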

allenanie, Feb 13 '15 15:02

Hello, yes, unfortunately there is no smoothing technique applied. PROD(P(featI|cat)) becomes pretty big with lots of features and categories. You can, however, provide your own IFeatureProbability<T, K> calculator. This requires you to provide your own Classifier<T, K>, though (or to override featuresProbabilityProduct(Collection<T> features, K category) in BayesClassifier<T, K>).
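A custom calculation along these lines could work in log space rather than multiplying raw factors. The following is a minimal standalone sketch, not the library's implementation: the class and method names are hypothetical, and the factor values are made up to show the effect. A long single-precision product can leave the representable range in either direction, while the sum of logarithms of the same factors stays finite and preserves the ranking between categories.

```java
// Standalone sketch; these names are hypothetical and not part of the library.
final class LogSpaceScoreSketch {

    /** Multiplies the per-feature factors directly, in single precision. */
    static float product(float[] factors) {
        float result = 1.0f;
        for (float f : factors) {
            result *= f;
        }
        return result;
    }

    /** Sums the natural logs of the same factors, in double precision. */
    static double logScore(float[] factors) {
        double sum = 0.0;
        for (float f : factors) {
            // Math.log(0) is -Infinity, so zero factors should be smoothed first.
            sum += Math.log(f);
        }
        return sum;
    }

    /** Builds an array holding the same factor n times, for demonstration only. */
    static float[] repeated(float value, int n) {
        float[] out = new float[n];
        for (int i = 0; i < n; i++) {
            out[i] = value;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] large = repeated(10f, 200);   // factors above 1
        float[] small = repeated(0.01f, 200); // factors below 1

        System.out.println(product(large)); // prints Infinity (float overflow)
        System.out.println(product(small)); // prints 0.0 (float underflow)

        // The log scores of both inputs remain finite and comparable.
        System.out.println(logScore(large));
        System.out.println(logScore(small));
    }
}
```

Categories can then be ranked by their log scores directly; converting back with Math.exp is only needed if actual probabilities are required, and that conversion can overflow or underflow again.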

ptnplanet, Sep 10 '15 14:09

Hi all. You might want to explore the latest feature branch (feature/weight).

Take the feature weight into consideration when calculating the featuresProbabilityProduct

  • Made BayesClassifier.featuresProbabilityProduct public to enable other implementations to override the calculation
  • By default, now take the feature weight and the assumed probability into consideration when calculating the featuresProbabilityProduct
  • Added a test with a high number of categories
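
The exact formula used on the branch is not quoted in this thread. As a rough illustration only, one common way to combine a feature weight with an assumed probability is a weighted average that blends the assumed value with the observed one, leaning on the observation more as the feature is seen more often. The class, method, and parameter names below are hypothetical.

```java
// Hypothetical sketch; the feature/weight branch may compute this differently.
final class WeightedProbabilitySketch {

    /**
     * Blends an assumed (prior) probability with the observed probability.
     *
     * observedProbability = estimate of P(feature | category) from the training data
     * totalFeatureCount   = how often the feature was seen across all categories
     * weight              = how strongly the assumed probability counts
     * assumedProbability  = value used when nothing is known about the feature
     */
    static double weightedAverage(double observedProbability, int totalFeatureCount,
                                  double weight, double assumedProbability) {
        if (weight <= 0.0 || totalFeatureCount < 0) {
            throw new IllegalArgumentException("weight must be positive and count non-negative");
        }
        return (weight * assumedProbability + totalFeatureCount * observedProbability)
                / (weight + totalFeatureCount);
    }

    public static void main(String[] args) {
        // Never-seen feature: the result equals the assumed probability.
        System.out.println(weightedAverage(0.0, 0, 1.0, 0.5));
        // Frequently seen feature: the observed probability dominates.
        System.out.println(weightedAverage(1.0, 9, 1.0, 0.5));
    }
}
```

As long as the observed probability itself lies in [0, 1], the blended value also stays in that range, so a rarely seen feature cannot pull a category score to exactly zero.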

ptnplanet, Feb 03 '17 10:02

I compared the results from Python with NumPy against the results from this routine, and they are completely different. This routine definitely doesn't work.

barovehicles, Sep 14 '17 11:09