Probability Returning Infinity for Most Categories
Hello,
I know you haven't worked on this in a while, but I was wondering if you had any idea why I keep seeing this issue. I have added about 25 categories to the model, with lots of training data in each. When I classify a chunk of text, most of the categories return a probability of Infinity, no matter what I feed in.
For example:
Classification[ category=friends_gatherings, probability=Infinity, featureset=[ after, school, soccerabout, this, ... -- ] ]
I literally ran into the same issue, LOL. I think it may be because he didn't apply any smoothing technique.
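For reference, this is what add-alpha (Laplace) smoothing of P(feature | category) usually looks like, as a minimal standalone sketch. The class and parameter names are hypothetical and not part of this library. Note that smoothing only keeps unseen features from contributing a zero factor; on its own it does not stop a long product from under- or overflowing.

```java
/**
 * Hypothetical helper (not part of the library discussed here): add-alpha
 * smoothed estimate of P(feature | category) from raw counts.
 */
final class LaplaceSmoothing {

    private LaplaceSmoothing() {}

    /**
     * @param featureCount   occurrences of the feature in the category
     * @param categoryTotal  occurrences of all features in the category
     * @param vocabularySize number of distinct features across all categories
     * @param alpha          1.0 for Laplace (add-one) smoothing, smaller for Lidstone
     */
    static double smoothedProbability(int featureCount, int categoryTotal,
                                      int vocabularySize, double alpha) {
        return (featureCount + alpha) / (categoryTotal + alpha * vocabularySize);
    }

    public static void main(String[] args) {
        // Toy counts: a category with 10 tokens over a 5-word vocabulary.
        System.out.println(smoothedProbability(3, 10, 5, 1.0)); // seen word: 4/15
        System.out.println(smoothedProbability(0, 10, 5, 1.0)); // unseen word: 1/15, not 0
    }
}
```

With alpha = 1 the smoothed estimates over the whole vocabulary still sum to 1, so the result remains a proper distribution.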
Hello, yes, unfortunately no smoothing technique is applied. `PROD(P(feat_i|cat))` becomes pretty big with lots of features and categories. You can, however, provide your own `IFeatureProbability<T, K>` calculator. This requires you to provide your own `Classifier<T, K>`, though (or to override `featuresProbabilityProduct(Collection<T> features, K category)` in `BayesClassifier<T, K>`).
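If you do override the product, the usual fix for overflow and underflow is to work in log space: sum log-probabilities instead of multiplying probabilities. Whether a raw product drifts toward 0 or toward Infinity depends on whether the individual factors are below or above 1; a `float` product of many factors above 1 overflows to Infinity, which would match the output in the question. The sketch below is a standalone illustration with hypothetical names, not the library's actual `featuresProbabilityProduct`. Because the library multiplies this product with the category probability, plugging a log score in would also mean changing how that combination is done.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: score a category in log space instead of multiplying
 * raw probabilities. Not the library's actual featuresProbabilityProduct.
 */
final class LogSpaceScore {

    private LogSpaceScore() {}

    /** log P(category) + sum_i log P(feature_i | category); all inputs must be > 0. */
    static double logScore(double categoryPrior, List<Double> featureProbabilities) {
        double score = Math.log(categoryPrior);
        for (double p : featureProbabilities) {
            score += Math.log(p);
        }
        return score;
    }

    public static void main(String[] args) {
        // A float product of many factors above 1 overflows to Infinity...
        float product = 1.0f;
        for (int i = 0; i < 200; i++) {
            product *= 2.0f;
        }
        System.out.println(product); // Infinity

        // ...and a double product of many small factors underflows to 0.0,
        List<Double> small = Collections.nCopies(400, 0.01);
        double smallProduct = 1.0;
        for (double p : small) {
            smallProduct *= p;
        }
        System.out.println(smallProduct); // 0.0

        // ...while the log-space score stays finite and preserves the ranking.
        System.out.println(logScore(0.5, small));
    }
}
```

Since `log` is monotonic, the category with the highest log score is the same one that would win with an exact product.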
Hi all. You might want to explore the latest feature branch (feature/weight).
Take the feature weight into consideration when calculating the featuresProbabilityProduct
- Made `BayesClassifier.featuresProbabilityProduct` public to enable other implementations to override the calculation
- By default, now take the feature weight and the assumed probability into consideration when calculating the `featuresProbabilityProduct`
- Added a test with a high number of categories
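A common way to "take the feature weight and the assumed probability into consideration" is a weighted average that blends the observed probability with a prior guess, trusting the observation more the more often the feature has been seen. The sketch below shows that general formula with hypothetical names; it may not match the branch's exact code. It keeps individual factors away from 0 but, by itself, does not prevent a long product from over- or underflowing.

```java
/**
 * Hypothetical sketch of a weighted probability: blend the observed
 * P(feature | category) with an assumed prior, weighting the observation
 * by how often the feature has been seen overall.
 */
final class WeightedProbability {

    private WeightedProbability() {}

    /**
     * @param basicProbability   raw P(feature | category) estimated from counts
     * @param featureTotal       how often the feature was seen across all categories
     * @param weight             how many "virtual observations" the prior is worth
     * @param assumedProbability prior guess used for unseen or rarely seen features
     */
    static double weightedAverage(double basicProbability, int featureTotal,
                                  double weight, double assumedProbability) {
        return (weight * assumedProbability + featureTotal * basicProbability)
                / (weight + featureTotal);
    }

    public static void main(String[] args) {
        System.out.println(weightedAverage(0.0, 0, 1.0, 0.5));    // unseen feature: prior 0.5
        System.out.println(weightedAverage(0.9, 1000, 1.0, 0.5)); // well-seen: close to 0.9
    }
}
```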
I'm comparing the results of this routine with a Python implementation using numpy, and they are completely different. This routine definitely doesn't work.
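One thing worth checking when comparing against another implementation: the value reported here appears to be an unnormalized score rather than a posterior that sums to 1 across categories (the output in the question is Infinity, which no normalized probability can be). A fair comparison normalizes both sides the same way. The sketch below, with hypothetical names, turns per-category log scores into posteriors using the log-sum-exp trick.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: convert per-category log scores into posterior
 * probabilities that sum to 1, using the log-sum-exp trick.
 */
final class Posterior {

    private Posterior() {}

    static Map<String, Double> normalize(Map<String, Double> logScores) {
        // Subtract the largest score before exponentiating so that at least
        // one term is exp(0) = 1 and the sum cannot underflow to 0.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores.values()) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (double s : logScores.values()) {
            sum += Math.exp(s - max);
        }
        double logNormalizer = max + Math.log(sum);

        Map<String, Double> posterior = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            posterior.put(e.getKey(), Math.exp(e.getValue() - logNormalizer));
        }
        return posterior;
    }

    public static void main(String[] args) {
        // Toy log scores; "sports" is a made-up second category.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("friends_gatherings", -1000.0);
        scores.put("sports", -1001.0);
        // Math.exp(-1000.0) alone is 0.0, but the normalized posteriors are well defined.
        System.out.println(normalize(scores));
    }
}
```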