Binary Classification Confusion Matrix and AUC Aggregators
Some of this work is derived from similar functions in the Spark library. For binary classification tasks it computes a confusion matrix. You can also aggregate over these confusion matrices at different thresholds to compute an area under the curve (AUC) statistic for both PR and ROC.
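For readers unfamiliar with the idea, here is a minimal standalone sketch of how confusion matrices computed at several thresholds can be combined into a ROC AUC. It does not use the classes added in this PR: `ConfusionMatrix`, `ConfusionMatrix.of`, and `RocAuc` are hypothetical names chosen for illustration, and the trapezoidal rule is an assumption about how the area might be estimated.

```scala
// Hypothetical names throughout; this is not the API added by the PR.
final case class ConfusionMatrix(tp: Long, fp: Long, fn: Long, tn: Long) {
  // Element-wise sum, so matrices can be combined like a monoid.
  def +(that: ConfusionMatrix): ConfusionMatrix =
    ConfusionMatrix(tp + that.tp, fp + that.fp, fn + that.fn, tn + that.tn)

  // True positive rate; defined as 0.0 when there are no positive labels.
  def recall: Double =
    if (tp + fn == 0) 0.0 else tp.toDouble / (tp + fn)

  // False positive rate; defined as 0.0 when there are no negative labels.
  def falsePositiveRate: Double =
    if (fp + tn == 0) 0.0 else fp.toDouble / (fp + tn)
}

object ConfusionMatrix {
  val zero: ConfusionMatrix = ConfusionMatrix(0, 0, 0, 0)

  // Classify a single (score, label) pair at the given threshold.
  def of(score: Double, label: Boolean, threshold: Double): ConfusionMatrix =
    (score >= threshold, label) match {
      case (true, true)   => ConfusionMatrix(1, 0, 0, 0)
      case (true, false)  => ConfusionMatrix(0, 1, 0, 0)
      case (false, true)  => ConfusionMatrix(0, 0, 1, 0)
      case (false, false) => ConfusionMatrix(0, 0, 0, 1)
    }
}

object RocAuc {
  // Builds one matrix per threshold, then integrates (FPR, recall) points
  // with the trapezoidal rule. Thresholds are visited from high to low so
  // the curve runs from (0, 0) towards (1, 1).
  def apply(data: Seq[(Double, Boolean)], thresholds: Seq[Double]): Double = {
    val points = thresholds.sortWith(_ > _).map { t =>
      val m = data
        .map { case (score, label) => ConfusionMatrix.of(score, label, t) }
        .foldLeft(ConfusionMatrix.zero)(_ + _)
      (m.falsePositiveRate, m.recall)
    }
    val curve = (0.0, 0.0) +: points :+ (1.0, 1.0)
    curve
      .sliding(2)
      .collect { case Seq((x0, y0), (x1, y1)) => (x1 - x0) * (y0 + y1) / 2.0 }
      .sum
  }
}
```

A PR curve could be sketched the same way by swapping the plotted pair for (recall, precision); the endpoints would need different handling.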
Codecov Report
Merging #633 into develop will increase coverage by 0.25%. The diff coverage is 92.15%.
@@ Coverage Diff @@
## develop #633 +/- ##
===========================================
+ Coverage 82% 82.25% +0.25%
===========================================
Files 111 113 +2
Lines 5156 5207 +51
Branches 457 479 +22
===========================================
+ Hits 4228 4283 +55
+ Misses 928 924 -4
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...com/twitter/algebird/BinaryClassificationAUC.scala | 91.66% <91.66%> (ø) | |
| ...algebird/BinaryClassificationConfusionMatrix.scala | 92.59% <92.59%> (ø) | |
| ...n/scala/com/twitter/algebird/SuccessibleLaws.scala | 85.71% <0%> (-7.15%) | :arrow_down: |
| ...witter/algebird/util/summer/AsyncListMMapSum.scala | 96.15% <0%> (-3.85%) | :arrow_down: |
| .../main/scala/com/twitter/algebird/BloomFilter.scala | 94.32% <0%> (-0.44%) | :arrow_down: |
| .../main/scala/com/twitter/algebird/HyperLogLog.scala | 92.4% <0%> (-0.4%) | :arrow_down: |
| .../main/scala/com/twitter/algebird/Approximate.scala | 90.32% <0%> (+1.61%) | :arrow_up: |
| ...ain/scala/com/twitter/algebird/AdaptiveCache.scala | 77.14% <0%> (+5.71%) | :arrow_up: |
| ... and 2 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update aa98422...4aaeb1c.
Hey @richwhitjr, thank you for this! We should have some time to look today.
@richwhitjr this is getting there - I think it's looking good.
One ask that we have for all new data structures is that you add a section with a few examples to the documentation site.
here are the instructions on how to do this: https://github.com/twitter/algebird/blob/develop/CONTRIBUTING.md#contributing-documentation
All data structure examples are compiled by CI, so it's extremely helpful to get a mini tutorial in at the same time the data structure is merged.
here are a few examples of existing pages:
- https://twitter.github.io/algebird/datatypes/approx/exponential_histogram.html
- https://twitter.github.io/algebird/datatypes/averaged_value.html
I looked at adding sumOption but couldn't think of a way to make it more efficient without a lot of complexity, and I'm not sure it is entirely worth it. It would cut down on object creation, but on the other hand we would have to accumulate the confusion matrix variables outside the matrix in some type of dictionary and reconstruct the matrix at the end.
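To make the trade-off concrete, here is a rough sketch of what a batched sum could look like in the simplest case. `MatrixCounts` is a hypothetical stand-in for the confusion matrix in this PR, its field names are assumptions, and the sketch ignores any per-threshold grouping, which is where the dictionary mentioned above would come in.

```scala
// Hypothetical stand-in for the PR's confusion matrix; field names are assumptions.
final case class MatrixCounts(tp: Long, fp: Long, fn: Long, tn: Long)

object MatrixCounts {
  // Sums a batch using four mutable counters and builds a single result
  // at the end, instead of allocating one intermediate matrix per addition.
  // Returns None for an empty batch, mirroring the usual sumOption contract.
  def sumOption(items: Iterator[MatrixCounts]): Option[MatrixCounts] =
    if (!items.hasNext) None
    else {
      var tp, fp, fn, tn = 0L
      items.foreach { m =>
        tp += m.tp
        fp += m.fp
        fn += m.fn
        tn += m.tn
      }
      Some(MatrixCounts(tp, fp, fn, tn))
    }
}
```

Even in this simple form the saving is only the intermediate allocations, which fits the doubt above about whether the extra code is worth it.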
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
Richard Whitcomb seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.