algebird

Binary Classification Confusion Matrix and AUC Aggregators

Open richwhitjr opened this issue 8 years ago • 5 comments

Some of this work is derived from similar functions in the Spark library. For binary classification tasks it computes a confusion matrix. You can also aggregate over these confusion matrices at different thresholds to compute an area-under-the-curve statistic for both PR and ROC curves.
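For readers unfamiliar with the idea, here is a minimal, self-contained sketch of how confusion matrices can be summed and then turned into a ROC AUC. It is not the code in this PR: `ConfusionMatrix`, `of`, and `RocAuc` are hypothetical names chosen for illustration, and the AUC uses plain trapezoidal integration over a caller-supplied list of thresholds.

```scala
// Hypothetical names throughout; this is an illustration, not the PR's API.
final case class ConfusionMatrix(tp: Long, fp: Long, fn: Long, tn: Long) {
  // Element-wise addition is associative and commutative, so matrices
  // built from separate partitions can be merged in any order.
  def +(that: ConfusionMatrix): ConfusionMatrix =
    ConfusionMatrix(tp + that.tp, fp + that.fp, fn + that.fn, tn + that.tn)

  def truePositiveRate: Double =
    if (tp + fn == 0) 0.0 else tp.toDouble / (tp + fn)

  def falsePositiveRate: Double =
    if (fp + tn == 0) 0.0 else fp.toDouble / (fp + tn)
}

object ConfusionMatrix {
  val zero: ConfusionMatrix = ConfusionMatrix(0, 0, 0, 0)

  // Classify one (score, label) pair at the given threshold.
  def of(score: Double, label: Boolean, threshold: Double): ConfusionMatrix = {
    val predicted = score >= threshold
    (predicted, label) match {
      case (true, true)   => ConfusionMatrix(1, 0, 0, 0)
      case (true, false)  => ConfusionMatrix(0, 1, 0, 0)
      case (false, true)  => ConfusionMatrix(0, 0, 1, 0)
      case (false, false) => ConfusionMatrix(0, 0, 0, 1)
    }
  }
}

object RocAuc {
  // Build one summed confusion matrix per threshold, then integrate
  // TPR over FPR with the trapezoid rule.
  def apply(data: Seq[(Double, Boolean)], thresholds: Seq[Double]): Double = {
    // Descending thresholds give non-decreasing FPR, so no re-sort is needed.
    val points = thresholds.sortWith(_ > _).map { t =>
      val m = data
        .map { case (score, label) => ConfusionMatrix.of(score, label, t) }
        .foldLeft(ConfusionMatrix.zero)(_ + _)
      (m.falsePositiveRate, m.truePositiveRate)
    }
    // Anchor the curve at (0, 0) and (1, 1).
    val curve = ((0.0, 0.0) +: points) :+ ((1.0, 1.0))
    curve.zip(curve.tail).map { case ((x0, y0), (x1, y1)) =>
      (x1 - x0) * (y0 + y1) / 2.0
    }.sum
  }
}
```

Because `+` is associative with `zero` as identity, the matrix forms a monoid, which is what lets it be summed across partitions before the curve is computed.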

richwhitjr, May 30 '17 19:05

Codecov Report

Merging #633 into develop will increase coverage by 0.25%. The diff coverage is 92.15%.

@@             Coverage Diff             @@
##           develop     #633      +/-   ##
===========================================
+ Coverage       82%   82.25%   +0.25%     
===========================================
  Files          111      113       +2     
  Lines         5156     5207      +51     
  Branches       457      479      +22     
===========================================
+ Hits          4228     4283      +55     
+ Misses         928      924       -4

| Impacted Files | Coverage Δ |
|---|---|
| ...com/twitter/algebird/BinaryClassificationAUC.scala | 91.66% <91.66%> (ø) |
| ...algebird/BinaryClassificationConfusionMatrix.scala | 92.59% <92.59%> (ø) |
| ...n/scala/com/twitter/algebird/SuccessibleLaws.scala | 85.71% <0%> (-7.15%) |
| ...witter/algebird/util/summer/AsyncListMMapSum.scala | 96.15% <0%> (-3.85%) |
| .../main/scala/com/twitter/algebird/BloomFilter.scala | 94.32% <0%> (-0.44%) |
| .../main/scala/com/twitter/algebird/HyperLogLog.scala | 92.4% <0%> (-0.4%) |
| .../main/scala/com/twitter/algebird/Approximate.scala | 90.32% <0%> (+1.61%) |
| ...ain/scala/com/twitter/algebird/AdaptiveCache.scala | 77.14% <0%> (+5.71%) |

... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update aa98422...4aaeb1c.

codecov-io, May 30 '17 19:05

Hey @richwhitjr, thank you for this! We should have some time to look today.

sritchie-stripe, Jun 06 '17 14:06

@richwhitjr this is getting there - I think it's looking good.

One ask that we have for all new data structures is that you add a section with a few examples to the documentation site.

here are the instructions on how to do this: https://github.com/twitter/algebird/blob/develop/CONTRIBUTING.md#contributing-documentation

All data structure examples are compiled by CI, so it's extremely helpful to get a mini tutorial in at the same time the data structure is merged.

here are a few examples of existing pages:

  • https://twitter.github.io/algebird/datatypes/approx/exponential_histogram.html
  • https://twitter.github.io/algebird/datatypes/averaged_value.html

sritchie-stripe, Jun 10 '17 13:06

I looked at adding sumOption but couldn't think of a way to make it more efficient without a lot of complexity, and I'm not sure it is worth it. It would cut down on object creation, but on the other hand we would have to accumulate the confusion matrix counts externally in some kind of dictionary and reconstruct the matrix at the end.
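To make the trade-off concrete, here is a rough sketch of what a batch sum could look like for a single confusion matrix. `Counts` and its methods are hypothetical stand-ins, not the classes in this PR. The sketch only covers the simplest case: when matrices are keyed by threshold, the running counts would presumably have to live in a mutable map, which may be the extra complexity described above.

```scala
// Hypothetical type mirroring a confusion matrix; not the PR's class.
final case class Counts(tp: Long, fp: Long, fn: Long, tn: Long)

object Counts {
  // Pairwise plus: allocates one new Counts per combination.
  def plus(a: Counts, b: Counts): Counts =
    Counts(a.tp + b.tp, a.fp + b.fp, a.fn + b.fn, a.tn + b.tn)

  // Batch sum: accumulate in local vars and build a single result at the end.
  // Returns None for empty input, in the spirit of Semigroup.sumOption.
  def sumOption(items: Iterator[Counts]): Option[Counts] =
    if (!items.hasNext) None
    else {
      var tp, fp, fn, tn = 0L
      items.foreach { c =>
        tp += c.tp; fp += c.fp; fn += c.fn; tn += c.tn
      }
      Some(Counts(tp, fp, fn, tn))
    }
}
```

Both paths produce the same totals. The batch version only avoids the intermediate allocations, so whether it pays off depends on how many matrices are summed at once.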

richwhitjr, Jun 12 '17 20:06

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Richard Whitcomb seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant, Jul 18 '19 15:07