bugbug
Add a similarity feature for duplicate classifier
Fixes #582
Caveat: StructuredColumnTransformer converts sparse matrices to dense matrices, which increases the memory footprint of the inputs (https://github.com/mozilla/bugbug/blob/master/bugbug/utils.py#L44). Because of this, I haven't been able to test the performance of this model so far.
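To give a rough sense of why the dense conversion matters, here is a back-of-the-envelope sketch comparing the storage of a dense matrix with a CSR-style sparse layout. The matrix shape, the number of non-zero entries per row, and the helper names `dense_bytes` and `csr_bytes` are all hypothetical and chosen only for illustration; they are not measurements from bugbug.

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed to store every cell of a dense matrix."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, itemsize=8, index_size=4):
    """Approximate bytes for a CSR layout: values + column indices + row pointers."""
    return nnz * (itemsize + index_size) + (n_rows + 1) * index_size


# Hypothetical sizes, not taken from the bugbug dataset.
n_rows, n_cols = 10_000, 50_000
nnz = n_rows * 100  # assumption: about 100 non-zero entries per row

dense = dense_bytes(n_rows, n_cols)
sparse = csr_bytes(n_rows, nnz)

print(f"dense:  {dense / 1e9:.2f} GB")
print(f"sparse: {sparse / 1e6:.2f} MB")
print(f"ratio:  {dense / sparse:.0f}x")
```

Under these assumed numbers the dense form is a few hundred times larger than the sparse one, which is the kind of blow-up that can make the model hard to run locally.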
Codecov Report
Merging #616 into master will decrease coverage by 0.06%. The diff coverage is 51.72%.
@@ Coverage Diff @@
## master #616 +/- ##
==========================================
- Coverage 58.29% 58.23% -0.07%
==========================================
Files 58 58
Lines 3690 3718 +28
==========================================
+ Hits 2151 2165 +14
- Misses 1539 1553 +14
| Impacted Files | Coverage Δ | |
|---|---|---|
| bugbug/models/duplicate.py | 35.10% <51.72%> (+6.31%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 938eb29...b38f498.