bugbug
[WIP] Add doc2vec similarity algorithm
Fixes: #674
Implemented the PV-DM based doc2vec similarity algorithm, using cosine similarity as the similarity metric between two documents.
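For reference, here is a minimal sketch of the cosine similarity step, assuming each bug has already been mapped to a dense vector by the doc2vec model. The function names `cosine_similarity` and `rank_by_similarity`, and the dict-of-vectors input, are hypothetical and only for illustration; they are not taken from this PR's code.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, candidates, top_k=10):
    """Rank candidate documents by cosine similarity to the query vector.

    `candidates` is a hypothetical dict mapping a bug id to its document vector.
    Returns up to `top_k` (bug_id, score) pairs, most similar first.
    """
    scored = [
        (bug_id, cosine_similarity(query_vec, vec))
        for bug_id, vec in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Cosine similarity depends only on the direction of the vectors, not their length, so two documents score as similar whenever their embeddings point the same way.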
Todo:
- [x] Try other `doc2vec` algorithms
- [ ] Try other similarity metrics
- [x] Tune hyperparameters
- [ ] Discussion about the class structure
@marco-c can you guide me on why the docker build is failing?
Tested both the PV-DM and PV-DBOW doc2vec algorithms on the bug data (and tuned the other hyperparameters as well). PV-DBOW performs better than PV-DM in terms of both precision and recall.
Comparison of the evaluation results (same hyperparameters and same amount of data for both):
| Metric | PV-DM | PV-DBOW |
|---|---|---|
| Recall @ 1: | 9.739% | 14.285% |
| Recall @ 5: | 10.922% | 16.041% |
| Recall @ 10: | 11.005% | 16.178% |
| Precision @ 1: | 7.373% | 9.896% |
| Precision @ 5: | 2.376% | 3.397% |
| Precision @ 10: | 1.339% | 1.969% |
| Recall: | 11.038% | 16.232% |
| Precision: | 1.983% | 2.302% |
| MAP@k: | 8.404% | 11.673% |
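
For readers unfamiliar with these metrics, below is a small sketch of one common way to compute precision@k, recall@k and average precision@k for a single query. This is an assumption about the definitions, not necessarily the exact formulas used by the evaluation code in this PR. The function names are hypothetical, and `ranked` is assumed to contain no duplicate ids.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant (hits / k)."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k ranked items."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked, relevant, k):
    """Average of precision@i over the ranks i <= k where a relevant item occurs.

    Normalised by min(len(relevant), k), one common convention.
    MAP@k is the mean of this value over all queries.
    """
    hits = 0
    total = 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0
```

Under these definitions, precision@k is divided by k, so it tends to fall as k grows when each bug has only a few known duplicates, while recall@k can only stay flat or rise.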
Closing for lack of updates, feel free to reopen if you still intend to work on this.