[WIP] Add doc2vec similarity algorithm

Open rock420 opened this issue 4 years ago • 2 comments

Fixes: #674

Implemented the PV-DM-based doc2vec similarity algorithm, using cosine similarity as the similarity metric between two documents.
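For readers unfamiliar with the metric, here is a minimal sketch of cosine similarity between two document vectors. It is not the code from this PR: the function name and the example vectors are hypothetical, and the vectors stand in for whatever a trained doc2vec model would infer for each bug report.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: treat a zero vector as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical document vectors (a real model would produce many more dimensions).
bug_a = [0.2, 0.1, -0.4]
bug_b = [0.4, 0.2, -0.8]    # same direction as bug_a, different length
bug_c = [-0.1, 0.4, 0.05]   # roughly orthogonal to bug_a

sim_ab = cosine_similarity(bug_a, bug_b)
sim_ac = cosine_similarity(bug_a, bug_c)
```

Because the score depends only on direction, `bug_a` and `bug_b` come out as near-identical even though their magnitudes differ, while `bug_a` and `bug_c` score close to zero.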

Todo:

  • [x] Try other doc2vec algorithms
  • [ ] Try other similarity metrics
  • [x] Tune hyperparameters
  • [ ] Discuss the class structure

rock420, Jul 02 '21 14:07

@marco-c can you guide me on why the docker build is failing?

rock420, Jul 04 '21 09:07

Tested both the PV-DM and PV-DBOW doc2vec algorithms on the bug data (and tuned the other hyperparameters as well). PV-DBOW performs better than PV-DM in terms of both precision and recall. Comparison of the evaluation results (same hyperparameters and same amount of data for both):

| Metric | PV-DM | PV-DBOW |
| --- | --- | --- |
| Recall @ 1 | 9.739% | 14.285% |
| Recall @ 5 | 10.922% | 16.041% |
| Recall @ 10 | 11.005% | 16.178% |
| Precision @ 1 | 7.373% | 9.896% |
| Precision @ 5 | 2.376% | 3.397% |
| Precision @ 10 | 1.339% | 1.969% |
| Recall | 11.038% | 16.232% |
| Precision | 1.983% | 2.302% |
| MAP@k | 8.404% | 11.673% |
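
For context on how numbers like these are typically derived, the sketch below shows one common definition of precision@k, recall@k, and average precision@k for a single query bug. The evaluation code used for this comparison is not shown in the thread, so these definitions, the function names, and the bug IDs are assumptions for illustration only; the reported figures may have been computed differently.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked, relevant, k):
    """Mean of precision@i over the ranks i <= k that hold a relevant item."""
    hits = 0
    score = 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0


# Hypothetical query: bug IDs returned by the similarity model, best first,
# and the set of bugs known to be duplicates of the query bug.
ranked = [101, 205, 309, 412, 518]
relevant = {205, 518, 999}

p_at_1 = precision_at_k(ranked, relevant, 1)
p_at_5 = precision_at_k(ranked, relevant, 5)
r_at_5 = recall_at_k(ranked, relevant, 5)
ap_at_5 = average_precision_at_k(ranked, relevant, 5)
```

MAP@k would then be the mean of `average_precision_at_k` over all query bugs. Under these definitions recall@k can only grow with k while precision@k tends to fall, which matches the trend in the comparison above.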

rock420, Jul 08 '21 17:07

Closing for lack of updates, feel free to reopen if you still intend to work on this.

suhaibmujahid, Mar 10 '23 13:03