vecmap: Results very different from the paper

Dear Mikel,

Thank you for sharing your great work with us.

I'm running your code and trying to reproduce the results you reported in your ACL 2018 paper, but I could not get comparable results.

I got all the required datasets and embedding files with ./get_data.sh and used them to train the model with python3 map_embeddings.py --acl2018 --cuda SRC.EMB TRG.EMB SRC_MAPPED.EMB TRG_MAPPED.EMB

The results reported in the paper are 48.13 for EN-IT, 48.19 for EN-DE, 32.63 for EN-FI and 37.33 for EN-ES. However, the results I got for the four language pairs are 21.04 for EN-IT, 38.6 for EN-DE, 18.64 for EN-FI and 12.68 for EN-ES. My evaluation command is: python3 eval_translation.py SRC_MAPPED.EMB TRG_MAPPED.EMB -d TEST.DICT --retrieval csls

My results are only about half of what you reported, and I have no idea why. Could you help me? Thank you very much!

Nov 11 '19 12:11 lipingtang17

I have another question about the evaluation policy. You calculate the coverage, that is, the percentage of test words that are in the cutoff vocabulary. The accuracy is then the percentage of correctly predicted word pairs among these "in-vocabulary" words. So are the "out-of-vocabulary" words that occur in the test set ignored? Is that reasonable, and is it the common practice in the community? Looking forward to your reply! Thanks very much!
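The policy described above can be sketched as follows. This is only an illustration of the question, not the actual eval_translation.py code: coverage_and_accuracy, predict, and the toy vocabularies are hypothetical names and data.

```python
def coverage_and_accuracy(test_pairs, src_vocab, trg_vocab, predict):
    """Sketch of the policy in question (hypothetical helper, not vecmap code).

    test_pairs: iterable of (source_word, target_word) gold pairs
    src_vocab, trg_vocab: sets of words inside the cutoff vocabulary
    predict: function mapping a source word to one predicted target word
    """
    gold = {}      # in-vocabulary source word -> set of accepted translations
    seen = set()   # every distinct source word in the test dictionary
    for src, trg in test_pairs:
        seen.add(src)
        if src in src_vocab and trg in trg_vocab:
            gold.setdefault(src, set()).add(trg)

    # Coverage: share of test source words that survive the vocabulary cutoff.
    coverage = len(gold) / len(seen) if seen else 0.0
    # Accuracy: computed only over the covered words, so OOV words are ignored.
    correct = sum(1 for src, trgs in gold.items() if predict(src) in trgs)
    accuracy = correct / len(gold) if gold else 0.0
    return coverage, accuracy


# Toy data for illustration only.
pairs = [("dog", "cane"), ("cat", "gatto"), ("house", "casa"), ("zzz", "qqq")]
src_vocab = {"dog", "cat", "house"}
trg_vocab = {"cane", "gatto", "casa"}
toy_predictions = {"dog": "cane", "cat": "casa", "house": "casa"}

coverage, accuracy = coverage_and_accuracy(
    pairs, src_vocab, trg_vocab, toy_predictions.get
)
```

In this toy case one of the four test words is out of vocabulary, so it lowers the coverage but does not count against the accuracy.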

Nov 11 '19 12:11 lipingtang17

You must be doing something wrong. It might be that you are using the test dictionary in the reverse direction. In that case simply swap src_mapped.emb and trg_mapped.emb when calling the evaluation script.

Also, you should get 100% coverage if you are using the provided data. If not, you are definitely doing something wrong (encoding issues or using the test dictionary in the reverse direction are the only things that come to mind).
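One quick way to check the dictionary direction is to measure coverage both ways. The sketch below assumes the test dictionary has already been loaded as (source, target) word pairs and the two vocabularies as sets; dictionary_coverage and guess_direction are hypothetical helpers, not part of vecmap.

```python
def dictionary_coverage(pairs, src_vocab, trg_vocab):
    """Fraction of dictionary pairs whose two words are both in vocabulary."""
    if not pairs:
        return 0.0
    hits = sum(1 for src, trg in pairs if src in src_vocab and trg in trg_vocab)
    return hits / len(pairs)


def guess_direction(pairs, first_vocab, second_vocab):
    """Compare coverage with the embeddings as given and with them swapped."""
    forward = dictionary_coverage(pairs, first_vocab, second_vocab)
    swapped = dictionary_coverage(pairs, second_vocab, first_vocab)
    label = "as given" if forward >= swapped else "swapped"
    return label, forward, swapped


# Toy data for illustration only: an EN-IT dictionary with the
# embedding vocabularies passed in the wrong order.
en_vocab = {"dog", "cat"}
it_vocab = {"cane", "gatto"}
en_it_pairs = [("dog", "cane"), ("cat", "gatto")]

result = guess_direction(en_it_pairs, it_vocab, en_vocab)
```

If the swapped coverage is clearly higher, the embedding files are probably being passed in the wrong order. This is only a heuristic, since words shared by both languages can make the coverage nonzero in either direction.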

Nov 11 '19 15:11 artetxem

I reversed the embedding files and it works now! Thank you very much!

Nov 12 '19 03:11 lipingtang17

This issue can be closed now, can't it?

Dec 17 '22 23:12 TheShayegh