Results very different from the paper
Dear Mikel,
Thank you for sharing your great work with us.
I'm running your code and trying to reproduce the results reported in your ACL 2018 paper, but I could not get comparable numbers.
I downloaded all the required datasets and embedding files with ./get_data.sh and trained the model with python3 map_embeddings.py --acl2018 --cuda SRC.EMB TRG.EMB SRC_MAPPED.EMB TRG_MAPPED.EMB
The results reported in the paper are 48.13 for EN-IT, 48.19 for EN-DE, 32.63 for EN-FI and 37.33 for EN-ES. However, I got 21.04 for EN-IT, 38.6 for EN-DE, 18.64 for EN-FI and 12.68 for EN-ES. My evaluation command is: python3 eval_translation.py SRC_MAPPED.EMB TRG_MAPPED.EMB -d TEST.DICT --retrieval csls
My results are only about half of what you reported, and I have no idea why. Could you help me? Thank you very much!
I have another question about the evaluation protocol. You compute the coverage, that is, the percentage of test words that are in the cutoff vocabulary. The accuracy is then the percentage of correctly predicted word pairs among these "in-vocabulary" words. So are the "out-of-vocabulary" words that occur in the test set ignored? Is that reasonable, and is it the common practice in the community? Looking forward to your reply! Thanks very much!
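To make the question concrete, this is roughly how I understand the two numbers. It is only a minimal sketch with hypothetical names (`coverage_and_accuracy`, `predictions`), not the actual code of eval_translation.py, and it assumes an entry counts as in-vocabulary only when both of its words are in the cutoff vocabularies:

```python
def coverage_and_accuracy(test_pairs, src_vocab, trg_vocab, predictions):
    """Sketch of the metric as described above (hypothetical helper).

    test_pairs:  list of (source_word, target_word) gold entries
    src_vocab:   set of source words in the cutoff vocabulary
    trg_vocab:   set of target words in the cutoff vocabulary
    predictions: dict mapping a source word to its predicted translation
    """
    gold = {}    # in-vocabulary source word -> set of accepted translations
    oov = set()  # source words seen in out-of-vocabulary entries
    for src, trg in test_pairs:
        if src in src_vocab and trg in trg_vocab:
            gold.setdefault(src, set()).add(trg)
        else:
            oov.add(src)
    # A source word with at least one usable entry is not out-of-vocabulary.
    oov -= set(gold)

    total = len(gold) + len(oov)
    coverage = len(gold) / total if total else 0.0

    # Accuracy is computed over the covered source words only,
    # so out-of-vocabulary words do not affect it.
    correct = sum(1 for src, trgs in gold.items() if predictions.get(src) in trgs)
    accuracy = correct / len(gold) if gold else 0.0
    return coverage, accuracy
```

Under this reading, a test word that is missing from the vocabulary lowers the coverage but leaves the accuracy unchanged.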
You must be doing something wrong. It might be that you are using the test dictionary in the reverse direction. In that case simply swap src_mapped.emb and trg_mapped.emb when calling the evaluation script.
Also, you should get 100% coverage if you are using the provided data. If not, you are definitely doing something wrong (encoding issues or using the test dictionary in the reverse direction are the only things that come to mind).
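If it helps, one rough way to check the direction is to count how many dictionary entries match the two vocabularies in each orientation. This is only a sketch: `dictionary_direction` is a hypothetical helper that is not part of the repository, and it assumes the dictionary has already been loaded as a list of word pairs and each vocabulary as a set of words.

```python
def dictionary_direction(test_pairs, src_vocab, trg_vocab):
    """Return the fraction of dictionary entries covered in each orientation.

    test_pairs: list of (first_word, second_word) entries from the dictionary
    src_vocab:  set of words in the embeddings passed as source
    trg_vocab:  set of words in the embeddings passed as target
    """
    n = len(test_pairs)
    if n == 0:
        return 0.0, 0.0
    # Entries that match when the dictionary is read as source -> target.
    forward = sum(1 for a, b in test_pairs if a in src_vocab and b in trg_vocab)
    # Entries that match when the dictionary is read as target -> source.
    reverse = sum(1 for a, b in test_pairs if a in trg_vocab and b in src_vocab)
    return forward / n, reverse / n
```

A reverse fraction much higher than the forward one would suggest that the two embedding files should be swapped.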
I reversed the embedding files and it works now! Thank you very much!
You can close this issue. Nah?