Question about learnbpe
Hi,
I have a question about the learnbpe operation. The example in README.md learns the BPE codes jointly for en and de, and then applies the codes to en and de separately:
./fast learnbpe 40000 train.de train.en > codes
./fast applybpe train.de.40000 train.de codes
./fast applybpe train.en.40000 train.en codes
Here are my questions:
- What's the purpose of jointly learning BPE codes for en and de? If en and de do not share embeddings in the NMT system, is it more reasonable to learn BPE codes for en and de separately?
- What's the difference between the number 40000 in learnbpe and applybpe?
Thanks~
- Jointly learning the codes is mostly useful when you share the embeddings. It's good because it helps the model handle rare words such as named entities much more easily. Even if you don't share embeddings, I would still learn them jointly. At the very least you save GPU memory.
- I'm not sure I understand your question. In learnbpe, 40000 is the number of codes you want to learn. In applybpe you don't have to provide 40000, just the codes file.
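For anyone landing here later, a toy Python sketch of the two steps may make the answers above more concrete. This is not fastBPE's implementation; the function names (learn_bpe, apply_bpe, merge_pair), the `</w>` end-of-word marker, the tie-breaking rule, and the tiny corpus are all made up for illustration. It only shows that the number is an input to learning (how many merge operations to produce), that applying needs nothing but the learned codes, and that codes learned on both languages together segment a shared name identically on both sides. As far as I can tell from the argument order of the command in the question, the 40000 in train.de.40000 is just part of the output file name.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(sentences, num_codes):
    """Learn up to `num_codes` merge operations from whitespace-tokenized sentences."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter()
    for sentence in sentences:
        for word in sentence.split():
            vocab[tuple(word) + ("</w>",)] += 1

    codes = []
    for _ in range(num_codes):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties broken by the pair itself (an arbitrary choice here).
        best = max(pairs, key=lambda p: (pairs[p], p))
        codes.append(best)
        merged = Counter()
        for symbols, freq in vocab.items():
            merged[merge_pair(symbols, best)] += freq
        vocab = merged
    return codes


def apply_bpe(word, codes):
    """Segment one word using only the learned codes; no count is needed here."""
    symbols = tuple(word) + ("</w>",)
    for pair in codes:  # replay the merges in the order they were learned
        symbols = merge_pair(symbols, pair)
    return symbols


# Hypothetical stand-ins for train.de and train.en.
train_de = ["Obama besucht Berlin", "Obama spricht in Berlin"]
train_en = ["Obama visits Berlin", "Obama speaks in Berlin"]

# Joint learning: one set of codes from both languages, like
# `learnbpe 40000 train.de train.en > codes`, but with a much smaller number.
codes = learn_bpe(train_de + train_en, num_codes=10)

# The same codes are then applied to each side separately.
segmented_de = [[apply_bpe(w, codes) for w in s.split()] for s in train_de]
segmented_en = [[apply_bpe(w, codes) for w in s.split()] for s in train_en]
print(len(codes), "codes learned")
print(segmented_de[0][0], segmented_en[0][0])  # "Obama" on both sides
```

With separately learned codes, the two sides could split the same name at different points, which is one reason joint learning is convenient even without shared embeddings.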