
Questions about the gloss vocabulary in Weather 2014 T

Open PanXiebit opened this issue 5 years ago • 0 comments

Hi @neccam, I am confused about how the training vocabulary was built.

  1. Words containing the symbol "__" in the training corpus ("phoenix2014T.train.gloss") do not appear in the dev/test gloss corpora. In particular, "__ON__" and "__OFF__" are very common in the training corpus but never appear in the dev/test corpora. Can I delete them directly?

  2. The vocabulary I obtain from the training corpus has 1232 entries, but the paper reports 1066. Was some preprocessing applied that accounts for the difference?
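For anyone who wants to reproduce the two observations above, here is a minimal sketch of the check. It assumes each gloss file holds one sentence per line with whitespace-separated tokens; the function names `gloss_vocab` and `train_only_tokens` and the toy sentences are made up for illustration and are not part of the repository or the dataset. It only reports counts and does not claim to be the preprocessing used in the paper.

```python
from collections import Counter
from typing import Iterable, List


def gloss_vocab(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated gloss tokens across all lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def train_only_tokens(train: Counter, *others: Counter, marker: str = "__") -> List[str]:
    """Return tokens containing `marker` that occur in train but in no other split."""
    seen_elsewhere = set().union(*others)  # iterating a Counter yields its keys
    return sorted(t for t in train if marker in t and t not in seen_elsewhere)


# Toy data (hypothetical); replace with the lines of the real gloss files,
# e.g. open("phoenix2014T.train.gloss", encoding="utf-8").
train = gloss_vocab(["__ON__ MORGEN REGEN __OFF__", "HEUTE SONNE"])
dev = gloss_vocab(["MORGEN SONNE"])

print("train vocabulary size:", len(train))
print("marker tokens only in train:", train_only_tokens(train, dev))
```

Comparing `len(train)` before and after dropping the tokens returned by `train_only_tokens` shows how much of the gap such tokens could explain; whether the remaining difference comes from other preprocessing is a question for the authors.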

PanXiebit · Jun 25 '20 16:06