gdxie1
text = "will not be the true meaning. always remember that our mind"
print(moses_tokenizer.tokenize(text, escape=False))
I get the following output:
['will', 'not', 'be', 'the', 'true', 'meaning.', 'always', 'remember', 'that', 'our',...
When I add the BPE configuration to conf.json as follows:
"tokenizer": {
    "src": {
        "type": "pyonmttok",
        "mode": null,
        "params": {
            "bpe_model_path": "bpe/en.model"
        }
    },
    "tgt": {
        "type": "pyonmttok",
        "mode": null,
        "params": {
            "bpe_model_path": "bpe/ga.model"
        }
    }...