preprocess
truecaser not identical to perl script
On input `->`, the Moses truecase script outputs `- >` but the C++ version outputs `->`. The additional space seems to appear regardless of what comes before `>`.
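A plausible explanation is that the Perl script splits each line with an XML-aware scanner rather than on whitespace alone. The sketch below is a simplified approximation under that assumption, not the script's actual code: inside each whitespace-separated run, a leading stretch of characters other than `<` and `>` becomes one word, and a run that starts with `<` or `>` is taken whole as the next word. Real XML tags (which the Perl script keeps as markup) are deliberately ignored here. The function names `SplitLikePerl` and `JoinWithSpaces` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical approximation of an XML-aware word split, ignoring real XML
// tags. Within each whitespace-separated run:
//   - if it starts with a char other than '<'/'>', take the longest prefix
//     free of whitespace, '<' and '>' as one word;
//   - otherwise take everything up to the next whitespace as one word.
std::vector<std::string> SplitLikePerl(const std::string &line) {
  std::vector<std::string> words;
  const std::size_t n = line.size();
  std::size_t i = 0;
  while (i < n) {
    // Skip leading whitespace.
    while (i < n && std::isspace(static_cast<unsigned char>(line[i]))) ++i;
    if (i == n) break;
    const std::size_t start = i;
    if (line[i] != '<' && line[i] != '>') {
      // Run of characters that are neither whitespace nor angle brackets.
      while (i < n && !std::isspace(static_cast<unsigned char>(line[i])) &&
             line[i] != '<' && line[i] != '>') {
        ++i;
      }
    } else {
      // Starts with '<' or '>': consume up to the next whitespace.
      while (i < n && !std::isspace(static_cast<unsigned char>(line[i]))) ++i;
    }
    words.push_back(line.substr(start, i - start));
  }
  return words;
}

// Rejoin words with single spaces, which is how the extra space appears.
std::string JoinWithSpaces(const std::vector<std::string> &words) {
  std::string out;
  for (std::size_t k = 0; k < words.size(); ++k) {
    if (k) out += ' ';
    out += words[k];
  }
  return out;
}
```

Under this model, `JoinWithSpaces(SplitLikePerl("->"))` yields `- >` and `JoinWithSpaces(SplitLikePerl("a -> b"))` yields `a - > b`, matching the Perl behaviour described above, whereas a plain whitespace split leaves `->` intact.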
But the tokenizer is supposed to escape those characters as `&lt;` and `&gt;`, so it probably doesn't matter. (XML support is out of scope for the C++ version.)
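To see why escaping should make the difference moot, here is a minimal, partial sketch of such an escaping step, assuming the tokenizer maps `&`, `<` and `>` to entities before truecasing. The real tokenizer may escape additional characters; only these three are shown, and `EscapeAngles` is a hypothetical name. Because each input character is mapped independently, an existing `&` is never double-escaped.

```cpp
#include <cassert>
#include <string>

// Hypothetical, partial escaping step: replaces '&', '<' and '>' with
// XML entities so that no bare angle bracket reaches the truecaser.
std::string EscapeAngles(const std::string &text) {
  std::string out;
  out.reserve(text.size());
  for (char c : text) {
    switch (c) {
      case '&': out += "&amp;"; break;
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      default:  out += c; break;
    }
  }
  return out;
}
```

With this step, `->` becomes `-&gt;`, which contains no `>`, so a splitter that treats angle brackets specially has nothing to split on.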