yannvgn

Results 3 comments of yannvgn

Hi @VinuraD, The sentences are first tokenized before being embedded. The tokenization step relies on Moses in Facebook's LASER original implementation. For portability reasons, I decided to use its Python...

Hey, I had the same issue. Running `export MACOSX_DEPLOYMENT_TARGET="10.9"` before `python setup.py install` fixed the issue for me. Source: https://stackoverflow.com/questions/56083725/macos-build-issues-lstdc-not-found-while-building-python-package Also related to https://github.com/glample/fastBPE/issues/18 I think.

Just a note: Moses tokenizer has the same behavior: ``` $ echo "தமிழ் மொழி (Tamil language) தமிழர்களினதும், தமிழ் பேசும் பலரதும் தாய்மொழி ஆகும்." | tokenizer.perl -q -no-escape -l ta தமிழ ்...