more languages

Open rohlik-hu opened this issue 11 months ago • 2 comments

Thanks a lot for the impressive tool. How can additional languages be included? It seems that the sentence-transformers library supports many more …

rohlik-hu · Mar 03 '25 12:03

Thank you for your interest in Bertalign! The LaBSE model supports over 100 languages. However, Bertalign relies on sentence-splitter for sentence segmentation, which currently supports only 25 languages.

If you need to align other languages, you could split the text yourself with an alternative sentence segmentation tool such as pySBD or Ersatz. These tools offer broader language support and may better suit your needs.
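As a rough illustration of the external-segmentation route, the sketch below turns raw text into one sentence per line using a pluggable splitter. The helper names `naive_split` and `to_one_sentence_per_line` are hypothetical and not part of Bertalign, and the regex splitter is only a stand-in for a real segmenter such as pySBD.

```python
import re
from typing import Callable, List


def naive_split(text: str) -> List[str]:
    """Placeholder splitter (hypothetical): break after '.', '!' or '?'
    when followed by whitespace. A real segmenter should replace this."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def to_one_sentence_per_line(
    text: str, splitter: Callable[[str], List[str]] = naive_split
) -> str:
    """Run any splitter over the text and join the sentences with newlines."""
    sentences = [s.strip() for s in splitter(text)]
    return "\n".join(s for s in sentences if s)


# With pySBD installed, its segmenter can be passed in instead:
#   import pysbd
#   seg = pysbd.Segmenter(language="en", clean=False)
#   to_one_sentence_per_line(text, seg.segment)

if __name__ == "__main__":
    print(to_one_sentence_per_line("Hello there. How are you?  Fine!"))
```

The one-sentence-per-line text still has to reach the aligner without being split again. Recent versions of the repository appear to expose an `is_split` flag on the `Bertalign` constructor for this purpose, but treat that as an assumption and check it against the version you have installed.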

bfsujason · Mar 07 '25 05:03

Hi, my data has already been split into sentences. How can I skip the sentence-splitting step?

Thanks for your work!!

rvwfels · Jun 19 '25 04:06