
correct_spaces incorrectly inserts spaces into abbreviations

Open: atlijas opened this issue 2 years ago • 1 comment

Using the newest version of Tokenizer, 3.4.2:

>>> from tokenizer import correct_spaces
>>> correct_spaces('Þarna voru t.d. tveir hundar , m.a. hundurinn hans Jóns .')
# Expected output: 'Þarna voru t.d. tveir hundar, m.a. hundurinn hans Jóns.'
# Actual output:   'Þarna voru t. d. tveir hundar, m. a. hundurinn hans Jóns.'
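
Until this is fixed, one possible stopgap is to post-process the returned string and rejoin the abbreviations that were split. The following is only a minimal sketch and not part of Tokenizer: `rejoin_abbreviations` and `KNOWN_ABBREVIATIONS` are hypothetical names, and the allow-list is an assumption that would need to be extended for real text. It operates on the string shown as the actual output above, so it does not depend on the library itself.

```python
import re

# Hypothetical allow-list (an assumption, not taken from Tokenizer):
# extend it with the abbreviations that occur in your own text.
KNOWN_ABBREVIATIONS = ("t.d.", "m.a.")


def rejoin_abbreviations(text, abbreviations=KNOWN_ABBREVIATIONS):
    """Collapse split forms such as 't. d.' back into 't.d.'."""
    for abbrev in abbreviations:
        # 't.d.' -> ['t', 'd']
        parts = [p for p in abbrev.split(".") if p]
        # Build a pattern like r'\bt\.\s+d\.' that matches the split form.
        pattern = r"\b" + r"\.\s+".join(re.escape(p) for p in parts) + r"\."
        # A function replacement avoids backslash handling in the template.
        text = re.sub(pattern, lambda _m, a=abbrev: a, text)
    return text


broken = "Þarna voru t. d. tveir hundar, m. a. hundurinn hans Jóns."
fixed = rejoin_abbreviations(broken)
print(fixed)
```

The match is case-sensitive and limited to the listed abbreviations, so ordinary sentence boundaries are left alone unless a sentence happens to end in one abbreviation part and the next begins with the other.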

atlijas, Jun 15 '23 09:06

Thanks for reporting this. We will look into it.

HaukurPall, Aug 30 '23 18:08