php-sentence

Acronyms at the end of a sentence are incorrectly parsed

Open: dinamic opened this issue 6 years ago • 1 comment

The library has been really useful to us for breaking text into sentences. I've noticed one issue so far. If a sentence ending in an acronym is the last thing in the text, everything is okay, but if another sentence follows it, the result is incorrect: the text is split in the middle of the acronym. It gets even worse if the acronym is capitalized: the text is not split at all.

Here it works fine:

// Assumes the library's Sentence class is already loaded.
$sentenceBreaker = new \Sentence();
$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 a.m..', \Sentence::SPLIT_TRIM);

var_dump($sentences);
array(1) {
  [0] =>
  string(25) "Let's meet at 10:00 a.m.."
}

But it fails in this one:

$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 a.m.. How about Greg?', \Sentence::SPLIT_TRIM);

var_dump($sentences);
array(2) {
  [0] =>
  string(22) "Let's meet at 10:00 a."
  [1] =>
  string(19) "m.. How about Greg?"
}

Here it fails with a capitalized acronym:

$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 A.M.. How about Greg?', \Sentence::SPLIT_TRIM);

var_dump($sentences);
array(1) {
  [0] =>
  string(41) "Let's meet at 10:00 A.M.. How about Greg?"
}
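
The three cases above can be collected into one small regression check. This is only a sketch: the expected splits are my assumption about the intended behaviour (the outputs above only show what actually happens), and `findFailingCases` is a hypothetical helper, not part of the library. The library call is guarded, so the script does nothing beyond defining the helper if the `Sentence` class is not loaded.

```php
<?php
// Input text => expected sentences.
// ASSUMPTION: these expectations are inferred, not confirmed by the library author.
$cases = [
    "Let's meet at 10:00 a.m.." => ["Let's meet at 10:00 a.m.."],
    "Let's meet at 10:00 a.m.. How about Greg?" => ["Let's meet at 10:00 a.m..", "How about Greg?"],
    "Let's meet at 10:00 A.M.. How about Greg?" => ["Let's meet at 10:00 A.M..", "How about Greg?"],
];

/**
 * Hypothetical helper: runs $split on every input and returns
 * the inputs whose actual result differs from the expected one,
 * mapped to that actual result.
 */
function findFailingCases(callable $split, array $cases): array
{
    $failing = [];
    foreach ($cases as $text => $expected) {
        $actual = array_values($split($text));
        if ($actual !== $expected) {
            $failing[$text] = $actual;
        }
    }
    return $failing;
}

// Only runs when the library class used in the snippets above is available.
if (class_exists('Sentence')) {
    $sentenceBreaker = new \Sentence();
    $failing = findFailingCases(
        function (string $text) use ($sentenceBreaker): array {
            return $sentenceBreaker->split($text, \Sentence::SPLIT_TRIM);
        },
        $cases
    );
    var_dump($failing);
}
```

An empty array from `var_dump($failing)` would mean all three inputs split as assumed; with the behaviour shown above, the second and third inputs would be listed.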

dinamic, Feb 20 '20 14:02

@vanderlee would you be able to take a look at this one, please?

dinamic, Feb 20 '20 14:02