php-sentence
Acronyms at the end of a sentence are incorrectly parsed
The library has been really useful to us for breaking text into sentences. I've noticed one issue so far: if a sentence ending with an acronym is the last sentence in the text, everything is fine, but if another sentence follows it, the result is incorrect (the break lands inside the acronym). It gets even worse if the acronym is capitalized: the text isn't split at all.
Here it works fine:
$sentenceBreaker = new \Sentence(); // assumed setup, not shown in the original snippet
$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 a.m..', \Sentence::SPLIT_TRIM);
var_dump($sentences);
array(1) {
  [0] =>
  string(25) "Let's meet at 10:00 a.m.."
}
But it fails when another sentence follows:
$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 a.m.. How about Greg?', \Sentence::SPLIT_TRIM);
var_dump($sentences);
array(2) {
  [0] =>
  string(22) "Let's meet at 10:00 a."
  [1] =>
  string(19) "m.. How about Greg?"
}
With a capitalized acronym it fails differently, returning the whole text as a single sentence:
$sentences = $sentenceBreaker->split('Let\'s meet at 10:00 A.M.. How about Greg?', \Sentence::SPLIT_TRIM);
var_dump($sentences);
array(1) {
  [0] =>
  string(41) "Let's meet at 10:00 A.M.. How about Greg?"
}
@vanderlee would you be able to take a look at this one, please?