Different results for Title Case
The stemmer gives different results for 'office' (stemmed to 'offic') and 'Office' (stemmed to 'office').
I don't know whether this is intended by the implementation, but in this case the stemmed forms should be identical. The difference is probably caused by the isVowel function, which checks against the lowercase vowel set (a, e, i, o, u, y) and not the uppercase one (A, E, I, O, U, Y), so the leading 'O' of 'Office' is treated as a consonant. I modified isVowel from:
protected static function isVowel($position, $word, array $additional = array()) {
    $vowels = array_merge(array('a', 'e', 'i', 'o', 'u', 'y'), $additional); //lowercase vowels only
    return in_array(self::charAt($position, $word), $vowels);
}
so that it operates on a lowercased $word:
protected static function isVowel($position, $word, array $additional = array()) {
    $vowels = array_merge(array('a', 'e', 'i', 'o', 'u', 'y'), $additional); //lowercase vowels only
    return in_array(self::charAt($position, strtolower($word)), $vowels); //changed parameter word to lowercase
}
This solved the issue.
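To illustrate why a case-sensitive vowel check changes the stem, here is a minimal standalone PHP sketch. It is not the library's code: isVowelLower() and r1Start() are hypothetical helpers, and r1Start() is a simplified version of Porter2's R1 rule (the region after the first non-vowel that follows a vowel) that ignores the algorithm's special prefixes such as "gener" and "commun".

```php
<?php
// Hypothetical stand-in for a case-sensitive isVowel(): lowercase vowels only.
function isVowelLower(string $ch): bool
{
    return in_array($ch, ['a', 'e', 'i', 'o', 'u', 'y'], true);
}

// Simplified Porter2 R1: offset just after the first non-vowel that
// follows a vowel (special prefixes like "gener" are ignored here).
function r1Start(string $word): int
{
    $len = strlen($word);
    for ($i = 1; $i < $len; $i++) {
        if (!isVowelLower($word[$i]) && isVowelLower($word[$i - 1])) {
            return $i + 1;
        }
    }
    return $len; // no such position: R1 is empty
}

echo substr('office', r1Start('office')), "\n"; // fice
echo substr('Office', r1Start('Office')), "\n"; // e ('O' is not seen as a vowel)
$lower = strtolower('Office');
echo substr($lower, r1Start($lower)), "\n";     // fice
```

With R1 = "fice", the final 'e' also falls in R2 ("e"), and Porter2's step 5 deletes it, giving 'offic'. With R1 = "e" and an empty R2, the 'e' is kept because it follows the short syllable "fic", giving 'office', which is consistent with the outputs reported above.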
'Bearing' and 'bearing' also produce different output, which (obviously) this modification does not fix. That needs a further change, but before I post it I would like input from the OP or from anyone with good knowledge of PorterStemmer2 on whether this behaviour is in accordance with the algorithm.
In the meantime, for programmers using this library: convert the string to lowercase before passing it to the entry function.
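A minimal PHP sketch of that caller-side workaround, assuming only that the library exposes some entry function that takes a string; stemCaseInsensitive() and the $stem callable are hypothetical names, since the entry function is not named here. One possible reason to prefer lowercasing once at the entry point over lowercasing inside isVowel: the Porter2 algorithm marks a consonantal y (an initial y, or a y after a vowel) as uppercase 'Y', so if this library follows that convention, calling strtolower() inside isVowel would turn that marker back into a vowel.

```php
<?php
// Hypothetical wrapper: $stem stands in for the library's entry function,
// whose exact name is not given in this issue.
function stemCaseInsensitive(callable $stem, string $word): string
{
    // Lowercase once, before any stemming step runs, so internal markers
    // added later (e.g. a consonantal 'Y', if the library uses Porter2's
    // convention) are not affected. strtolower() is byte-based, which is
    // adequate for plain ASCII English words.
    return $stem(strtolower($word));
}

// Usage with an identity stand-in, showing what the stemmer would receive.
$received = stemCaseInsensitive(fn(string $w): string => $w, 'Office');
echo $received, "\n"; // office
```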