Alpha validation with accented characters?
Hey! I'm not sure whether I'm missing something or whether it's a bug, but there seem to be problems when validating alpha/alphanumeric strings that contain accented characters.
require 'lib/Valitron/Validator.php';
use Valitron\Validator as Validator;
$arr = [
"last_name" => "áéíñ"
];
$v = new Validator($arr);
$v->rule('alpha','last_name');
$v->validate();
var_dump($v->errors()); // prints array(1) { ["last_name"]=> array(1) { [0]=> string(39) "Last Name must contain only letters a-z" } }
Any ideas, or is there anything that needs to be set up before using Valitron?
Thanks!
Re "áéíñ": these characters are not alphabetical (in English), so this isn't a bug. You can add a custom rule to check for them.
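For illustration, here is a minimal standalone sketch of why "áéíñ" fails. It assumes the built-in alpha rule is an ASCII-only pattern along the lines of /^[a-z]+$/i (which the "letters a-z" error message suggests) and compares it with a Unicode-aware pattern using \pL and the u modifier. The function names are hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Assumption: approximates the built-in ASCII-only "alpha" check.
function is_ascii_alpha(string $value): bool
{
    return preg_match('/^[a-z]+$/i', $value) === 1;
}

// Hypothetical Unicode-aware variant: \pL is any letter, /u treats input as UTF-8.
// \pM also admits combining marks, so decomposed input ("a" + U+0301) passes too.
function is_unicode_alpha(string $value): bool
{
    return preg_match('/^[\pL\pM]+$/u', $value) === 1;
}

var_dump(is_ascii_alpha('áéíñ'));        // bool(false)
var_dump(is_unicode_alpha('áéíñ'));      // bool(true)
var_dump(is_unicode_alpha("a\u{0301}")); // bool(true)
```

Including \pM matters when input arrives in decomposed form: a combining accent is a mark, not a letter, so /^\pL+$/u alone would reject it.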
Thanks for reporting this. First, there is a Spanish translation you can use to get Spanish error messages (lang/es.php). Second, this is an interesting issue, because in Spanish those are certainly valid characters, and ones I would want Valitron to support. Even for English input, someone could easily have an accent mark in their name, and I wouldn't want that to fail validation. Any thoughts on the best way forward? I want to support this, but we may need a new rule type, such as alpha_intl, for international characters.
I agree this should be a separate rule, since you might not want Unicode characters in, say, a username. I came up with the following, which supports Unicode letters (see http://php.net/manual/en/regexp.reference.unicode.php and http://stackoverflow.com/questions/8013897/accept-international-name-characters-in-regex).
It also accepts whitespace and the characters - and ', for example "O'Malley" or "Kennedy-Warburton".
Maybe we could remove those characters and make them optional via a rule parameter, since as written this is more useful as a 'name' rule than as a plain alnum or alpha rule.
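A rough sketch of that suggestion: letters only by default, with extra characters (such as - and ') allowed only when passed as a rule parameter. The callback uses the same ($field, $value, array $params) signature as the addRule examples in this thread; treating $params[0] as a string of extra allowed characters, and the alpha_unicode_extra name, are assumptions for illustration.

```php
<?php
// Hypothetical rule callback: letters (plus combining marks) only by default.
// Assumption: $params[0], if present, is a string of extra characters to allow.
$alphaUnicodeExtra = function ($field, $value, array $params) {
    // preg_quote() escapes "-" (and the "/" delimiter), so the extra characters
    // can never form an accidental range inside the character class.
    $extra = isset($params[0]) ? preg_quote((string) $params[0], '/') : '';
    return preg_match('/^[\pL\pM' . $extra . ']+$/u', (string) $value) === 1;
};

// Assumed usage with Valitron (not run here; assumes extra rule() arguments arrive in $params):
// Valitron\Validator::addRule('alpha_unicode_extra', $alphaUnicodeExtra, 'contains invalid characters.');
// $v->rule('alpha_unicode_extra', 'last_name', "-'");

var_dump($alphaUnicodeExtra('last_name', "O'Malley", []));              // bool(false)
var_dump($alphaUnicodeExtra('last_name', "O'Malley", ["-'"]));          // bool(true)
var_dump($alphaUnicodeExtra('last_name', 'Kennedy-Warburton', ["-'"])); // bool(true)
```

Whitespace is deliberately not included by default; a name like "Mary Ann" would need " " in the parameter.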
require 'lib/Valitron/Validator.php';
use Valitron\Validator as Validator;
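// \pL matches any Unicode letter; the /u modifier treats the input as UTF-8.
// "-" goes last in each character class so it is a literal hyphen rather than
// a range (PCRE2, used by PHP 7.3+, rejects a range that starts at \s).
Valitron\Validator::addRule('alnum_unicode', function($field, $value, array $params) {
    return preg_match("/^[\s'0-9\pL-]+$/u", $value);
}, 'is not unicode alphanumeric.');
Valitron\Validator::addRule('alpha_unicode', function($field, $value, array $params) {
    return preg_match("/^[\s'\pL-]+$/u", $value);
}, 'is not unicode alpha.');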
$arr = [
"test_japanese" => "日本語",
"test_russian" => "Россия",
"test_spanish" => "España",
"test_emoji" => "😅",
"test_jp_alnum" => "日本語10"
];
$v = new Validator($arr);
$v->rule('alnum_unicode',array('test_japanese','test_russian','test_spanish','test_emoji','test_jp_alnum'));
$v->rule('alpha_unicode',array('test_japanese','test_russian','test_spanish','test_emoji','test_jp_alnum'));
$v->validate();
var_dump($v->errors());
This prints:
array(2) {
  ["test_emoji"]=>
  array(2) {
    [0]=>
    string(39) "Test Emoji is not unicode alphanumeric."
    [1]=>
    string(32) "Test Emoji is not unicode alpha."
  }
  ["test_jp_alnum"]=>
  array(1) {
    [0]=>
    string(35) "Test Jp Alnum is not unicode alpha."
  }
}
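The two failures line up with Unicode general categories: 😅 (U+1F605) is a symbol (category So), not a letter, so \pL rejects it under both rules, and the digits in "日本語10" are not letters, so only the alphanumeric pattern accepts that value. A small standalone check, independent of Valitron (whitespace, ' and - are left out here for brevity):

```php
<?php
// \pL = any letter; \p{So} = "Symbol, other", which covers most emoji.
$samples = ['日本語', 'Россия', 'España', '😅', '日本語10'];

foreach ($samples as $s) {
    printf(
        "%s letters-only: %d, letters/digits: %d, symbol: %d\n",
        $s,
        preg_match('/^\pL+$/u', $s),      // mirrors alpha_unicode
        preg_match('/^[0-9\pL]+$/u', $s), // mirrors alnum_unicode
        preg_match('/\p{So}/u', $s)
    );
}
```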
Same problem here with Greek characters, although @moustacheful gave a nice temporary fix that I'm also using for now. Are there any plans for an update?
Another problem is that PHP counts bytes rather than characters, and every Greek letter takes 2 bytes in UTF-8 (other non-ASCII characters take 2 to 4), so, e.g., 'lengthMin', 2 validates to true for the single-character string "λ". It seems to work correctly when I call mb_internal_encoding("UTF-8");, but I don't know if that's the proper solution. Please help, thanks.
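To make the byte-versus-character difference concrete: strlen() counts bytes, and "λ" (U+03BB) is two bytes in UTF-8, while mb_strlen() with an explicit 'UTF-8' argument counts characters without depending on mb_internal_encoding(). This is a sketch under assumptions: utf8_length and the lengthMin-style callback are hypothetical, not part of Valitron, and since mbstring may not be installed, the helper falls back to counting code points with a /u regex.

```php
<?php
// Hypothetical helper: length in characters (code points), not bytes.
function utf8_length(string $value): int
{
    if (function_exists('mb_strlen')) {
        // Passing the encoding explicitly avoids relying on mb_internal_encoding().
        return mb_strlen($value, 'UTF-8');
    }
    // Fallback without mbstring: each /./su match is one code point.
    return (int) preg_match_all('/./su', $value);
}

// Hypothetical lengthMin-style rule, using the addRule callback signature.
$lengthMinUtf8 = function ($field, $value, array $params) {
    return utf8_length((string) $value) >= (int) $params[0];
};

var_dump(strlen('λ'));                      // int(2)
var_dump(utf8_length('λ'));                 // int(1)
var_dump($lengthMinUtf8('name', 'λ', [2])); // bool(false)
```

Passing the encoding per call keeps the check independent of global state, which setting mb_internal_encoding("UTF-8") would otherwise change for every mbstring call in the application.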