
Alpha validation with accented characters?

Open moustacheful opened this issue 11 years ago • 4 comments

Hey! I'm just wondering -- I don't really know if there's something I'm missing or if it's a bug, but there seem to be problems when trying to validate alpha/alphanumeric strings that contain accented characters.

require 'lib/Valitron/Validator.php';
use Valitron\Validator as Validator;
$arr = [
    "last_name" => "áéíñ"
];
$v = new Validator($arr);
$v->rule('alpha','last_name');
$v->validate();
var_dump($v->errors()); // prints array(1) { ["last_name"]=> array(1) { [0]=> string(39) "Last Name must contain only letters a-z" } }

Any ideas, or is there anything that needs to be set up before using Valitron?

Thanks!

moustacheful avatar Jun 20 '14 03:06 moustacheful

áéíñ

These characters are not alphabetic (in English), so this isn't a bug. You can add a custom rule to check for them.

ghost avatar Jun 20 '14 05:06 ghost

Thanks for reporting this. First, there is a Spanish translation you can use to get Spanish error messages (lang/es.php). Second, this is an interesting issue, because in Spanish those are certainly valid characters, and ones that I would want Valitron to support. Even for English input, someone could easily have an accent mark in their name, and I wouldn't want that to fail validation. Any thoughts on the best way forward here? I want to support this, but at the same time, we may need a new rule type like alpha_intl or something for international characters.

vlucas avatar Jun 20 '14 13:06 vlucas

I agree this should be a different rule, since you might not want Unicode characters in, say, a username. I came up with the rules below, which accept any Unicode letter (see http://php.net/manual/en/regexp.reference.unicode.php and http://stackoverflow.com/questions/8013897/accept-international-name-characters-in-regex).

They also accept whitespace and the characters - and ', so that names like "O'Malley" or "Kennedy-Warburton" pass.

Maybe we could remove those extra characters from the pattern and accept them optionally as a rule parameter, since as written this is more useful as a 'name' rule than as a general alnum or alpha rule.

    require 'lib/Valitron/Validator.php';
    use Valitron\Validator as Validator;
    Valitron\Validator::addRule('alnum_unicode', function($field, $value, array $params) {
        // \pL matches any Unicode letter; the hyphen is escaped so it is not parsed as a range
        return preg_match("/^[\s\-'0-9\pL]+$/u", $value) === 1;
    }, 'is not unicode alphanumeric.');
    Valitron\Validator::addRule('alpha_unicode', function($field, $value, array $params) {
        // Same as above, without the digits
        return preg_match("/^[\s\-'\pL]+$/u", $value) === 1;
    }, 'is not unicode alpha.');


    $arr = [
        "test_japanese" => "日本語",
        "test_russian" => "Россия",
        "test_spanish" => "España",
        "test_emoji" => "😅",
        "test_jp_alnum" => "日本語10"
    ];

    $v = new Validator($arr);
    $v->rule('alnum_unicode',array('test_japanese','test_russian','test_spanish','test_emoji','test_jp_alnum'));
    $v->rule('alpha_unicode',array('test_japanese','test_russian','test_spanish','test_emoji','test_jp_alnum'));
    $v->validate();
    var_dump($v->errors());

which prints the following:

array(2) {
  ["test_emoji"]=>
  array(2) {
    [0]=>
    string(39) "Test Emoji is not unicode alphanumeric."
    [1]=>
    string(32) "Test Emoji is not unicode alpha."
  }
  ["test_jp_alnum"]=>
  array(1) {
    [0]=>
    string(35) "Test Jp Alnum is not unicode alpha."
  }
}
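
The idea of passing the extra characters as a rule parameter could be sketched roughly like this. The helper names unicodeAlphaPattern and matchesUnicodeAlpha are hypothetical and not part of Valitron; inside a custom rule, the extra characters would presumably arrive through the $params argument of the callback, which is an assumption here rather than something tested against the library.

```php
<?php
// Hypothetical helper (not part of Valitron): builds a Unicode-aware pattern.
// $extra lists additional literal characters to allow, e.g. "'-".
function unicodeAlphaPattern(string $extra = '', bool $digits = false): string
{
    // preg_quote escapes characters that are special in a pattern, including "-" and "]"
    $class = '\pL' . ($digits ? '0-9' : '') . preg_quote($extra, '/');
    return '/^[' . $class . ']+$/u';
}

// Hypothetical helper: true only when every character of $value is allowed.
function matchesUnicodeAlpha(string $value, string $extra = '', bool $digits = false): bool
{
    return preg_match(unicodeAlphaPattern($extra, $digits), $value) === 1;
}

var_dump(matchesUnicodeAlpha("España"));         // letters only
var_dump(matchesUnicodeAlpha("O'Malley"));       // apostrophe not allowed by default
var_dump(matchesUnicodeAlpha("O'Malley", "'-")); // apostrophe and hyphen opted in
```

With this shape a username rule can pass no extra characters, while a name rule opts in to the apostrophe, hyphen, or space as needed.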

moustacheful avatar Jun 20 '14 14:06 moustacheful

Same problem here for Greek characters. @moustacheful gave a nice temporary fix, which I'm also using for now. Is there any planned update?

Another problem is that PHP counts 2 bytes for every Greek or other non-Latin character, so e.g. 'lengthMin', 2 validates to true for the string "λ". It seems to work right when I use mb_internal_encoding("UTF-8");, but I don't know if that's the proper solution. Please help, thanks.
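
The byte-versus-character difference can be reproduced outside Valitron with a minimal sketch like the one below. It assumes the mbstring extension is available and that the source file is saved as UTF-8; it says nothing about how Valitron itself measures length.

```php
<?php
$value = "λ"; // a single Greek letter, stored as two bytes in UTF-8

$byteLength = strlen($value);             // strlen counts bytes
$charLength = mb_strlen($value, 'UTF-8'); // mb_strlen counts characters in the given encoding

var_dump($byteLength); // int(2)
var_dump($charLength); // int(1)
```

Passing the encoding explicitly to mb_strlen avoids depending on the global mb_internal_encoding setting.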

grrnikos avatar Aug 31 '14 13:08 grrnikos