Add support for modes such as `(?i)`

Open exyzzy opened this issue 6 years ago • 5 comments

In lexer.go, I am trying to add this pattern: ``lexer.Add([]byte(`(?i)(varchar\([0-9]+\))`), token("VARCHARID"))``

results: (debug=true)
... 01/14 20:10:48 enter alternation 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomicOps 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomicOp 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomic 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter char 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter CHAR 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit CHAR 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter charRange 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit char 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 char Regex parse error in production 'charClass' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : expected '[' at 0 got '(' of '(?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : unexpected operator, (
Regex parse error in production 'char' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 0, (?i)(varchar([0-9]+))
2020/01/14 20:10:48 enter group 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter alternation 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomicOps 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomicOp 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomic 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter char 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter CHAR 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit CHAR 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter charRange 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit char 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 char Regex parse error in production 'charClass' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '[' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : unexpected operator, ?
Regex parse error in production 'char' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 1, (?i)(varchar([0-9]+))
2020/01/14 20:10:48 enter group 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit group 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 group Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '(' at 1 got '?' of '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit atomic 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 atomic Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '(' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'charClass' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '[' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : unexpected operator, ?
Regex parse error in production 'char' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 1, (?i)(varchar([0-9]+))
Regex parse error in production 'atomic' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected group or char
2020/01/14 20:10:48 exit atomicOp 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit atomicOps 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter alternation_ 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit alternation_ 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit alternation 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit group 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 group Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected ')' at 1 got '?' of '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit atomic 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 atomic Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected ')' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'charClass' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : expected '[' at 0 got '(' of '(?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : unexpected operator, (
Regex parse error in production 'char' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 0, (?i)(varchar([0-9]+))
Regex parse error in production 'atomic' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected group or char
2020/01/14 20:10:48 exit atomicOp 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit atomicOps 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter alternation_ 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit alternation_ 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit alternation 1 '?i)(varchar([0-9]+))'
panic: Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected ')' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'charClass' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : expected '[' at 0 got '(' of '(?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : unexpected operator, (
Regex parse error in production 'char' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 0, (?i)(varchar([0-9]+))
Regex parse error in production 'atomic' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected group or char
Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '(' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'charClass' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '[' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : unexpected operator, ?
Regex parse error in production 'char' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 1, (?i)(varchar([0-9]+))
Regex parse error in production 'atomic' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected group or char
Regex parse error in production 'Parse' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : unconsumed input

Jan 15 '20 04:01 exyzzy

Hi @exyzzy,

Lexmachine does not support every variation of regular expression syntax. I will tag this as a feature request to add support for modes in the parser. For reference, I document the portions of the regular expression syntax that lexmachine supports here: https://github.com/timtadh/lexmachine#regular-expressions

Note, just because lexmachine doesn't support (?i) doesn't mean you can't achieve case insensitivity. For example, your given expression could be rewritten as

([Vv][Aa][Rr][Cc][Hh][Aa][Rr]\([0-9]+\))

I recognize this is more work.

Jan 15 '20 16:01 timtadh

The modes the Go regexp parser supports are documented here: https://golang.org/pkg/regexp/syntax/

Jan 15 '20 16:01 timtadh

Hi Tim, Thanks. Yes, I also came up with the workaround you mention above and it does do the trick, just a little clumsier. So I am not blocked. Thanks for making LexMachine, it's really cool.

Jan 15 '20 17:01 exyzzy

Hey Tim! I was trying to use a regex like "^-?[0-9]+$", but I think "$" is not supported. What could I do instead?

Jun 05 '20 02:06 rolyagca

@rolyagca Correct, $ is not supported (nor is ^). If you would like to see support for them, please open a separate bug.

Jun 05 '20 03:06 timtadh
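
The thread leaves the "what could I do instead?" question unanswered. One possible approach, offered here as an assumption and not as the maintainer's recommendation, is to drop the anchors from the lexmachine pattern (leaving `-?[0-9]+`) and, where the whole input must be a single integer, to check that separately with Go's standard `regexp` package, which does support `^` and `$`. The names `wholeInteger` and `isWholeInteger` below are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// wholeInteger is compiled by Go's standard regexp package, which
// supports the ^ and $ anchors (this is not a lexmachine pattern).
var wholeInteger = regexp.MustCompile(`^-?[0-9]+$`)

// isWholeInteger (hypothetical helper) reports whether the entire input
// is an optionally negative decimal integer.
func isWholeInteger(input string) bool {
	return wholeInteger.MatchString(input)
}

func main() {
	for _, s := range []string{"-42", "42abc", ""} {
		fmt.Printf("%q -> %v\n", s, isWholeInteger(s))
	}
}
```

This keeps the anchoring concern outside the lexer, so the token pattern itself only needs syntax that lexmachine documents as supported.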