Add support for modes such as `(?i)`
Trying in lexer.go to parse:
lexer.Add([]byte(`(?i)(varchar\([0-9]+\))`), token("VARCHARID"))
results: (debug=true)
... 01/14 20:10:48 enter alternation 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomicOps 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomicOp 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomic 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter char 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter CHAR 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit CHAR 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter charRange 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit char 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 char Regex parse error in production 'charClass' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : expected '[' at 0 got '(' of '(?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : unexpected operator, (
Regex parse error in production 'char' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 0, (?i)(varchar([0-9]+))
2020/01/14 20:10:48 enter group 0 '(?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter alternation 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomicOps 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomicOp 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter atomic 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter char 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter CHAR 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit CHAR 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter charRange 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit char 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 char Regex parse error in production 'charClass' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '[' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : unexpected operator, ?
Regex parse error in production 'char' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 1, (?i)(varchar([0-9]+))
2020/01/14 20:10:48 enter group 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit group 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 group Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '(' at 1 got '?' of '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit atomic 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 atomic Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '(' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'charClass' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '[' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : unexpected operator, ?
Regex parse error in production 'char' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 1, (?i)(varchar([0-9]+))
Regex parse error in production 'atomic' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected group or char
2020/01/14 20:10:48 exit atomicOp 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit atomicOps 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter alternation_ 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit alternation_ 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit alternation 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit group 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 group Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected ')' at 1 got '?' of '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit atomic 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 atomic Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected ')' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'charClass' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : expected '[' at 0 got '(' of '(?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : unexpected operator, (
Regex parse error in production 'char' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 0, (?i)(varchar([0-9]+))
Regex parse error in production 'atomic' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected group or char
2020/01/14 20:10:48 exit atomicOp 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit atomicOps 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 enter alternation_ 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit alternation_ 1 '?i)(varchar([0-9]+))'
2020/01/14 20:10:48 exit alternation 1 '?i)(varchar([0-9]+))'
panic: Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected ')' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'charClass' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : expected '[' at 0 got '(' of '(?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : unexpected operator, (
Regex parse error in production 'char' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 0, (?i)(varchar([0-9]+))
Regex parse error in production 'atomic' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected group or char
Regex parse error in production 'group' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '(' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'charClass' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : expected '[' at 1 got '?' of '?i)(varchar([0-9]+))'
Regex parse error in production 'CHAR' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : unexpected operator, ?
Regex parse error in production 'char' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected a CHAR or charRange at 1, (?i)(varchar([0-9]+))
Regex parse error in production 'atomic' : at index 1 line 0 column 2 '?i)(varchar([0-9]+))' : Expected group or char
Regex parse error in production 'Parse' : at index 0 line 0 column 1 '(?i)(varchar([0-9]+))' : unconsumed input
Hi @exyzzy,
Lexmachine does not support every variation of regular expression syntax. I will tag this as a feature request to add support for modes in the parser. For reference, I document the portions of the regular expression syntax that lexmachine supports here: https://github.com/timtadh/lexmachine#regular-expressions
Note that just because lexmachine doesn't support `(?i)` doesn't mean you can't achieve case insensitivity. For example, your given expression could be rewritten as:
([Vv][Aa][Rr][Cc][Hh][Aa][Rr]\([0-9]+\))
I recognize this is more work.
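To cut down on the manual work, the character classes can be generated. The following is only a sketch: `caseInsensitive` is a hypothetical helper (not part of lexmachine), and it assumes its input is plain literal text made of ASCII characters, with no regex operators or escapes in it.

```go
package main

import (
	"fmt"
	"strings"
)

// caseInsensitive is a hypothetical helper (not a lexmachine API).
// It expands every ASCII letter of a plain literal into a two-character
// class, e.g. "v" -> "[Vv]". All other characters are copied unchanged,
// so the input must not contain regex operators or escape sequences.
func caseInsensitive(literal string) string {
	var b strings.Builder
	for _, r := range literal {
		isLower := r >= 'a' && r <= 'z'
		isUpper := r >= 'A' && r <= 'Z'
		if isLower || isUpper {
			b.WriteString("[")
			b.WriteString(strings.ToUpper(string(r)))
			b.WriteString(strings.ToLower(string(r)))
			b.WriteString("]")
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// Only the keyword is expanded; the rest of the pattern is written by hand.
	pattern := "(" + caseInsensitive("varchar") + `\([0-9]+\))`
	fmt.Println(pattern)
}
```

The resulting string can then be passed to `lexer.Add` as `[]byte(pattern)` in place of the hand-written expression.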
The modes the Go regexp parser supports are documented here: https://golang.org/pkg/regexp/syntax/
Hi Tim, Thanks. Yes, I also came up with the workaround you mention above and it does do the trick, just a little clumsier. So I am not blocked. Thanks for making LexMachine, it's really cool.
Hey Tim! I was trying to use a regular expression like `^-?[0-9]+$`, but I think `$` is not supported. What could I do instead?
@rolyagca correct, `$` is not supported (nor is `^`). If you would like to see them supported, please open a separate bug.
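In the meantime, if the goal is only to check that an entire string is an integer (an assumption about the use case), Go's standard `regexp` package does support `^` and `$`, so that check can be done outside the lexer. A minimal sketch, with `isInteger` as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"regexp"
)

// integerRe uses Go's standard regexp package, which supports the
// ^ and $ anchors; this is independent of lexmachine.
var integerRe = regexp.MustCompile(`^-?[0-9]+$`)

// isInteger is a hypothetical helper: it reports whether s consists
// solely of an optional minus sign followed by one or more digits.
func isInteger(s string) bool {
	return integerRe.MatchString(s)
}

func main() {
	for _, s := range []string{"42", "-7", "4x2", ""} {
		fmt.Printf("%q -> %v\n", s, isInteger(s))
	}
}
```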