lua-parser
Lua parser and abstract syntax tree in Lua
Lua Parser in Lua
Parses to an abstract syntax tree representation. Call tostring() on the AST to get equivalent Lua code.
Works for versions 5.1, 5.2, 5.3, 5.4 and maybe some LuaJIT versions, depending on their compatibility.
AST also contains some functions like flatten() for use with optimizing / auto-inlining Lua.
See the tests folder for example usage.
Reference
Parser = require 'parser'
This will return the parser class.
Parser.parse(data[, version, source])
This parses the code in data and returns an ast._block object.
This is shorthand for Parser(data, version, source).tree.
The Parser object has a few more functions, which are used internally while parsing.
version is a string '5.1', '5.2', '5.3', etc., corresponding to your Lua version.
source is a description of the source, e.g. a filename, which is included in some nodes (functions) to record where they are declared.
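As a minimal sketch of the calls described above, the following round-trips a snippet through the parser. It assumes lua-parser and its dependencies are reachable through package.path. The helper name `roundtrip`, the sample code string and the 'example.lua' source label are illustrative only. How the parser reports malformed input is not described here, so the call is wrapped in pcall.

```lua
-- Sketch only: `roundtrip` is a hypothetical helper, not part of lua-parser.
local function roundtrip(code, version, source)
  -- Load lazily so a missing install is reported rather than raised.
  local ok, Parser = pcall(require, 'parser')
  if not ok then
    return nil, 'lua-parser not found: ' .. tostring(Parser)
  end
  -- Parser.parse returns the root ast._block of the parsed chunk.
  local parsed, tree = pcall(Parser.parse, code, version, source)
  if not parsed then
    return nil, 'parse failed: ' .. tostring(tree)
  end
  if not tree then
    return nil, 'parse returned no tree'
  end
  -- tostring() on the AST regenerates equivalent Lua code.
  return tostring(tree), tree
end

local out, err = roundtrip('local x = 1 + 2', '5.4', 'example.lua')
print(out or err)
```

The second return value is the tree itself, so a caller can inspect or modify it before converting it back to code.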
ast = require 'parser.lua.ast'
This is the AST (abstract syntax tree) library.
It holds a collection of AST classes, each representing a different token in the Lua syntax.
n = ast.node() =
This is the superclass of all AST classes.
Each has the following properties:
n.type = the type of the node, which matches the class name in the ast library with the leading underscore removed (e.g. ast._block has type 'block').
n.span = source code span information (from and to subtables each with source, line and col fields)
n:copy() = returns a copy of the node.
n:flatten(func, varmap) = flattens / inlines the contents of all function calls of this function. Used for performance optimizations.
n:toLua() = generates Lua code. Same as the node's __tostring.
n:serialize(apply) = apply a to-string serialization function to the AST.
ast.node subclasses:
n = ast._block(...) = a block of code in Lua.
... is a list of initial child stmt nodes to populate the block node with.
n.type == 'block'.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._stmt() = a statement-node parent-class.
n = ast._assign(vars, exprs) =
An assignment operation.
Subclass of _stmt.
n.type == 'assign'.
Represents the assignment of n.vars to n.exprs.
n = ast._do(...) =
A do ... end block.
Subclass of _stmt.
n.type == 'do'.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._while(cond, ...) =
A while cond do ... end block.
Subclass of _stmt.
n.type == 'while'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._repeat(cond, ...) =
A repeat ... until cond block.
Subclass of _stmt.
n.type == 'repeat'.
n.cond holds the condition expression.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._if(cond, ...) =
An if cond then ... elseif ... else ... end block.
Subclass of _stmt.
n.type == 'if'.
n.cond holds the condition expression of the first if statement.
All subsequent arguments must be ast._elseif objects, optionally with a final ast._else object.
n.elseifs holds the ast._elseif objects.
n.elsestmt optionally holds the final ast._else.
n = ast._elseif(cond, ...) =
An elseif cond then ... block.
Subclass of _stmt.
n.type == 'elseif'.
n.cond holds the condition expression of the elseif statement.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._else(...) =
An else ... block.
n.type == 'else'.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._foreq(var, min, max, step, ...) =
A for var=min,max[,step] do ... end block.
Subclass of _stmt.
n.type == 'foreq'.
n.var = the variable node.
n.min = the min expression.
n.max = the max expression.
n.step = the optional step expression.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._forin(vars, iterexprs, ...)
A for var1,...varN in expr1,...exprN do ... end block.
Subclass of _stmt.
n.type == 'forin'.
n.vars = table of variables of the for-in loop.
n.iterexprs = table of iterator expressions of the for-in loop.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._function(name, args, ...)
A function [name](arg1, ...argN) ... end block.
Subclass of _stmt.
n.type == 'function'.
n.name = the function name. This is optional. Omit name to represent a lambda function. (Which technically becomes an expression and not a statement...)
n.args = table of arguments. This does get modified: each argument gets assigned a .param = true, and an .index giving its position in the argument list.
n[1] ... n[#n] = nodes of statements within the block.
n = ast._arg(index)
An argument to a function.
n.type == 'arg'.
n.index = which index in the function's argument list this is.
n = ast._local(exprs)
A local ... statement.
Subclass of _stmt.
n.type == 'local'
n.exprs = list of expressions to be declared as locals.
Expects its member-expressions to be either functions or assigns.
n = ast._return(...)
A return ... statement.
Subclass of _stmt.
n.type == 'return'
n.exprs = list of expressions to return.
n = ast._break(...)
A break statement.
Subclass of _stmt.
n.type == 'break'
n = ast._call(func, ...)
A func(...) function-call expression.
n.type == 'call'
n.func = expression of the function to call.
n.args = list of argument expressions to pass into the function-call.
n = ast._nil()
A nil literal expression.
n.type == 'nil'.
n.const == true.
n = ast._boolean()
The parent class of the true/false AST nodes.
n = ast._true()
A true boolean literal expression
n.type == 'true'.
n.const == true.
n.value == true.
ast._boolean:isa(n) evaluates to true
n = ast._false()
A false boolean literal expression.
n.type == 'false'.
n.const == true.
n.value == false.
ast._boolean:isa(n) evaluates to true
n = ast._number(value)
A numeric literal expression.
n.type == 'number'.
n.value = the numerical value.
n = ast._string(value)
A string literal expression.
n.type == 'string'.
n.value = the string value.
n = ast._vararg()
A vararg ... expression.
n.type == 'vararg'.
For use within function arguments, assignment expressions, function calls, etc.
n = ast._table(...)
A table { ... } expression.
n.type == 'table'.
n[1] ... n[#n] = expressions of the table.
If the expression in n[i] is an ast._assign then an entry is added into the table as key = value. If it is not an ast._assign then it is inserted as a sequenced entry.
n = ast._var(name)
A variable reference expression.
n.type == 'var'
n.name = the variable name.
n = ast._par(expr)
A ( ... ) parenthesis expression.
n.type == 'par'.
n.expr = the expression within the parenthesis.
n = ast._index(expr, key)
An expr[key] expression, i.e. an __index-metatable operation.
n.type == 'index'.
n.expr = the expression to be indexed.
n.key = the expression of the index key.
n = ast._indexself(expr, key)
An expr:key expression, to be used as the expression of an ast._call node for member-function-calls. These are Lua's shorthand insertion of self as the first argument.
n.type == 'indexself'.
n.expr = the expression to be indexed.
n.key = the key to index. Must only be a Lua string (not an ast._string, but a real Lua string).
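The node classes above share one shape: a type string plus child statements in the array part, n[1] ... n[#n]. A minimal sketch of walking that shape follows. The helper `countTypes` is hypothetical, and the tree is a hand-built stand-in made of plain tables rather than real parser output, so the snippet runs without the library.

```lua
-- Sketch only: `countTypes` is a hypothetical helper, and `fakeTree` is a
-- hand-built stand-in with the documented shape, not a real parser result.
local function countTypes(node, counts)
  counts = counts or {}
  if type(node) ~= 'table' then return counts end
  if node.type then
    counts[node.type] = (counts[node.type] or 0) + 1
  end
  -- Child statements live in the array part: n[1] ... n[#n].
  for i = 1, #node do
    countTypes(node[i], counts)
  end
  return counts
end

-- Stand-in for a block holding a while loop (with a local and a break
-- inside it) followed by a return statement.
local fakeTree = {
  type = 'block',
  { type = 'while', { type = 'local' }, { type = 'break' } },
  { type = 'return' },
}

local counts = countTypes(fakeTree)
print(counts['while'], counts['local'], counts['return'])
```

This only descends through the array part. Children held in named fields such as n.cond, n.exprs or n.args are not visited, so a fuller walker would need to handle each of those fields per node type.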
Binary operations:
| node type | Lua operator | Lua version |
|---|---|---|
| add | + | |
| sub | - | |
| mul | * | |
| div | / | |
| mod | % | |
| concat | .. | |
| lt | < | |
| le | <= | |
| gt | > | |
| ge | >= | |
| eq | == | |
| ne | ~= | |
| and | and | |
| or | or | |
| idiv | // | 5.3+ |
| band | & | 5.3+ |
| bxor | ~ | 5.3+ |
| bor | \| | 5.3+ |
| shl | << | 5.3+ |
| shr | >> | 5.3+ |
n[1] ... n[#n] = a table of the arguments of the operation.
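To illustrate how the operands in n[1] ... n[#n] combine with the node type, here is a rough sketch of a constant folder for a few arithmetic operators. The helper `evalConst` and the `binops` table are hypothetical, and the expression is a hand-built stand-in made of plain tables. Whether n.value of a number node is stored as a number or as a string is not stated here, so the sketch passes it through tonumber.

```lua
-- Sketch only: `binops` and `evalConst` are hypothetical names, and the
-- nodes below are plain-table stand-ins, not real lua-parser objects.
local binops = {
  add = function(a, b) return a + b end,
  sub = function(a, b) return a - b end,
  mul = function(a, b) return a * b end,
  div = function(a, b) return a / b end,
}

-- Returns the numeric value of a constant expression, or nil if the
-- expression contains anything this sketch does not understand.
local function evalConst(n)
  if n.type == 'number' then
    return tonumber(n.value)
  end
  if n.type == 'par' then
    return evalConst(n.expr)
  end
  local op = binops[n.type]
  if not op then return nil end
  -- Fold the operands left to right.
  local acc = evalConst(n[1])
  if acc == nil then return nil end
  for i = 2, #n do
    local v = evalConst(n[i])
    if v == nil then return nil end
    acc = op(acc, v)
  end
  return acc
end

-- Stand-in for the expression 1 + 2 * 3.
local expr = {
  type = 'add',
  { type = 'number', value = 1 },
  { type = 'mul',
    { type = 'number', value = 2 },
    { type = 'number', value = 3 },
  },
}

print(evalConst(expr))
```

Returning nil for unknown node types keeps the folder conservative: a variable reference or function call anywhere in the expression leaves it unevaluated.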
Unary operations:
| node type | Lua operator | Lua version |
|---|---|---|
| unm | - | |
| not | not | |
| len | # | |
| bnot | ~ | 5.3+ |
n[1] = the single argument of the operation.
more extra functions:
Some more useful functions in AST:
- ast.copy(node) = equivalent of node:copy()
- ast.flatten(node, func, varmap) = equivalent of node:flatten(func, varmap)
- ast.refreshparents
- ast.traverse
- ast.nodeclass(type, parent, args)
- ast.tostringmethod = this specifies the serialization method. It is used to look up the serializer stored in ast.tostringmethods
TODO:
- Option for parsing LuaJIT -LL, -ULL, -i number suffixes.
- Speaking of LuaJIT, it has different edge case syntax for 2.0.5, 2.1.0, and whether 5.2-compat is enabled or not. It isn't passing the minify_tests.lua.
Dependencies:
- https://github.com/thenumbernine/lua-ext
- https://github.com/thenumbernine/lua-template
While I was at it, I added a require() replacement for parsing Lua scripts and registering callbacks,
so any other script can say "require 'parser.load_xform':insert(function(tree) ... modify the parse tree ... end)"
and voila, Lua preprocessor in Lua!
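A rough sketch of registering such a callback follows, assuming lua-parser is installed. The helper name `installTypeLogger` is hypothetical. Whether the callback is expected to return the tree is not stated here, so returning it unchanged is an assumption.

```lua
-- Sketch only: `installTypeLogger` is a hypothetical helper name.
local function installTypeLogger()
  -- Load lazily so a missing install is reported rather than raised.
  local ok, xforms = pcall(require, 'parser.load_xform')
  if not ok then
    return false, 'parser.load_xform not found: ' .. tostring(xforms)
  end
  xforms:insert(function(tree)
    -- List the type of each top-level statement of the parsed script.
    for i = 1, #tree do
      print(i, tree[i].type)
    end
    -- Assumption: handing the tree back unchanged is harmless.
    return tree
  end)
  return true
end

local installed, why = installTypeLogger()
print(installed and 'callback registered' or why)
```

Once registered, the callback would presumably run for scripts loaded through the replaced require(), so it should be installed before the modules it is meant to inspect.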
minify_tests.txt taken from the tests at https://github.com/stravant/LuaMinify
I tested this by parsing itself, then using the parsed & reconstructed version to parse itself, then using the parsed & reconstructed version to parse the parsed & reconstructed version, then using the 2x parsed & reconstructed version to parse itself