Enhancement - use ANTLR to build the parser from grammar / lexer files
To stabilise the parser, I suggest rebuilding some of the code to use the ANTLR grammar (.g4) files here: https://github.com/psygate/smali-antlr4-grammar
First download the ANTLR tool jar:
wget https://www.antlr.org/download/antlr-4.7.2-complete.jar
You can then run:
java -cp antlr-4.7.2-complete.jar org.antlr.v4.Tool -Dlanguage=Go -visitor -o gen SmaliLexer.g4
java -cp antlr-4.7.2-complete.jar org.antlr.v4.Tool -Dlanguage=Go -visitor -o gen SmaliParser.g4
This generates the following files / code:

https://gist.github.com/8secz-johndpope/30868ccd59f211f0000b90e6176dead7
You should then be able to walk through the parsed smali file, which may reduce the out-of-bounds crashes that people (including myself) have been experiencing.
For illustration, I have successfully used grammar files to build parsers / lexers for hundreds of languages in Swift: https://github.com/johndpope/ANTLR-Swift-Target https://github.com/johndpope/Antlr-Swift-runtime
I forget the entry point (the start rule on the parser class); it changes for each grammar.
Here is the Swift code for reading a Java file, which you can find in the repo above:
    let textFileName = "Test.java"
    if let textFilePath = Bundle.main.path(forResource: textFileName, ofType: nil) {
        let lexer = Java8Lexer(ANTLRFileStream(textFilePath))
        print("lexer:", lexer)
        let tokens = CommonTokenStream(lexer)
        let parser = try Java8Parser(tokens)
        let tree = try parser.compilationUnit()
        print("tree:", tree)
        let walker = ParseTreeWalker()
        let java8walker = Java8Walker()
        try walker.walk(java8walker, tree)
    } else {
        print("error occur: can not open \(textFileName)")
    }
The pseudocode for smali would be:
    let textFilePath = "/path/Test.smali"
    let lexer = NewSmaliLexer(ANTLRFileStream(textFilePath)) // NewSmaliLexer exists in the generated code
    print("lexer:", lexer)
    let tokens = CommonTokenStream(lexer) // ?? there should be an equivalent method to do this
    let parser = try NewSmaliParser(tokens)
    let tree = try parser.compilationUnit() // entry rule differs per grammar; maybe ToStringTree?
    print("tree:", tree)
    let walker = ParseTreeWalker() // as the walker visits the tree, you can hook in to translate stuff
    let smaliWalker = SmaliWalker() // hypothetical listener, the smali counterpart of Java8Walker
    try walker.walk(smaliWalker, tree)
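The "hook in" idea above can be sketched in Go without any ANTLR dependency. This is only an illustration of the listener pattern, not the ANTLR Go runtime API: the `node`, `listener`, `walk`, and `methodCollector` names are hypothetical stand-ins, and the tree is built by hand rather than produced by a parser. With the generated code, the ANTLR runtime's own tree walker and the generated listener would presumably play these roles.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for a parse-tree node.
type node struct {
	rule     string // name of the grammar rule that produced this node
	text     string // matched text, simplified to a single string
	children []*node
}

// listener receives a callback on entering and on leaving each node.
type listener interface {
	enter(n *node)
	exit(n *node)
}

// walk visits the tree depth-first, calling the listener hooks around each node.
func walk(l listener, n *node) {
	l.enter(n)
	for _, c := range n.children {
		walk(l, c)
	}
	l.exit(n)
}

// methodCollector records the text of every "method" node it enters.
type methodCollector struct {
	names []string
}

func (m *methodCollector) enter(n *node) {
	if n.rule == "method" {
		m.names = append(m.names, n.text)
	}
}

// exit is a no-op here; a translator could emit closing output at this point.
func (m *methodCollector) exit(n *node) {}

func main() {
	// Hand-built tree standing in for the result of parsing a smali class.
	tree := &node{rule: "class", text: "Lcom/example/Main;", children: []*node{
		{rule: "method", text: "<init>"},
		{rule: "method", text: "onCreate"},
	}}
	mc := &methodCollector{}
	walk(mc, tree)
	fmt.Println(strings.Join(mc.names, ", "))
}
```

The translation logic lives entirely in the listener, so supporting more instructions means adding cases to `enter` / `exit` rather than touching the parsing code.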
Other people have built translators with ANTLR in this way, e.g. https://github.com/8secz-johndpope/ObjcGrammar. You may need some help; when I have more time I will circle back.
You're right, that approach would be much better, as currently I support only a very limited number of instructions. Will look into it.
VS Code has smali syntax highlighting: https://github.com/ViRb3/vscode-smali/tree/master/smali. Could this help?
If you push any work to a new feature branch, I'm happy to take a look.
@8secz-johndpope Thanks for getting back on this issue :) I took a look at the VS Code highlighting, but it's actually more confusing, since it's based on regexes. I'm planning to make another branch for ANTLR this week, per your suggestions.