Refactor MLIL/HLIL Rust API
The Rust API currently has incomplete MLIL and HLIL bindings, and on top of that, some people (including myself) may feel they lack the polish one would associate with a paid product. We should do our best to remedy that with a refactored MLIL and HLIL API. It should make it easy to traverse and read a given function's MLIL and HLIL representations, and it should support both trivial and complex IL modifications.
To get there, we first need to identify what does not work with the API as it stands. The most pressing issue is that there is no standardized way to write IL modifications back to the function. Currently we expose two versions each of MLIL and HLIL: a non-lifted version and a lifted version. If we want to keep two separate representations for both MLIL and HLIL, we will need to write a lowering from the lifted form to the non-lifted form so that changes can be written back to the core objects.
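Below is a minimal sketch of such a lowering. It assumes a toy lifted tree (`LiftedExpr`) and a toy flat form (`RawExpr`) whose operands are indices. Both types are hypothetical and are not part of the binaryninja crate. A post-order walk emits each operand before its parent, so every index it writes refers to an expression that already exists:

```rust
/// Hypothetical lifted expression: operands are owned, boxed subtrees.
#[derive(Debug, Clone, PartialEq)]
enum LiftedExpr {
    Const(u64),
    Var(u32),
    Add(Box<LiftedExpr>, Box<LiftedExpr>),
    SetVar { dest: u32, src: Box<LiftedExpr> },
}

/// Hypothetical non-lifted expression: operands are indices into a flat array.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RawExpr {
    Const(u64),
    Var(u32),
    Add(usize, usize),
    SetVar { dest: u32, src: usize },
}

/// Post-order lowering: children are pushed before their parent.
/// Returns the index of the lowered root expression.
fn lower(expr: &LiftedExpr, out: &mut Vec<RawExpr>) -> usize {
    let raw = match expr {
        LiftedExpr::Const(c) => RawExpr::Const(*c),
        LiftedExpr::Var(v) => RawExpr::Var(*v),
        LiftedExpr::Add(lhs, rhs) => {
            let lhs = lower(lhs, out);
            let rhs = lower(rhs, out);
            RawExpr::Add(lhs, rhs)
        }
        LiftedExpr::SetVar { dest, src } => {
            let src = lower(src, out);
            RawExpr::SetVar { dest: *dest, src }
        }
    };
    out.push(raw);
    out.len() - 1
}
```

Real MLIL expressions also carry sizes and addresses, so an actual lowering would have to carry that metadata through as well.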
To fix this, I propose we adopt the accessor API used by LLIL. Matching on individual operations is more cumbersome with accessors because we no longer have a fully structured IL operation enum. I believe we can remedy that with more helper functions and macros.
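The sketch below illustrates what an accessor API could mean for MLIL. All names in it are hypothetical (`Instruction`, `Kind`, `SetVarOp`). An instruction is just a function reference plus an index, and `kind()` decodes only that one node. Operands come back as further handles rather than boxed trees:

```rust
/// Hypothetical flat expression storage standing in for the core's IL arrays.
#[derive(Debug, Clone, Copy)]
enum RawOp {
    Const(u64),
    Var(u32),
    SetVar { dest: u32, src: usize },
}

struct Function {
    exprs: Vec<RawOp>,
}

/// Lightweight handle: copying it is cheap, and nothing is decoded
/// until an accessor is called.
#[derive(Clone, Copy)]
struct Instruction<'f> {
    function: &'f Function,
    index: usize,
}

/// Borrowed view of a single operation, produced by `Instruction::kind`.
enum Kind<'f> {
    Const(u64),
    Var(u32),
    SetVar(SetVarOp<'f>),
}

/// Operand accessors for a `SetVar` operation.
struct SetVarOp<'f> {
    function: &'f Function,
    dest: u32,
    src: usize,
}

impl<'f> SetVarOp<'f> {
    fn dest(&self) -> u32 {
        self.dest
    }

    /// Handle to the source operand; it is only decoded if `kind()` is called on it.
    fn src(&self) -> Instruction<'f> {
        Instruction { function: self.function, index: self.src }
    }
}

impl<'f> Instruction<'f> {
    /// Decode this node only; operands stay behind accessors.
    fn kind(&self) -> Kind<'f> {
        match self.function.exprs[self.index] {
            RawOp::Const(c) => Kind::Const(c),
            RawOp::Var(v) => Kind::Var(v),
            RawOp::SetVar { dest, src } => {
                Kind::SetVar(SetVarOp { function: self.function, dest, src })
            }
        }
    }
}
```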
Current LLIL example:
```rust
// 0 @ 00025f10 (LLIL_SET_REG.d edi = (LLIL_REG.d edi))
let instr_0 = llil_instr_iter.next().unwrap();
assert_eq!(instr_0.index, LowLevelInstructionIndex(0));
assert_eq!(instr_0.address(), image_base + 0x00025f10);
println!("{:?}", instr_0);
println!("{:?}", instr_0.kind());
// Match on the operation; operands are read through accessor methods on `op`.
match instr_0.kind() {
    LowLevelILInstructionKind::SetReg(op) => {
        assert_eq!(op.size(), 4);
        match op.dest_reg() {
            LowLevelILRegisterKind::Arch(reg) => assert_eq!(reg.name(), "edi"),
            _ => panic!("Expected LowLevelILRegisterKind::Arch"),
        }
        // The source operand is another expression (`edi`), here expression 0.
        assert_eq!(op.source_expr().index, LowLevelExpressionIndex(0));
    }
    _ => panic!("Expected SetReg"),
}
```
Current MLIL example:
```rust
// 0 @ 00025f10 (MLIL_SET_VAR.d edi_1 = (MLIL_VAR.d edi))
let instr_0 = mlil_instr_iter.next().unwrap();
// Instruction 0 is rooted at expression 1; its operand (`edi`) is expression 0.
assert_eq!(instr_0.instr_index, MediumLevelInstructionIndex(0));
assert_eq!(instr_0.expr_index, MediumLevelExpressionIndex(1));
assert_eq!(instr_0.address, image_base + 0x00025f10);
println!("{:?}", instr_0.kind);
// Non-lifted: operands are raw indices that must be resolved by hand.
match instr_0.kind {
    MediumLevelILInstructionKind::SetVar(op) => {
        assert_eq!(op.dest.index, 524288);
        assert_eq!(op.src, MediumLevelExpressionIndex(0));
    }
    _ => panic!("Expected SetVar"),
}
// Lifted version of `instr_0`: operands become owned, boxed sub-expressions.
let lifted_instr_0 = instr_0.lift();
match lifted_instr_0.kind {
    MediumLevelILLiftedInstructionKind::SetVar(op) => {
        let dest_var_name = entry_function.variable_name(&op.dest);
        assert_eq!(dest_var_name, "edi_1");
        match op.src.kind {
            MediumLevelILLiftedInstructionKind::Var(var) => {
                let src_var_name = entry_function.variable_name(&var.src);
                assert_eq!(src_var_name, "edi");
            }
            _ => panic!("Expected Var"),
        }
    }
    _ => panic!("Expected SetVar"),
}
```
As you can tell, the lifted version is a lot cleaner to write. `op` can also be destructured within the match pattern to require that certain variables or values match. Without any helper functions or macros, the accessor API requires more code to perform the equivalent operations.
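Here is one hypothetical helper of that kind. `expect_kind!` is an illustrative name, not an existing macro. It matches a single pattern or panics, which removes the `_ => panic!(...)` arm from every level of a nested accessor match. The toy `Kind` and `Function` types stand in for real MLIL types:

```rust
/// Toy operation enum standing in for a decoded MLIL operation (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Var(u32),
    Const(u64),
    SetVar { dest: u32, src: usize },
}

/// Toy function: decoding an expression is just an array lookup here.
struct Function {
    exprs: Vec<Kind>,
}

impl Function {
    fn kind(&self, index: usize) -> Kind {
        self.exprs[index]
    }
}

/// Hypothetical helper: match one pattern and evaluate `$body`,
/// otherwise panic showing the expected pattern and the value found.
macro_rules! expect_kind {
    ($value:expr, $pat:pat => $body:expr) => {
        match $value {
            $pat => $body,
            other => panic!("expected {}, got {:?}", stringify!($pat), other),
        }
    };
}

/// Accessor-style equivalent of the lifted `SetVar(Var)` match: each level
/// is decoded on demand and destructured in place. Returns (dest, src var).
fn copied_var(func: &Function, index: usize) -> (u32, u32) {
    expect_kind!(func.kind(index), Kind::SetVar { dest, src } => {
        expect_kind!(func.kind(src), Kind::Var(var) => (dest, var))
    })
}
```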
However, this small snippet does not show the cost of the lifted variant. Calling `instr_0.lift()` performs many allocations behind the scenes because every sub-expression has to be boxed up, even if you never intend to inspect those sub-expressions.
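The cost difference can be sketched with toy types. `RawOp`, `Lifted`, and `lift` are hypothetical, and the allocation count is tracked by hand rather than measured. Eager lifting boxes every operand in the subtree, while an accessor-style check only reads the root:

```rust
/// Toy flat storage: operands are indices (hypothetical).
#[derive(Debug, Clone, Copy)]
enum RawOp {
    Const(u64),
    Add(usize, usize),
}

/// Toy lifted form: operands are owned, boxed subtrees.
#[derive(Debug)]
enum Lifted {
    Const(u64),
    Add(Box<Lifted>, Box<Lifted>),
}

/// Eager lifting: every operand in the subtree is converted and boxed,
/// whether or not the caller ever looks at it. `boxes` counts `Box::new` calls.
fn lift(exprs: &[RawOp], index: usize, boxes: &mut usize) -> Lifted {
    match exprs[index] {
        RawOp::Const(c) => Lifted::Const(c),
        RawOp::Add(l, r) => {
            *boxes += 2;
            Lifted::Add(Box::new(lift(exprs, l, boxes)), Box::new(lift(exprs, r, boxes)))
        }
    }
}

/// Accessor-style check: reads only the root operation and allocates nothing.
fn is_add(exprs: &[RawOp], index: usize) -> bool {
    matches!(exprs[index], RawOp::Add(..))
}
```

For `(1 + 2) + 3`, answering "is the root an add?" through `lift` boxes four sub-expressions. `is_add` answers the same question with a single array read.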
The proposal would be as follows:
- Refactor the MLIL API to use accessors like LLIL, with IL builders and appropriate methods for modifying MLIL.
- Provide higher-level APIs for both LLIL and MLIL so that traversing both ILs is more intuitive.
- Repeat the process for HLIL.
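The following is a rough sketch of what LLIL-style builders could look like for MLIL. It assumes a hypothetical `MutableFunction` that appends expressions to a flat array and returns their indices. None of these method names come from the existing crate. With this design, rewriting `v1 = v0` into `v1 = v0 + 4` means building new expressions and repointing the instruction:

```rust
/// Toy MLIL operations (hypothetical); operands are expression indices.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Const(u64),
    Var(u32),
    Add(usize, usize),
    SetVar { dest: u32, src: usize },
}

/// Hypothetical mutable function: expressions plus each instruction's root.
#[derive(Default)]
struct MutableFunction {
    exprs: Vec<Op>,
    instructions: Vec<usize>,
}

impl MutableFunction {
    /// Append an expression and return its index.
    fn expr(&mut self, op: Op) -> usize {
        self.exprs.push(op);
        self.exprs.len() - 1
    }
    fn const_int(&mut self, value: u64) -> usize {
        self.expr(Op::Const(value))
    }
    fn var(&mut self, var: u32) -> usize {
        self.expr(Op::Var(var))
    }
    fn add(&mut self, lhs: usize, rhs: usize) -> usize {
        self.expr(Op::Add(lhs, rhs))
    }
    fn set_var(&mut self, dest: u32, src: usize) -> usize {
        self.expr(Op::SetVar { dest, src })
    }
    /// Append a root expression as a new instruction; returns the instruction index.
    fn append(&mut self, root: usize) -> usize {
        self.instructions.push(root);
        self.instructions.len() - 1
    }
    /// Point an existing instruction at a different root expression.
    fn replace(&mut self, instr: usize, root: usize) {
        self.instructions[instr] = root;
    }
}
```

In this sketch, the old expressions stay in the array unreferenced. A real implementation would have to decide how that case is handled.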
The refactored API should not include a "raw" variant or a "lifted" variant. We should aim to expose a single comprehensive API like the LLIL instruction/expression API.
On the topic of instructions versus expressions: unlike LLIL, MLIL and HLIL do not let us cleanly delineate between the two, so we should just have a single instruction type.
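Here is a minimal sketch of the single-type idea, with hypothetical names. One `Instruction` type serves both as a top-level instruction, in which case it carries an instruction index, and as an operand, in which case it does not. The indices in the test mirror the MLIL example earlier in this issue: instruction 0 is rooted at expression 1, and its operand is expression 0.

```rust
/// Toy flat expression storage (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Var(u32),
    SetVar { dest: u32, src: usize },
}

struct Function {
    exprs: Vec<Op>,
    /// Root expression index of each instruction.
    instr_roots: Vec<usize>,
}

/// One type for instructions and operands alike: a top-level instruction
/// also knows its instruction index, an operand does not.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Instruction {
    expr_index: usize,
    instr_index: Option<usize>,
}

impl Function {
    fn instruction(&self, instr_index: usize) -> Instruction {
        Instruction { expr_index: self.instr_roots[instr_index], instr_index: Some(instr_index) }
    }

    fn op(&self, instr: Instruction) -> Op {
        self.exprs[instr.expr_index]
    }

    /// Operands come back as the same `Instruction` type, without allocating.
    fn operands(&self, instr: Instruction) -> impl Iterator<Item = Instruction> {
        let src = match self.op(instr) {
            Op::Var(_) => None,
            Op::SetVar { src, .. } => Some(src),
        };
        src.into_iter().map(|expr_index| Instruction { expr_index, instr_index: None })
    }
}
```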
Somewhere between steps 1 and 2, we should also fix the following issues:
- https://github.com/Vector35/binaryninja-api/issues/7189
- https://github.com/Vector35/binaryninja-api/issues/7207
- https://github.com/Vector35/binaryninja-api/issues/7206
I want feedback from users of the Rust API on what this issue is missing. I am sure there are many pain points not addressed here, and they should be known before someone commits to refactoring the API.
Hello, I wanted to write an implementation for MLIL/HLIL myself, but it's really quite time-consuming, especially since I don't understand the Binary Ninja C API well. So, from my point of view:
I would like to have mutation capabilities similar to LLIL, because it's very easy to use.
Some parts of the IL could be moved into common types, such as `InstructionIndex`, `ILFunction`, `Mutability`, and `FunctionForm`. I'm not completely sure, but it seems like `Operation` could also be shared in some way, although I didn't look into it that deeply.
Also, for HLIL, I think AST could be used as a special marker for `FunctionForm`.
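One way to share these types is sketched below with hypothetical marker types and traits. A single `InstructionIndex<L>` and a single `ILFunction<L, M, F>` are parameterized by IL level, mutability, and form. Methods can then be restricted to particular combinations: only `Mutable` functions can be finalized, and `is_ast` only exists for HLIL in `Ast` form. Passing an MLIL index where an HLIL index is expected would become a compile error.

```rust
use std::marker::PhantomData;

// Hypothetical IL-level markers.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Mlil;
#[derive(Debug, Clone, Copy, PartialEq)]
struct Hlil;

/// Shared mutability states (hypothetical).
trait Mutability {}
struct Mutable;
struct Finalized;
impl Mutability for Mutable {}
impl Mutability for Finalized {}

/// Shared function forms; `Ast` is the HLIL-specific form suggested above.
trait FunctionForm {}
struct Ssa;
struct NonSsa;
struct Ast;
impl FunctionForm for Ssa {}
impl FunctionForm for NonSsa {}
impl FunctionForm for Ast {}

/// One index type for every IL; `L` keeps indices from different ILs apart.
#[derive(Debug, Clone, Copy, PartialEq)]
struct InstructionIndex<L>(usize, PhantomData<L>);

/// One function wrapper for every IL, parameterized by level, mutability and form.
struct ILFunction<L, M: Mutability, F: FunctionForm> {
    instruction_count: usize,
    _marker: PhantomData<(L, M, F)>,
}

impl<L, M: Mutability, F: FunctionForm> ILFunction<L, M, F> {
    /// Returns a typed index if `i` is in range.
    fn index(&self, i: usize) -> Option<InstructionIndex<L>> {
        (i < self.instruction_count).then(|| InstructionIndex(i, PhantomData))
    }
}

impl<L, F: FunctionForm> ILFunction<L, Mutable, F> {
    /// Only mutable functions can be finalized (or, in a real API, modified).
    fn finalize(self) -> ILFunction<L, Finalized, F> {
        ILFunction { instruction_count: self.instruction_count, _marker: PhantomData }
    }
}

impl<M: Mutability> ILFunction<Hlil, M, Ast> {
    /// Exists only for HLIL functions in AST form.
    fn is_ast(&self) -> bool {
        true
    }
}
```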
It would also be great if we could introduce some traits that work on any IL, such as `visit_tree`.
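Below is a sketch of such a trait. It assumes hypothetical names (`VisitTree`, `VisitorAction`) and toy MLIL- and HLIL-like handle types. `children` returns a `Vec` only to keep the example short. The generic `count_nodes` is written once and works with both handle types:

```rust
/// Hypothetical traversal control returned by the visitor callback.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VisitorAction {
    Descend,
    Sibling,
    Halt,
}

/// Hypothetical trait that any IL expression handle could implement.
trait VisitTree: Sized {
    /// Direct operands of this expression.
    fn children(&self) -> Vec<Self>;

    /// Pre-order walk; returns `false` if the callback asked to halt.
    fn visit_tree<F: FnMut(&Self) -> VisitorAction>(&self, f: &mut F) -> bool {
        match f(self) {
            VisitorAction::Halt => return false,
            VisitorAction::Sibling => return true, // skip this node's operands
            VisitorAction::Descend => {}
        }
        for child in self.children() {
            if !child.visit_tree(f) {
                return false;
            }
        }
        true
    }
}

// Toy MLIL-like storage: fixed operand counts, operands are indices.
enum MlilOp { Var(u32), Add(usize, usize) }
#[derive(Clone, Copy)]
struct MlilExpr<'a> { exprs: &'a [MlilOp], index: usize }

impl VisitTree for MlilExpr<'_> {
    fn children(&self) -> Vec<Self> {
        match self.exprs[self.index] {
            MlilOp::Var(_) => vec![],
            MlilOp::Add(l, r) => vec![MlilExpr { index: l, ..*self }, MlilExpr { index: r, ..*self }],
        }
    }
}

// Toy HLIL-like storage: a block has a variable number of operands.
enum HlilOp { Call(u32), Block(Vec<usize>) }
#[derive(Clone, Copy)]
struct HlilExpr<'a> { exprs: &'a [HlilOp], index: usize }

impl VisitTree for HlilExpr<'_> {
    fn children(&self) -> Vec<Self> {
        match &self.exprs[self.index] {
            HlilOp::Call(_) => vec![],
            HlilOp::Block(body) => body.iter().map(|&index| HlilExpr { index, ..*self }).collect(),
        }
    }
}

/// Written once, usable with every IL that implements `VisitTree`.
fn count_nodes<T: VisitTree>(root: &T) -> usize {
    let mut count = 0;
    root.visit_tree(&mut |_| {
        count += 1;
        VisitorAction::Descend
    });
    count
}
```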
As the person responsible for the overuse of Rust traits in the LLIL implementation, I wonder if the best move would be to just accept some amount of copy-pasting and provide different types outright for the various forms/mutability states. I always felt that the LLIL docs, the way I laid things out, were borderline unusable.
I think macros might be able to provide some of the common helpers across forms, or we could just implement such a `visit_tree` trait on the various wrappers.
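Here is a sketch of the macro approach using hypothetical toy types, where instructions are plain strings. Each IL/form gets its own concrete type, and a `macro_rules!` macro stamps the same inherent methods onto each one. The generated docs then list ordinary methods on each type rather than trait bounds:

```rust
// Distinct concrete types per IL/form instead of one heavily generic type (hypothetical).
struct MlilFunction {
    instrs: Vec<&'static str>,
}
struct MlilSsaFunction {
    instrs: Vec<&'static str>,
}
struct HlilFunction {
    instrs: Vec<&'static str>,
}

/// Hypothetical macro that copies the same inherent helpers onto each type.
macro_rules! impl_common_helpers {
    ($($ty:ty),+ $(,)?) => {
        $(
            impl $ty {
                /// Number of instructions in this function.
                pub fn len(&self) -> usize {
                    self.instrs.len()
                }

                /// Iterate over instruction indices.
                pub fn indices(&self) -> std::ops::Range<usize> {
                    0..self.instrs.len()
                }

                /// Index of the first instruction whose text contains `needle`.
                pub fn find(&self, needle: &str) -> Option<usize> {
                    self.instrs.iter().position(|i| i.contains(needle))
                }
            }
        )+
    };
}

impl_common_helpers!(MlilFunction, MlilSsaFunction, HlilFunction);
```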
That said, maintenance of the Rust bindings is now handled by others. I haven't fully thought through this but there are parts of the LLIL binding design I do regret.