Function-level disassembly and lifting support
Some instruction sets don't lend themselves to lifting one instruction at a time: wasm, .NET, and Java, for instance, and I'm sure there are others. One possible solution is to allow lifting an entire function at once, rather than one instruction at a time as the current architecture plugins do.
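To illustrate why a stack machine such as wasm or the JVM resists per-instruction lifting, here is a minimal Python sketch using a made-up bytecode (the opcode tuples and the `lift_stack_function` helper are hypothetical, not any real encoding or API). Naming each stack slot by its depth turns `add` into a register operation, but which slots a given `add` touches depends on the current stack depth, and only a walk over the preceding instructions can supply that. The sketch handles straight-line code only. With branches, the depth would also have to be propagated along control-flow edges, which is where function-level context becomes necessary.

```python
from typing import List, Tuple

# Hypothetical toy bytecode, loosely modeled on wasm/JVM stack machines:
#   ("push", const)  ("load", var)  ("add",)  ("store", var)
Instr = Tuple[str, ...]


def lift_stack_function(code: List[Instr]) -> List[str]:
    """Lift straight-line stack bytecode into three-address code.

    Stack slot k is named s<k>, so the IL emitted for "add" depends on the
    current stack depth, which is tracked across all preceding instructions.
    """
    depth = 0
    out: List[str] = []
    for instr in code:
        op = instr[0]
        if op in ("push", "load"):
            # Both push a single value into the next free slot.
            out.append(f"s{depth} = {instr[1]}")
            depth += 1
        elif op == "add":
            if depth < 2:
                raise ValueError("stack underflow on add")
            out.append(f"s{depth - 2} = s{depth - 2} + s{depth - 1}")
            depth -= 1
        elif op == "store":
            if depth < 1:
                raise ValueError("stack underflow on store")
            out.append(f"{instr[1]} = s{depth - 1}")
            depth -= 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return out


print(lift_stack_function([("load", "a"), ("push", "1"), ("add",), ("store", "b")]))
# ['s0 = a', 's1 = 1', 's0 = s0 + s1', 'b = s0']
```

The same `("add",)` tuple lifts to different IL depending on where it appears, which is exactly the information a one-instruction-at-a-time callback does not have.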
This is a blocker for Qualcomm Hexagon support. No RE tool (Binja, IDA, or Ghidra) supports Hexagon natively at this time. IDA is the best of the three, thanks to a plugin from a contributor; Binja and Ghidra also have contributor plugins, but they are far inferior.
You'd definitely take up some market share if you were the first to implement Hexagon!
Would you mind elaborating on why Hexagon needs this? I'm not familiar with the architecture and now I'm curious. 🤔
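It is heavily pipelined: Hexagon is a VLIW architecture whose instructions are grouped into packets (up to four instructions) that execute together. You need to be able to lift multiple instructions at once in order to translate the semantics correctly.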
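A minimal Python sketch of why packets break per-instruction lifting. As I understand Hexagon's packet semantics, instructions in a packet such as `{ r0 = r1; r1 = r0 }` read the register values from before the packet, so that packet swaps `r0` and `r1`. The sketch ignores `.new` operands (which deliberately read a value produced in the same packet) and uses a hypothetical toy syntax; `lift_sequential`, `lift_packet`, and `run` are illustrative helpers, not Binary Ninja API. Lifting each instruction independently produces IL where the second move sees the first one's write; lifting the whole packet lets the lifter stage every result in a temporary before committing.

```python
from typing import Dict, List, Tuple

# Toy instruction tuples (hypothetical syntax, not real Hexagon encoding):
#   ("mov", dst, src)   dst = src
#   ("add", dst, a, b)  dst = a + b
Instr = Tuple[str, ...]


def _rhs(instr: Instr) -> str:
    """Render the right-hand side of one instruction."""
    if instr[0] == "mov":
        return instr[2]
    if instr[0] == "add":
        return f"{instr[2]} + {instr[3]}"
    raise ValueError(f"unknown opcode {instr[0]!r}")


def lift_sequential(instrs: List[Instr]) -> List[str]:
    """One instruction at a time: each write is visible to the next read."""
    return [f"{instr[1]} = {_rhs(instr)}" for instr in instrs]


def lift_packet(packet: List[Instr]) -> List[str]:
    """Whole packet at once: read every source before committing any write."""
    staged = [f"t{i} = {_rhs(instr)}" for i, instr in enumerate(packet)]
    commits = [f"{instr[1]} = t{i}" for i, instr in enumerate(packet)]
    return staged + commits


def run(il: List[str], regs: Dict[str, int]) -> Dict[str, int]:
    """Tiny evaluator for the IL strings above (register operands only)."""
    env = dict(regs)
    for line in il:
        dst, expr = line.split(" = ")
        env[dst] = sum(env[name] for name in expr.split(" + "))
    return env


swap = [("mov", "r0", "r1"), ("mov", "r1", "r0")]
start = {"r0": 1, "r1": 2}
print(run(lift_sequential(swap), start))  # r0 and r1 both end up as 2
print(run(lift_packet(swap), start))      # r0 = 2, r1 = 1 (plus temporaries t0, t1)
```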
The AnalyzeBasicBlocks portion of this is implemented and on dev now. This means that architectures that need control over the recursive-descent/basic-block-recovery portion of the analysis (e.g., architectures with function headers), or something similar, should now be possible to implement. Hexagon should also be implementable with these changes.
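For readers unfamiliar with this stage, a minimal, architecture-agnostic sketch of recursive-descent basic block recovery in Python. The toy program format and `recover_blocks` are hypothetical and do not reflect the actual signature of `AnalyzeBasicBlocks`; an architecture that takes over this stage would substitute its own decoding and its own rules for which instructions end a block.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy program: address -> (kind, branch_target); each instruction is 1 unit.
# kinds: "op" (falls through), "jmp" (unconditional), "br" (conditional), "ret".
Program = Dict[int, Tuple[str, Optional[int]]]


def recover_blocks(prog: Program, entry: int) -> Dict[int, List[int]]:
    """Pass 1 follows control flow from `entry` to find reachable instructions
    and block leaders; pass 2 splits the reachable code into blocks."""
    leaders: Set[int] = {entry}
    reachable: Set[int] = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        # Sweep linearly until a control-flow instruction or already-seen code.
        while addr in prog and addr not in reachable:
            reachable.add(addr)
            kind, target = prog[addr]
            if kind == "op":
                addr += 1
                continue
            if kind in ("jmp", "br") and target is not None:
                leaders.add(target)
                worklist.append(target)
            if kind == "br":
                leaders.add(addr + 1)  # conditional fall-through starts a block
                worklist.append(addr + 1)
            break  # jmp, br and ret end the linear sweep

    blocks: Dict[int, List[int]] = {}
    current: Optional[int] = None
    prev: Optional[int] = None
    for addr in sorted(reachable):
        starts_block = (
            current is None
            or addr in leaders
            or prev != addr - 1
            or prog[prev][0] != "op"
        )
        if starts_block:
            current = addr
            blocks[current] = []
        blocks[current].append(addr)
        prev = addr
    return blocks


# A loop: block 1 branches out to 4, block 2 jumps back to 1.
program: Program = {0: ("op", None), 1: ("br", 4), 2: ("op", None), 3: ("jmp", 1), 4: ("ret", None)}
print(recover_blocks(program, 0))  # {0: [0], 1: [1], 2: [2, 3], 4: [4]}
```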
Although the lifting changes aren't implemented yet, architectures that can cope with being invoked once per basic block (especially if the arch plugin has already identified the basic blocks) and that don't need to look at data outside that basic block can simply consume entire basic blocks in the lifting callback.
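A minimal Python sketch of a lifting callback that consumes a whole basic block and lifts it packet by packet. Several assumptions here should be checked before relying on it: `lift_block` and `split_packets` are hypothetical helpers, not Binary Ninja API; the block is assumed to start and end on packet boundaries; and the packet-end rule (parse bits [15:14] of a little-endian 32-bit word equal to `0b11`, or `0b00` for a duplex) is my reading of the Hexagon convention, to be verified against the Hexagon programmer's reference.

```python
import struct
from typing import Callable, List

PARSE_SHIFT = 14
PARSE_MASK = 0b11
# Assumption: 0b11 marks the end of a packet; 0b00 (duplex) also ends a packet.
END_OF_PACKET = {0b11, 0b00}


def split_packets(data: bytes) -> List[List[int]]:
    """Split a basic block's bytes into packets of 32-bit little-endian words."""
    if len(data) % 4:
        raise ValueError("block length must be a multiple of 4 bytes")
    packets: List[List[int]] = []
    current: List[int] = []
    for (word,) in struct.iter_unpack("<I", data):
        current.append(word)
        if ((word >> PARSE_SHIFT) & PARSE_MASK) in END_OF_PACKET:
            packets.append(current)
            current = []
    if current:
        raise ValueError("block ends in the middle of a packet")
    return packets


def lift_block(data: bytes, lift_packet: Callable[[List[int]], List[str]]) -> List[str]:
    """Hypothetical block-level callback: called once per basic block, it hands
    each complete packet to `lift_packet` so packet semantics can be honored."""
    il: List[str] = []
    for packet in split_packets(data):
        il.extend(lift_packet(packet))
    return il
```

Usage: `lift_block(block_bytes, my_packet_lifter)`, where `my_packet_lifter` is whatever per-packet lifter the plugin provides. Because the block already contains every packet it needs, no data outside the block is consulted, which is the situation the paragraph above describes.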
Work that remains:
- Maintaining context across basic block recovery, lifting, and text rendering
- Allowing text tokens from architectures to contribute newlines
- Expanding the lifter so it can look at data outside the current basic block
What can't be implemented until other work is completed:
- Cases where lifting IL for one block requires data from a different block; this is still unrepresentable