dump program in LLVM IR
Motivation
Dumping a project in LLVM IR would open the door to many interesting projects, e.g., JIT compilation, running LLVM analyses, creating binaries, lifter verification, etc.
This can be a nice toy project for someone who would like to learn BAP, and it is the best way to learn both the LLVM and BAP intermediate representations.
Implementation
Since BAP IR is quite close to LLVM IR, the direct transformation should be easy. A proper way to plug it in would be to write a pretty printer for the project data structure and register it as a writer. The skeleton setup follows.
Initial setup
- create a folder bir_to_llvm.
- create file bir_to_llvm.ml with the following initial contents:
open Core_kernel.Std
open Bap.Std
open Regular.Std
open Format

(* placeholder: every term is printed as a no-op instruction *)
let pp_nop ppf t =
  fprintf ppf "%%r%a = add i1 0, 0@\n" Tid.pp (Term.tid t)

(* block terminator: every block ends with [ret void] for now *)
let pp_ret ppf = fprintf ppf "ret void@\n"

let pp_phi = pp_nop
let pp_def = pp_nop
let pp_jmp = pp_nop

let pp_elts ppf elts =
  Seq.iter elts ~f:(function
      | `Phi phi -> fprintf ppf "%a" pp_phi phi
      | `Def def -> fprintf ppf "%a" pp_def def
      | `Jmp jmp -> fprintf ppf "%a" pp_jmp jmp)

(* arguments are not printed yet *)
let pp_args ppf sub = ()

let pp_body ppf blks =
  Seq.iter blks ~f:(fun blk ->
      fprintf ppf "\n@[bb_%a:@\n%a@\n%t@]@\n"
        Tid.pp (Term.tid blk)
        pp_elts (Blk.elts blk)
        pp_ret)

(* return type of a function; note that this shadows the block
   terminator [pp_ret] above, which [pp_body] has already captured *)
let pp_ret ppf sub =
  fprintf ppf "void"

let pp_sub ppf sub =
  let args = Term.enum arg_t sub in
  let blks = Term.enum blk_t sub in
  fprintf ppf "@[<2>define %a @%s(%a) {@\n%a@]@\n}"
    pp_ret args (Sub.name sub) pp_args args pp_body blks

let pp_prog ppf prog =
  Term.enum sub_t prog |>
  Seq.iter ~f:(fprintf ppf "@[%a@]@\n" pp_sub)

let pp ppf proj =
  fprintf ppf "@[%a@]" pp_prog (Project.program proj)

(* register the printer as a project writer named "llvm" *)
let () =
  let writer = Data.Write.create ~pp () in
  Project.add_writer ~desc:"print program in LLVM IR"
    ~ver:"0.1" "llvm" writer
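To give an idea of what replacing pp_nop with a real translation might involve, here is a minimal, self-contained sketch of lowering a nested expression into LLVM's three-address form. It deliberately does not use BAP's API: the exp type, the lower function, and the fixed i32 width are all assumptions made for illustration, and a real printer would have to work on BIL expressions and track bit widths.

```ocaml
(* Toy model, NOT BAP's API: a tiny expression type standing in for BIL. *)
type exp =
  | Var of string            (* a named variable *)
  | Int of int               (* a constant, assumed to be 32 bits wide *)
  | Add of exp * exp
  | Mul of exp * exp

(* Lower [e] to LLVM instructions appended to [buf]; return the operand
   (a register such as "%t0", or a literal) that holds the result. *)
let lower buf e =
  let fresh = ref 0 in
  let tmp () =
    let name = Printf.sprintf "%%t%d" !fresh in
    incr fresh;
    name in
  let rec go = function
    | Var v -> "%" ^ v
    | Int n -> string_of_int n
    | Add (x, y) -> binop "add" x y
    | Mul (x, y) -> binop "mul" x y
  and binop op x y =
    (* operands are lowered first, so their instructions come first *)
    let x = go x in
    let y = go y in
    let dst = tmp () in
    Printf.bprintf buf "  %s = %s i32 %s, %s\n" dst op x y;
    dst in
  go e

(* x + 2 * y, followed by a return of the result *)
let demo () =
  let buf = Buffer.create 64 in
  let res = lower buf (Add (Var "x", Mul (Int 2, Var "y"))) in
  Printf.bprintf buf "  ret i32 %s\n" res;
  Buffer.contents buf
```

Calling demo () yields a mul into %t0, an add into %t1, and ret i32 %t1. The point of the sketch is only that every nested subexpression needs its own temporary, which is the main structural difference between a tree-shaped expression and an LLVM instruction list.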
Building and running
- build with bapbuild bir_to_llvm.plugin
- install with bapbundle install bir_to_llvm.plugin
- run with bap /bin/true -dllvm
or as a one liner:
bapbuild bir_to_llvm.plugin && bapbundle install bir_to_llvm.plugin && bap /bin/true -dllvm
Testing
The generated code should be accepted by llc:
bap /bin/true -dllvm > true.ll
llc true.ll
The command will produce a true.s file containing the assembly representation.
Alternative implementation
It would be even nicer to use Term.visitor to implement the printer. However, it relies on the OCaml object system, which may raise the entry bar.
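For readers unfamiliar with the object system, the following self-contained sketch shows the general shape of a visitor-style printer. It is not BAP's Term.visitor: the def and blk records, the visitor class, and its enter_blk, enter_def, and run methods are hypothetical stand-ins written against the standard library only.

```ocaml
(* Toy model, NOT BAP's Term.visitor: hypothetical term types and a
   hypothetical visitor class, using only the OCaml standard library. *)
type def = { lhs : string; rhs : string }
type blk = { label : string; defs : def list }

(* A base visitor that walks blocks and threads an accumulator;
   by default every hook returns the accumulator unchanged. *)
class ['a] visitor = object (self)
  method enter_blk (_ : blk) (acc : 'a) : 'a = acc
  method enter_def (_ : def) (acc : 'a) : 'a = acc
  method run (blks : blk list) (init : 'a) : 'a =
    List.fold_left (fun acc b ->
        let acc = self#enter_blk b acc in
        List.fold_left (fun acc d -> self#enter_def d acc) acc b.defs)
      init blks
end

(* A printer overrides only the hooks it cares about. *)
let printer = object
  inherit [Buffer.t] visitor
  method! enter_blk b buf =
    Printf.bprintf buf "bb_%s:\n" b.label; buf
  method! enter_def d buf =
    Printf.bprintf buf "  %%%s = %s\n" d.lhs d.rhs; buf
end

let demo_visitor () =
  let blks = [{ label = "entry";
                defs = [{ lhs = "x"; rhs = "add i32 1, 2" }] }] in
  Buffer.contents (printer#run blks (Buffer.create 64))
```

The traversal lives in one place (run) and the printer only overrides hooks, which is the attraction of the visitor approach. The cost is that one needs to be comfortable with classes, inheritance, and method overriding in OCaml.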
Is BAP 0.8 available for download? bap.ece.cmu.edu doesn't seem to host it. I believe it had an LLVM code generator too. Perhaps someone might find that useful until this issue is resolved, either for direct use or as a reference for writing the LLVM IR translator.
There are quite a few forks of the legacy BAP available around the Hub. You can try to use GitHub's search ability to find them all. The first that comes to my mind is https://github.com/0day1day/bap
BAP 0.8 may be available someplace. I would warn that translating binaries to LLVM IR has been tried by many, and often does not get you what you're looking for. Imagine LLVM IR with a single 1 MB function that uses only gotos. LLVM IR isn't designed for that. You can translate per function, but you still end up with lots of design choices, e.g., how to represent the stack (and a shared stack frame).
Just my opinion, so take it for what it's worth: LLVM IR is the wrong thing for binary analysis. It's great for a compiler, but the right data structures for binary analysis (although a binary is the result of compilation) are different from those for compilation itself.
The current BAP is what we think is the best approach.
Great work! I have a question about the BAP IR. Is it a "high-level" IR or a "low-level" one? By "high-level" I mean an IR like the original compiler IR without optimizations (i.e., no O1~O3). By "low-level" I mean an IR that is a direct translation from assembly code.
It is low level, as it expands instructions up to the CPU microcode, so it's lower than assembly or machine code.
Hello, I have installed BAP and want to use it to transform an executable program to LLVM IR. Do I need to follow this? Thank you very much.
This issue is basically saying that dumping BIR into LLVM IR is not implemented, and it suggests a course of action to anyone who would like to implement it. Note that it is not trivial, so do not expect an easy trip. A few of us went down this road with no success :)
Thank you for your answer. I don't understand BAP well, and I want to transform a binary program into LLVM IR (I have read a paper that used the BAP tool for this). I have read through the BAP commands, but couldn't figure it out.
It is not possible in modern BAP, that's why this issue is open.
ok ,thanks
Curiously, with the modern move of BAP to the KB and CT, implementing something like this might be easier (or might not, depending on some conversion peculiarities).
Excuse me, this issue is still open. Does it mean that dumping BIR into LLVM IR is not implemented yet? Has no one gone down this road with success? =.=