dump program in LLVM IR

Open ivg opened this issue 9 years ago • 13 comments

Motivation

Dumping a project in LLVM IR would open the door to many interesting projects, e.g., JIT compilation, running LLVM analyses, creating binaries, lifter verification, etc.

This can be a nice toy project for someone who would like to learn BAP, and it is a good way to learn both the LLVM and BAP intermediate representations.

Implementation

Since BAP IR is quite close to LLVM IR, a direct transformation should be easy. A natural place to hook it in is a pretty printer for the project data structure. Here is a skeleton setup.

Initial setup

  1. create a folder bir_to_llvm.
  2. create file bir_to_llvm.ml with the following initial contents:
open Core_kernel.Std
open Bap.Std
open Regular.Std
open Format

let pp_nop ppf t =
  fprintf ppf "%%r%a = add i1 0, 0@\n" Tid.pp (Term.tid t)

let pp_ret ppf = fprintf ppf "ret void@\n"

let pp_phi = pp_nop
let pp_def = pp_nop
let pp_jmp = pp_nop

let pp_elts ppf elts =
  Seq.iter elts ~f:(function
      | `Phi phi -> fprintf ppf "%a" pp_phi phi
      | `Def def -> fprintf ppf "%a" pp_def def
      | `Jmp jmp -> fprintf ppf "%a" pp_jmp jmp)

let pp_args ppf sub = ()

let pp_body ppf blks =
  Seq.iter blks ~f:(fun blk ->
      fprintf ppf "\n@[bb_%a:@\n%a@\n%t@]@\n"
        Tid.pp (Term.tid blk)
        pp_elts (Blk.elts blk)
        pp_ret)


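(* return type of the function; named differently from pp_ret above,
   which prints the terminating `ret` instruction of a block *)
let pp_ret_type ppf sub =
  fprintf ppf "void"

let pp_sub ppf sub =
  let args = Term.enum arg_t sub in
  let blks = Term.enum blk_t sub in
  fprintf ppf "@[<2>define %a @%s(%a) {@\n%a@]@\n}"
    pp_ret_type args (Sub.name sub) pp_args args pp_body blks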

let pp_prog ppf prog =
  Term.enum sub_t prog |>
  Seq.iter ~f:(fprintf ppf "@[%a@]@\n" pp_sub)

let pp ppf proj =
  fprintf ppf "@[%a@]" pp_prog (Project.program proj)

let () =
  let writer = Data.Write.create ~pp () in
  Project.add_writer ~desc:"print program in LLVM IR"
    ~ver:"0.1" "llvm" writer
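
The skeleton prints every definition as a no-op. As a rough, self-contained sketch of what a real `pp_def` might have to do, the toy printer below flattens a nested expression into LLVM-style three-address instructions. The `exp` and `def` types, the `lower` and `fresh` helpers, and the fixed `i32` width are all assumptions made for illustration; they are not BAP's API, and a real implementation would have to work on BAP's own expression type and track bit widths.

```ocaml
(* Toy stand-ins for BIR definitions; hypothetical types, not BAP's API. *)
type exp =
  | Var of string          (* an already-defined SSA value *)
  | Int of int             (* an integer literal *)
  | Add of exp * exp
  | Sub of exp * exp

type def = { lhs : string; rhs : exp }

(* Counter used to name temporaries for nested sub-expressions. *)
let counter = ref 0
let fresh () = incr counter; Printf.sprintf "t%d" !counter

(* Emit the instructions that compute an expression and return the
   operand (register or literal) that holds its value. *)
let rec lower ppf = function
  | Var v -> "%" ^ v
  | Int n -> string_of_int n
  | Add (x, y) -> binop ppf "add" x y
  | Sub (x, y) -> binop ppf "sub" x y

and binop ppf op x y =
  let a = lower ppf x in
  let b = lower ppf y in
  let t = fresh () in
  Format.fprintf ppf "%%%s = %s i32 %s, %s@\n" t op a b;
  "%" ^ t

(* LLVM IR has no plain copy instruction, so bind the left-hand side
   with an `add ..., 0`. The i32 width is an assumption of this sketch. *)
let pp_def ppf d =
  let v = lower ppf d.rhs in
  Format.fprintf ppf "%%%s = add i32 %s, 0@\n" d.lhs v
```

For instance, `Format.printf "%a" pp_def { lhs = "x"; rhs = Add (Var "a", Int 1) }` emits one `add` for the nested expression and a second one that binds `%x`.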

Building and running

  1. build with bapbuild bir_to_llvm.plugin
  2. install with bapbundle install bir_to_llvm.plugin
  3. run with bap /bin/true -dllvm

or as a one liner:

bapbuild bir_to_llvm.plugin && bapbundle install bir_to_llvm.plugin && bap /bin/true -dllvm

Testing

The generated code should be accepted by llc:

bap /bin/true -dllvm > true.ll
llc true.ll

The command will produce a true.s file containing the assembly representation.

Alternative implementation

It would be even nicer to implement the printer with Term.visitor; however, it relies on the object system, which may raise the bar for newcomers.
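
To give a feel for the object-based style, here is a minimal plain-OCaml sketch of a folding visitor and a printer that inherits from it. It only mirrors the general shape of the approach: the `elt` and `blk` types, the `visitor` class and its `enter_*` hooks are hypothetical stand-ins defined here, not BAP's actual Term.visitor, which has its own set of methods.

```ocaml
(* Toy terms; hypothetical stand-ins, not BAP's Term/Blk modules. *)
type elt = Def of string | Jmp of string
type blk = { label : string; elts : elt list }

(* A folding visitor: every hook threads an accumulator through the
   traversal and leaves it unchanged by default. *)
class ['a] visitor = object (self)
  method enter_blk (_ : blk) (acc : 'a) : 'a = acc
  method enter_def (_ : string) (acc : 'a) : 'a = acc
  method enter_jmp (_ : string) (acc : 'a) : 'a = acc
  method run (blks : blk list) (init : 'a) : 'a =
    List.fold_left (fun acc b ->
        let acc = self#enter_blk b acc in
        List.fold_left (fun acc -> function
            | Def d -> self#enter_def d acc
            | Jmp j -> self#enter_jmp j acc) acc b.elts)
      init blks
end

(* The printer overrides only the hooks it needs and collects the
   output lines in reverse order. *)
let printer = object
  inherit [string list] visitor
  method! enter_blk b acc = (b.label ^ ":") :: acc
  method! enter_def d acc = ("  ; def " ^ d) :: acc
  method! enter_jmp j acc = ("  br label %" ^ j) :: acc
end

let print blks = String.concat "\n" (List.rev (printer#run blks []))
```

The appeal of this style is that the traversal order lives in one place (`run`), while each printer only states what to emit per term kind; the cost is that readers need to be comfortable with OCaml classes and inheritance.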

ivg avatar Oct 10 '16 13:10 ivg

Is BAP 0.8 available for download? bap.ece.cmu.edu doesn't seem to have hosted it. I believe it had an LLVM code generator too. Perhaps someone might find that useful until this issue is resolved - for use or even for writing the LLVM IR translator.

dnivra avatar Feb 26 '17 06:02 dnivra

There are quite a few forks of the legacy BAP available around the Hub. You can try to use GitHub's search ability to find them all. The first that comes to my mind is https://github.com/0day1day/bap

ivg avatar Feb 26 '17 12:02 ivg

BAP 0.8 may be available someplace. I would warn that an LLVM IR translator for binaries has been tried by many, and often does not get you what you're looking for. Imagine LLVM IR with 1 function that is 1 MB using only goto's. LLVM IR isn't designed for that. You can do it per-function, but you still end up with lots of design choices, e.g., representing the stack (and shared stack frame).

Just my opinion, so take it for what it's worth: LLVM IR is the wrong thing for binary analysis. It's great for a compiler, but the right data structures for binary analysis (although a binary is the result of compilation) are different from those for compilation itself.

The current BAP is what we think is the best approach.

dbrumley avatar Feb 26 '17 14:02 dbrumley

ivg set pipeline to Icebox

issue-sh[bot] avatar Nov 09 '17 13:11 issue-sh[bot]

Great work! I have a question about the BAP IR. Is it a "high-level" IR or a "low-level" one? By "high-level" I mean the original IR without optimizations (i.e., no O1~O3). By "low-level" I mean an IR that is a direct translation from assembly code.

yuedeji avatar Jan 30 '18 21:01 yuedeji

It is low level, as it expands instructions up to the CPU microcode, so it's lower than assembly or machine code.

ivg avatar Jan 30 '18 22:01 ivg

Hello, I have installed BAP and want to use it to transform an executable program to LLVM IR. Do I need to follow this? Thank you very much.

yueyuep avatar Mar 26 '19 13:03 yueyuep

This issue basically says that dumping BIR into LLVM IR is not implemented, and suggests a course of action for anyone who would like to implement it. Note that it is not trivial, so do not expect an easy trip. A few of us went down this road with no success :)

ivg avatar Mar 26 '19 15:03 ivg

Thank you for your answer. I don't understand BAP well, and I want to transform a binary program into LLVM IR (I have read a paper that used the BAP tool). I have read through the BAP commands, but couldn't figure it out.

yueyuep avatar Mar 26 '19 16:03 yueyuep

It is not possible in modern BAP, that's why this issue is open.

ivg avatar Mar 26 '19 16:03 ivg

ok ,thanks

yueyuep avatar Mar 26 '19 16:03 yueyuep

Curiously, with the modern move of BAP to the KB and CT, implementing something like this might be easier (or might not, depending on some conversion peculiarities).

XVilka avatar Apr 01 '20 05:04 XVilka

Excuse me, this issue is still open. Does it mean that dumping BIR into LLVM IR is not implemented yet? Has no one gone down this road with success? =.=

zyt755 avatar Jun 27 '21 16:06 zyt755