code-morphing
code-morphing copied to clipboard
Code Morphing pass for LLVM
- CMP -- Code Morphing Pass
This pass does static code morphing on the IR of the llvm compiler. It has been developed as a project for an exam on cryptography.
It's licensed under the GPL v3 as stated in every source file in the project. For any further information on the license please refer to http://www.gnu.org/licenses/gpl.html
- Project organization
The whole project is split in three part:
- An LLVM pass that does the instruction morphing.
- A code generator that starting from a language that we have defined builds the alternatives table used inside the pass.
- A testing project that I used to test if the transformation pass works on a real algorithm (AES from polar)
All the three project are hosted on github you can find them at:
- LLVM Pass :: well, I suppose you already know where it is ;)
- Code Generator :: [[https://github.com/mminutoli/code-morphing-generator][Code Morphing Generator]]
- Testing Project :: [[https://github.com/mminutoli/code-morphing-tests][Code Morphing Test]]
In the following section you should find all the details necessary to build everything and be up and running. If anything... just ask.
- Quick Start
It is strongly encouraged to keep source tree separated from build tree, so we assume the following directory structure:
- Let
LLVM_WORKbe the working directory - Let
LLVM_SRCthe LLVM sources directory - Let
LLVM_BUILDthe LLVM objects directory - Let
LLVM_ROOTthe LLVM root installation directory
You can use whichever working directory you want. The other directory are
assumed to be children of LLVM_WORK. From now on:
- Let
LLVM_SRCbe$LLMV_WORK/src - Let
LLVM_BUILDbe$LLVM_WORK/build - Let
LLVM_ROOTbe$LLVM_ROOT/root
First we create the working directory:
#+BEGIN_SRC: shell $ mkdir $LLVM_WORK #+END_SRC
We need LLVM and CLANG. We can get them from Git repositories:
#+BEGIN_SRC: shell $ cd $LLVM_WORK $ git clone http://llvm.org/git/llvm.git $LLVM_SRC $ cd $LLVM_SRC $ git clone http://llvm.org/git/clang.git tools/clang $ cd tools/clang #+END_SRC
We will build in LLVM_BUILD:
#+BEGIN_SRC: shell $ mkdir $LLVM_BUILD $ cd $LLVM_BUILD $ $LLVM_SRC/configure --prefix=$LLVM_ROOT --enable-optimized $ make -j $N $ make install #+END_SRC
In order to start working with CMP you have to clone its master repository on
GitHub into LLVM projects directory:
#+BEGIN_SRC: shell $ cd $LLVM_SRC/projects $ git clone https://github.com/mminutoli/code-morphing.git #+END_SRC
You need to build the configure script:
#+BEGIN_SRC: shell $ cd code-morphing/autoconf $ ./AutoRegen.sh #+END_SRC
Then you have to generate the alternatives builders. To do that check out [[https://github.com/mminutoli/code-morphing-generator][Code Morphing Generator]] project and build it. After that write your instruction alternatives source file.
#+BEGIN_SRC: shell
$ cd $LLVM_SRC/projects
$ cd code-morphing/includes/generated
$ ./
Finally you have to build it:
#+BEGIN_SRC: shell $ cd $LLVM_BUILD/projects $ mkdir code-morphing $ cd code-morphing $ $LLVM_SRC/projects/code-morphing/configure --prefix=$LLVM_ROOT $ make -j $N #+END_SRC
After building, tests are run using:
#+BEGIN_SRC: shell $ make check #+END_SRC
- Transformation details ** The problem Embedded processor are subject to cross-side channel attacks due to the simplicity of their Instruction Set (IS) and the lack of hardware counter measure to this issue.
A possible solution would be to rewrite at Run Time (RT) some instruction with pattern of instruction semantically equivalent. Even though this approach works well in other environments, it is not applicable in situation were the code is executed from a static ram.
The two main problems that discourage this approach are:
- long latency to switch the SRAM in a writable state
- SRAM can be written a limited amount of times.
** Proposed solution To overcame the limitation explained above this pass works on the IR of LLVM to generate a final binary that has a randomized Control Flow (CF) without the need of memory rewriting at RT.
The steps taken by the transformation are:
- Add a the alternatives vector inside the module
- Declare a randomize function
- For every function that need it add a choice vector and an initialization loop
- For every basic block that need it add the alternative flows
Lets explain the components of the final code in more details:
- Alternative vector :: This vector contains, for every instruction that has alternatives declared, the number of available alternatives.
- Randomize function :: This is a function that generate an integer random number between 0 and its argument included.
- Choice vector :: This vector contains, for every instruction that has alternatives, the selected alternative. At the start of the function the loop get initialized using the Randomize function.
In this way the code produced by the Pass contains all the alternatives available for the instructions and the randomization of the CF is done using a random function and without the need to rewrite memory. Clearly there is no free lunch. The improved security comes at the cost of increasing the binary size.
- Adding instruction alternatives
The steps required to add alternatives to an instruction starting from scratch are:
- Add the instruction to the InstructionTy enumeration
- Declare the alternative numbers
- Handle the instruction inside getInstTy
- Implement the build function
- Add tests for the new alternative
All these steps are done automatilly from the [[https://github.com/mminutoli/code-morphing-generator][Code Morphing Generator]].
** Add the instruction to the enumeration
Open the file include/cmp/InstructionAlternativeUtils.h and search for the enumeration InstructionTy. Add the name of the instruction you want to handle.
Important : The name used for the instruction must be the same of the one used in llvm for the instruction in the Instruction enumeration.
** Declare the alternative numbers
Always in the same file, but some line later, there is a table that declare the alternatives available for all the instruction in the InstructionTy.
Add a line of the kind: #+BEGIN_SRC: cpp CMP_SET_ALTERNATIVE_NUMBER(INST, NUM); #+END_SRC
Where INST is the name used in InstructionTy and NUM is an integer value corresponding to the alternatives you want to provide. (start counting from 1 :) )
** Handle the instruction inside getInstTy
The implementation need a function able to do the reverse mapping from the llvm instruction to the enumeration InstructionTy.
This mapping is given by the getInstTy function.
Open the lib/CodeMorphing/InstructionAlternativeUtils.cpp and add to the body of the function a CHECK_INST invocation for your instruction.
Looking at the CHECK_INST macro you will understand why the name in the enumeration InstructionTy must be the same used in llvm.
** Implement the build function
Open the lib/CodeMorphing/InstructionAlternatives.cpp and implement a specialization for the function template buildAlternatives for your instruction.
The function must return a vector containing all the alternative Basic Blocks. The terminator instruction will be put automatically by the transformation pass, so don't put them.
** Implement test
Last but not least implement test in the usual way with llvm.
- Testing with real algorithms
If you want to test this pass with a real algorithm please take a look at [[https://github.com/mminutoli/code-morphing-tests][Code Morphing Test]].