C-2PO: Thesis About a Weakly-Relational Pointer Analysis
This PR introduces a weakly-relational analysis between pointers (ref: 2-pointer Logic by Seidl et al.).
The analysis can infer must-equalities and must-disequalities between terms which are built from pointer variables, with the addition of constants and dereferencing.
Moreover, it computes pairs of variables that are in distinct memory blocks, for example if they were initialized with two different calls to malloc.
An example of a property that the analysis can infer is *(x + 1) = 3 + y and a != 5 + *(y + 1).
Abstract states are represented by a congruence closure of terms over the uninterpreted function *.
Additionally, a list of disequalities and a list of block disequalities between terms is stored.
The disequalities are important during the assignment operation, because when we assign a value to an address on the heap, we need to forget the information we had about all terms thay may alias with this address.
Additionally to the (block) disequalities inferred by this analysis, we exploit the May-Point-To analysis in order to derive additional disequalities between terms.
Using the May-Point-To analysis can be disabled with the option ana.c2po.askbase.
There are two version of the join and two versions of the equal function that are implemented.
They can be chosen via the options ana.c2po.precise_join and ana.c2po.normal_form, respectively.
They are disabled by default.
If precise_join is enabled, the join is calculated using the quantitative finite automaton. Otherwise, only the partition is considered for the join, which is a bit less precise.
If normal_form is enabled, the equality of two congruence closures is decided by calculating a normal form, instead of
comparing the equivalence classes. The normal form is computed lazily and doesn't need to be recomputed when we call equal on the same domain element.
The implementation of enter duplicates the parameter variables of each function in order to infer in combine which terms are related to the initial parameters.
In order to use the May-Point-To analysis also on these duplicated variables, an additional analysis startState is added.
It remembers the May-Point-To information of the duplicated variables at the beginning of the function.
Here is a more detailed description of the files that are added:
The file unionFind.ml contains the code for a quantitative union find and the quantitative finite automata.
They will be necessary in order to construct the congruence closure of terms.
The contained modules are:
-
T: here, the terms (e.g.,*(x + 64)) and propositions (e.g.,*(x + 64) = 192 + y) are defined. There are also function to convert a CIL expression to a term and a term to a CIL expression. The offsets in the terms are expressed in bits, therefore they are equal to the offset of the CIL expression multiplied with the type of the element that the variable points to. Each term stores the information about the CIL expression that was used to create the term. This way, it is easier to reconvert a term to a CIL expression and to get information about the type of a term. Only the terms that are pointers or arrays or structs or 64-bit integers are considered by the analysis. Arrays and structs are interpreted as pointers, e.g.a[3]is interpreted as the term*(a + 3). -
UnionFind: the union-find data structure is defined with terms as elements. -
LookupMap: this map represents the transitions of the quantitative finite automata. Each termtis mapped to the terms*(z + t)or equal terms.
The file congruenceClosure.ml contains the data structures for the C-2PO Domain, i.e., the congruence closure, the disequalities and the block disequalities:
-
BlDisrepresents the block disequalities as a map that maps each term to a set of terms that are definitely in a different block. -
Disequalitiesrepresents the disequalities as a map from a term to a map from an integer to another term. The module contains functions to compute the closure of the disequalities that are implied by equalities, disequalities or block disequalities. -
SSet: a set of terms that are currently considered by the analysis. -
MRMap: maps each equivalence class to a minimal representative. This is necessary for computing the normal form. - There are functions to calculate the closures of the proposition, to insert terms, add propositions, remove terms, two methods to compute join and two methods to compute equal.
-
MayBeEqual: contains code to check if two terms may point to the same address or if they may overlap. It uses information from the disequalities and from MayPointTo for this purpose.
The file c2poDomain.ml defines the domain operations.
c2poAnalysis.ml contains the transfer functions for the analysis.
duplicateVars.ml contains functions for duplicating variables, which is used in enter in the C-2PO analysis and in the StartState analysis.
startStateAnalysis.ml remembers the value of each parameter at the beginning of a function. It answers the query May-Point-To for the duplicated variables and returns the initial value of the original variable.
singleThreadedLifter.ml transforms any analysis into a single threaded analysis by returning top any time the code might be multi-threaded. This can be reused by other analyses in the future.
As almost all files are new here, it should be easy to merge master and add startcontext (c.f. https://github.com/goblint/analyzer/pull/1427), so we can see if the CI passes then.
Any objections to merging @jerhard @sim642?
@jerhard: You promised a review on this until 2024-11-19 in the Gobcon. Do you have an estimate when you will be able to provide a review?
@michael-schwarz and I had a look at the treatment of function calls in the c-2po analysis. We noticed the following:
- C-2PO does not keep equalities between parameters and actuals. E.g., when analyzing a function call
f(&g), the C-2PO analysis offwill not know that the parameter has the value&g. It may make sense to track this information in C-2PO. - Currently, the
combine_envandcombine_assignfunction share a larger section of code. It may be possible to avoid this duplication if one were to keep ghost variables in thecombine_env, thus avoiding reconstruction incombine_assign.
I don't quite see what the problem with the failing regression test is? It seems like the ordering of race warnings somehow has changed in one test case?
It may be due to me enabling warn.deterministic on master for some tests - I merged that commit in now, so let's see!
That seems to indeed have been it!
Somehow every unlocked CI job on master after merging this is failing with
Fatal error: exception Sys_error("run/config.json: No such file or diretory")