Enhance detection of buffer overflows by introducing variable for length of array/blob
This draft PR enhances the detection of buffer overflows by introducing a variable for the length of array/blob as suggested by @michael-schwarz in #620.
Arrays:
Upon creation of a VLA vdecl creates a new ghost variable in the domain, which tracks the symbolic relationships between program variables and the length of the array. Using the query system, the arraydomain can check for out-of-bounds accesses.
Malloc:
This works similarly to the VLA case. When upon calling malloc/alloc/calloc the special transfer function creates a new ghost variable, which tracks the length of blob size. MemOutOfBounds can then use the query system to ask the relational domain if an OOB access occurred. In addition, I implemented pointer tracking as we would not be able to know the symbolic relationship of the pointer offset otherwise. (see the following example).
int main() {
int EXP_OF_ARRAY_LENGTH = rand();
EXP_OF_ARRAY_LENGTH %= 4;
int * arr;
arr = malloc( sizeof(int)* EXP_OF_ARRAY_LENGTH );
for (int i = 0 ; i < EXP_OF_ARRAY_LENGTH ; i++) {
char * t = arr + i ;
*t = 2;
}
}
CHANGES/BUG fixes:
- [x] The relational domain wasn't able to parse sizeOf expression. This is now fixed. One such case would be the new test case 36 100.
- [x] Rewrite the logic of memOutOfBounds. Previously, the analysis gave no warnings for negative indices and was therefore unsound. One such example is in the new test case 74 33
In addition to that, due to the fact that the struct Offset and pointer Offset are analyzed separately, the analysis was unsound in certain cases. (see the newly added test 74 32 and 74 34)
Finally, the memOutOfBounds analysis should now be more precise and give less false warnings for structs see 74 13
- [x] Fix computation of the no_overflow flag. Previously, the no_overflow flag was computed using a comparison of top. This was incorrect in the case of comparisons. When evaluating comparisons, we get the interval [0,1], if we have an overflow. However, this does not correspond to Top. Additionally, this flag was not used in further computations. This leads to the RD being unable to compare unsigned and signed integers, which is essential for this analysis. To fix this, I added an overflow flag in the intdomain to determine if an overflow occurred in an expression. This is a nonfunctional solution; however, after initially trying to rewrite the logic to incorporate the overflow information, I concluded that this would lead to too much refactoring.
```C
int main() {
unsigned int EXP_OF_ARRAY_LENGTH = (rand()%32) +2;
for ( int i = 0 ;i < EXP_OF_ARRAY_LENGTH -33; i++)
assert( i< EXP_OF_ARRAY_LENGTH); // UNKNOWN
return 0;
}
TODO:
- [x] Option for toggling pointer tracking
- [x] Rewrite memOutOfBounds query logic
- [x] Fix removal of ghost variables after function return
- [ ] Clean-up
Greta to see there's progress here! I left some first comments, but I have not had the time to go into details yet (especially the changes inside the relational domain and the oob analysis).
Here's two questions I already have though:
* One's conceptual: Have you considered what happens for multi-threaded programs? Initially we thought about having an analysis such as this to demonstrate the usefulness of our analyses of global variables. Do you think this it within reach? * The other one is perhaps more down to earth: What do you do when you encounter a `malloc` only in one branch? Will you not have issues with the apron environments not matching up after the join?
-
~~The problem with global variables in Multithreading right now is that we need the relational information of the ghost variable with the initialization expression together in one domain. However, in the case of multithreading, the global variables get removed and are tracked separately. Therefore, we lose the relational relationship between the ghost variable and the initialization expression. Additionally, in the case of escaped variables, the relationship remains in the domain. Nonetheless, this is still useless due to the fact that new variables are introduced for the escaped variables. So, we still know nothing in this case.~~ I got this to work with global invariants after setting the ghost variable to Global type.
-
Yes, this is indeed a problem. However, I am also not quite sure how to solve this problem...
Unfortunately, I don't think solving later issues is within reach. Investigating and fixing bugs in the relational domain and unsoundness issues in memOutOfBounds took quite some time.
Before merging this, one should verify whether the handling of non-unique mallocs is sound here, e.g. for a program segment:
int n = 1 + (rand() % 5);
for(int i = 0; i < n; i++){
int* a = malloc(i * sizeof(int));
if(i == 0){ b = a;}
}
*b = 4; // Should warn, as size of block pointed to by b is zero.