Investigate failed check in memcmp
This was discovered in #1332, but also occurs in a synchronous context:
#[kani::proof]
pub fn main() {
let coll: Result<Vec<u32>, ()> = std::iter::empty().collect();
assert_eq!(Ok(vec![]), coll);
}
Running this with Kani gives the result:
SUMMARY:
** 2 of 436 failed
Failed Checks: memcmp region 1 readable
File: "<builtin-library-memcmp>", line 19, in memcmp
Failed Checks: memcpy region 2 readable
File: "<builtin-library-memcmp>", line 21, in memcmp
VERIFICATION:- FAILED
Reduced to
use std::iter::FromIterator;
#[kani::proof]
pub fn main() {
let coll: Vec<u32> = FromIterator::from_iter(std::iter::empty());
assert_eq!(Vec::<u32>::new(), coll);
}
Reduced further:
#[kani::proof]
pub fn main() {
assert_eq!(Vec::<u32>::new(), Vec::<u32>::new());
}
The reason seems to be that Vec::new sets its pointer to dangling: https://github.com/rust-lang/rust/blob/aeb5067967ef58e4a324b19dd0dba2f385d5959f/library/alloc/src/raw_vec.rs#L123
Furthermore, in the PartialEq implementation, it calls memcmp: https://github.com/rust-lang/rust/blob/aeb5067967ef58e4a324b19dd0dba2f385d5959f/library/core/src/slice/cmp.rs#L91
But I'm pretty sure that memcmp requires valid pointers, even with size 0.
Nice work reducing the test! We've been bitten by the dangling pointer issue before, which triggered us to turn off some of CBMC's pointer checks (see https://github.com/model-checking/kani/issues/763). Also see this Zulip thread that @celinval created.
The failure does not occur if the vector is created with capacity (to eliminate the dangling pointer):
pub fn main() {
assert_eq!(Vec::<u32>::with_capacity(1), Vec::<u32>::with_capacity(1));
}
Not sure if there's a way to disable the failing CBMC check. Another possibility is to provide our own override for memcmp.
I've asked on Zulip again, and apparently it is fine to call memcmp on the dangling pointer.
Specifically, Miri permits calling memcmp with size 0 on a pointer without provenance, such as NonNull::dangling. We are essentially imagining that a zero-sized object exists at every address in the address space (except at address 0). Since zero-sized objects cannot be observed, this is consistent with what the compiler knows and assumes.
However Miri will still flag an error on use-after-free with memcmp, i.e. when the pointer has provenance and the allocation it points to no longer exists.
@RalfJung I'm not totally sure about this. From the C Standard:
If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined.
http://port70.net/~nsz/c/c11/n1570.html#7.1.4
It's not just NULL. I guess it depends if you consider the dangling pointer outside the memory space of the program.
Not sure if that changed in later versions of the standard, but https://en.cppreference.com/w/c/string/byte/memcmp lists the requirements and says
The behavior is undefined if access occurs beyond the end of either object pointed to by lhs and rhs. The behavior is undefined if either lhs or rhs is a null pointer.
Neither of these are the case on a call like memcmp(1, 1, 0).
If only there was an unambiguous spec that would actually precisely spell out all requirements... in lieu of that, we can also argue that
- we are calling
memcmpvia FFI, so really the rules applying are those of the underlying target architecture - there is no evidence and not even a plausible suggestion for how
memcmp(1, 1, 0)being UB could be exploited by compilers - this matches our treatment of size-0 accesses in general, and LLVM devs say this will work with their backend (so even cross-lang LTO is covered)
While we argue about the true semantics, here's a plan for mitigation discussed with @danielsn: add a hook for memcmp and check if the size is 0. If so, return true, otherwise actually call memcmp.
A slightly related CBMC issue: https://github.com/diffblue/cbmc/issues/7064.
FWIW, in Miri, memcmp can still be UB when the size is 0. For example:
extern "C" {
fn memcmp(lhs: *const u8, rhs: *const u8, count: usize);
}
fn main() { unsafe {
// null ptr is UB
memcmp(0 as *const u8, 0 as *const u8, 0) ;
} }
and more interestingly:
fn make_uaf() -> *const u8 {
let b = Box::new(0u8);
&*b as *const _
}
fn main() { unsafe {
// use-after-free is UB
let ptr = make_uaf();
memcmp(make_uaf(), make_uaf(), 0) ;
} }
The first is probably trivial to detect for Kani, but I do not know if it models provenance accurately enough for the second.
$ cat uaf.rs
extern "C" {
fn memcmp(lhs: *const u8, rhs: *const u8, count: usize);
}
fn make_uaf() -> *const u8 {
let b = Box::new(0u8);
&*b as *const _
}
#[kani::proof]
fn main() { unsafe {
// use-after-free is UB
let ptr = make_uaf();
memcmp(make_uaf(), make_uaf(), 0) ;
} }
$ kani uaf.rs
<snip>
Check 3: memcmp.precondition.1
- Status: FAILURE
- Description: "memcmp region 1 readable"
- Location: <builtin-library-memcmp>:19 in function memcmp
Check 4: memcmp.precondition.2
- Status: FAILURE
- Description: "memcpy region 2 readable"
- Location: <builtin-library-memcmp>:21 in function memcmp
<snip>
Sure, but currently it also rejects code that it potentially ought to accept, see this issue. The interesting part will be to accept that but still reject the UAF example.
It looks like if I declare the dangling pointer as a 0 byte allocated region using the mmio flag to CBMC it works as expected:
// cbmc mmio.c --mmio-regions 16:4,32:4 --pointer-check
// cbmc mmio.c --mmio-regions 16:0,32:0 --pointer-check
#include <assert.h>
#include <string.h>
int main()
{
int *p=0x10;
int *q=0x20;
/* *p=42; */
/* *q=314; */
/* assert(*p==42); */
/* assert(*q==314); */
assert(memcmp(p,p,0) == 0);
}
@tautschnig may have thoughts on the semantics of C pointers
Similarly
2 Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4. On such a call, a function that locates a character finds no occurrence, a function that compares two character sequences returns zero, and a function that copies characters copies zero characters.
I would argue that the pointer is completely valid for memory accesses of size 0. So I view that condition as being satisfied.
Pragmatically speaking, IMO it would be a failure of the memcmp specification if a user like Rust's Vec has to add a branch to special-case size 0 here.
It looks like if I declare the dangling pointer as a 0 byte allocated region using the mmio flag to CBMC it works as expected:
Essentially, the Rust stance is that such a 0-byte allocated region exists at every address (except 0). The LLVM devs seem to agree with that interpretation.
It seems we disagree on what allocation means. I'd expect allocation to either refer to something generated by the allocator (e.g. malloc or equivalent), or to a static. If every location has a 0 sized allocation, what is the provenance of the pointer? How was it allocated?
Question: why not make a shared static as the dangling pointer? That would have an allocation?
It seems we disagree on what allocation means. I'd expect allocation to either refer to something generated by the allocator (e.g. malloc or equivalent), or to a static. If every location has a 0 sized allocation, what is the provenance of the pointer?
There is an 'invalid' provenance that is the only allowed provenance for these 0-sized 'virtual allocations'.
In Miri, we don't actually consider these allocations as existing. Instead, when we check whether a ptr is valid for a given memory access, then if the ptr has 'invalid' provenance (or "no provenance" or whatever you want to call it -- it is the provenance of a pointer created via ptr::invalid), we only accept the ptr if the access size is 0 (and the address is not 0, and it is sufficiently aligned). If the pointer has any other provenance, we do the full usual check, ruling out use-after-free or out-of-bounds accesses.
But this is equivalent to having such actual allocations everywhere, with 'invalid' provenance, and that is a mental model that got blessed by some LLVM devs.
How was it allocated?
If you want, you can consider ptr::invalid as a 'allocation' function. Or maybe NonNull::dangling should be considered the allocation function, not sure. (But that would not really reflect what Miri implements, since there are other ways to create pointers with this provenance.)
Question: why not make a shared static as the dangling pointer? That would have an allocation?
The issue is that NonNull::dangling can be called with types that need arbitrary alignment. Any static allocation we create might be insufficiently aligned.