kani Investigate failed check in memcmp

This was discovered in #1332, but also occurs in a synchronous context:

#[kani::proof]
pub fn main() {
    let coll: Result<Vec<u32>, ()> = std::iter::empty().collect();
    assert_eq!(Ok(vec![]), coll);
}

Running this with Kani gives the result:

SUMMARY:
 ** 2 of 436 failed
Failed Checks: memcmp region 1 readable
 File: "<builtin-library-memcmp>", line 19, in memcmp
Failed Checks: memcpy region 2 readable
 File: "<builtin-library-memcmp>", line 21, in memcmp

VERIFICATION:- FAILED

Aug 10 '22 22:08 fzaiser

Reduced to

use std::iter::FromIterator;

#[kani::proof]
pub fn main() {
    let coll: Vec<u32> = FromIterator::from_iter(std::iter::empty());
    assert_eq!(Vec::<u32>::new(), coll);
}

Aug 11 '22 15:08 fzaiser

Reduced further:

#[kani::proof]
pub fn main() {
    assert_eq!(Vec::<u32>::new(), Vec::<u32>::new());
}

The reason seems to be that Vec::new sets its pointer to dangling: https://github.com/rust-lang/rust/blob/aeb5067967ef58e4a324b19dd0dba2f385d5959f/library/alloc/src/raw_vec.rs#L123

Furthermore, in the PartialEq implementation, it calls memcmp: https://github.com/rust-lang/rust/blob/aeb5067967ef58e4a324b19dd0dba2f385d5959f/library/core/src/slice/cmp.rs#L91

But I'm pretty sure that memcmp requires valid pointers, even with size 0.

Aug 11 '22 17:08 fzaiser

Nice work reducing the test! We've been bitten by the dangling pointer issue before, which triggered us to turn off some of CBMC's pointer checks (see https://github.com/model-checking/kani/issues/763). Also see this Zulip thread that @celinval created.

The failure does not occur if the vector is created with capacity (to eliminate the dangling pointer):

pub fn main() {
    assert_eq!(Vec::<u32>::with_capacity(1), Vec::<u32>::with_capacity(1));
}

Not sure if there's a way to disable the failing CBMC check. Another possibility is to provide our own override for memcmp.

Aug 11 '22 18:08 zhassan-aws

I've asked on Zulip again, and apparently it is fine to call memcmp on the dangling pointer.

Aug 11 '22 20:08 fzaiser

Specifically, Miri permits calling memcmp with size 0 on a pointer without provenance, such as NonNull::dangling. We are essentially imagining that a zero-sized object exists at every address in the address space (except at address 0). Since zero-sized objects cannot be observed, this is consistent with what the compiler knows and assumes.

However Miri will still flag an error on use-after-free with memcmp, i.e. when the pointer has provenance and the allocation it points to no longer exists.

Aug 11 '22 20:08 RalfJung

@RalfJung I'm not totally sure about this. From the C Standard:

If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined.

http://port70.net/~nsz/c/c11/n1570.html#7.1.4

It's not just NULL. I guess it depends if you consider the dangling pointer outside the memory space of the program.

Aug 11 '22 20:08 danielsn

Not sure if that changed in later versions of the standard, but https://en.cppreference.com/w/c/string/byte/memcmp lists the requirements and says

The behavior is undefined if access occurs beyond the end of either object pointed to by lhs and rhs. The behavior is undefined if either lhs or rhs is a null pointer.

Neither of these are the case on a call like memcmp(1, 1, 0).

Aug 11 '22 21:08 RalfJung

If only there was an unambiguous spec that would actually precisely spell out all requirements... in lieu of that, we can also argue that

we are calling memcmp via FFI, so really the rules applying are those of the underlying target architecture
there is no evidence and not even a plausible suggestion for how memcmp(1, 1, 0) being UB could be exploited by compilers
this matches our treatment of size-0 accesses in general, and LLVM devs say this will work with their backend (so even cross-lang LTO is covered)

Aug 11 '22 21:08 RalfJung

While we argue about the true semantics, here's a plan for mitigation discussed with @danielsn: add a hook for memcmp and check if the size is 0. If so, return true, otherwise actually call memcmp.

Aug 11 '22 21:08 fzaiser

A slightly related CBMC issue: https://github.com/diffblue/cbmc/issues/7064.

Aug 11 '22 21:08 zhassan-aws

FWIW, in Miri, memcmp can still be UB when the size is 0. For example:

extern "C" {
    fn memcmp(lhs: *const u8, rhs: *const u8, count: usize);
}

fn main() { unsafe {
    // null ptr is UB
    memcmp(0 as *const u8, 0 as *const u8, 0) ;
} }

and more interestingly:

fn make_uaf() -> *const u8 {
    let b = Box::new(0u8);
    &*b as *const _
}

fn main() { unsafe {
    // use-after-free is UB
    let ptr = make_uaf();
    memcmp(make_uaf(), make_uaf(), 0) ;
} }

The first is probably trivial to detect for Kani, but I do not know if it models provenance accurately enough for the second.

Aug 11 '22 22:08 RalfJung

$ cat uaf.rs 
extern "C" {
    fn memcmp(lhs: *const u8, rhs: *const u8, count: usize);
}

fn make_uaf() -> *const u8 {
    let b = Box::new(0u8);
    &*b as *const _
}

#[kani::proof]
fn main() { unsafe {
    // use-after-free is UB
    let ptr = make_uaf();
    memcmp(make_uaf(), make_uaf(), 0) ;
} }


$ kani uaf.rs
<snip>
Check 3: memcmp.precondition.1
         - Status: FAILURE
         - Description: "memcmp region 1 readable"
         - Location: <builtin-library-memcmp>:19 in function memcmp

Check 4: memcmp.precondition.2
         - Status: FAILURE
         - Description: "memcpy region 2 readable"
         - Location: <builtin-library-memcmp>:21 in function memcmp
<snip>

Aug 11 '22 22:08 zhassan-aws

Sure, but currently it also rejects code that it potentially ought to accept, see this issue. The interesting part will be to accept that but still reject the UAF example.

Aug 11 '22 23:08 RalfJung

It looks like if I declare the dangling pointer as a 0 byte allocated region using the mmio flag to CBMC it works as expected:

// cbmc mmio.c --mmio-regions 16:4,32:4 --pointer-check
// cbmc mmio.c --mmio-regions 16:0,32:0 --pointer-check
#include <assert.h>
#include <string.h>
int main()
{
  int *p=0x10;
  int *q=0x20;

  /* *p=42; */
  /* *q=314; */
  /* assert(*p==42); */
  /* assert(*q==314); */
  assert(memcmp(p,p,0) == 0);
}

Aug 12 '22 21:08 danielsn

@tautschnig may have thoughts on the semantics of C pointers

Aug 12 '22 21:08 danielsn

Similarly

2 Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4. On such a call, a function that locates a character finds no occurrence, a function that compares two character sequences returns zero, and a function that copies characters copies zero characters.

Aug 15 '22 19:08 danielsn

I would argue that the pointer is completely valid for memory accesses of size 0. So I view that condition as being satisfied.

Pragmatically speaking, IMO it would be a failure of the memcmp specification if a user like Rust's Vec has to add a branch to special-case size 0 here.

It looks like if I declare the dangling pointer as a 0 byte allocated region using the mmio flag to CBMC it works as expected:

Essentially, the Rust stance is that such a 0-byte allocated region exists at every address (except 0). The LLVM devs seem to agree with that interpretation.

Aug 15 '22 19:08 RalfJung

It seems we disagree on what allocation means. I'd expect allocation to either refer to something generated by the allocator (e.g. malloc or equivalent), or to a static. If every location has a 0 sized allocation, what is the provenance of the pointer? How was it allocated?

Question: why not make a shared static as the dangling pointer? That would have an allocation?

Aug 15 '22 19:08 danielsn

It seems we disagree on what allocation means. I'd expect allocation to either refer to something generated by the allocator (e.g. malloc or equivalent), or to a static. If every location has a 0 sized allocation, what is the provenance of the pointer?

There is an 'invalid' provenance that is the only allowed provenance for these 0-sized 'virtual allocations'.

In Miri, we don't actually consider these allocations as existing. Instead, when we check whether a ptr is valid for a given memory access, then if the ptr has 'invalid' provenance (or "no provenance" or whatever you want to call it -- it is the provenance of a pointer created via ptr::invalid), we only accept the ptr if the access size is 0 (and the address is not 0, and it is sufficiently aligned). If the pointer has any other provenance, we do the full usual check, ruling out use-after-free or out-of-bounds accesses.

But this is equivalent to having such actual allocations everywhere, with 'invalid' provenance, and that is a mental model that got blessed by some LLVM devs.

How was it allocated?

If you want, you can consider ptr::invalid as a 'allocation' function. Or maybe NonNull::dangling should be considered the allocation function, not sure. (But that would not really reflect what Miri implements, since there are other ways to create pointers with this provenance.)

Question: why not make a shared static as the dangling pointer? That would have an allocation?

The issue is that NonNull::dangling can be called with types that need arbitrary alignment. Any static allocation we create might be insufficiently aligned.

Aug 15 '22 19:08 RalfJung