rust-bindgen bitfields raw pointer accessors

When creating bindings for a struct with bitfields the accessor functions that are created take an immutable receiver (&self). This is a problem if the underlying type is used by both Rust and C, since Rust is not allowed to create &self references, since the C side might modify the value at any time. One solution could be to create raw accessor functions that allow access via *const Self.

Input C Header

struct foo {
    unsigned available:1;
};

Bindgen Invocation

$ bindgen input.h

impl foo {
    #[inline]
    pub fn available(&self) -> ::std::os::raw::c_uint {
        unsafe { ::std::mem::transmute(self._bitfield_1.get(0usize, 1u8) as u32) }
    }
    #[inline]
    pub fn set_available(&mut self, val: ::std::os::raw::c_uint) {
        unsafe {
            let val: u32 = ::std::mem::transmute(val);
            self._bitfield_1.set(0usize, 1u8, val as u64)
        }
    }
    #[inline]
    pub fn new_bitfield_1(
        available: ::std::os::raw::c_uint,
    ) -> __BindgenBitfieldUnit<[u8; 1usize]> {
        let mut __bindgen_bitfield_unit: __BindgenBitfieldUnit<[u8; 1usize]> = Default::default();
        __bindgen_bitfield_unit.set(0usize, 1u8, {
            let available: u32 = unsafe { ::std::mem::transmute(available) };
            available as u64
        });
        __bindgen_bitfield_unit
    }
}

Expected Results

Additional functions that should be generated:

impl foo {
    #[inline]
    pub unsafe fn raw_available(this: *const Self) -> ::std::os::raw::c_uint {
        unsafe {
            ::std::mem::transmute(__BindgenBitfieldUnit::raw_get(
                addr_of!((*this)._bitfield_1),
                0usize,
                1u8,
            ) as u32)
        }
    }

    #[inline]
    pub unsafe fn raw_set_available(this: *mut Self, val: ::std::os::raw::c_uint) {
        unsafe {
            let val: u32 = ::std::mem::transmute(val);
            __BindgenBitfieldUnit::raw_set(addr_of!((*this)._bitfield_1), 0usize, 1u8, val as u64)
        }
    }
}

This also requires changes to the generated __BingenBitfieldUnit.

Oct 28 '23 16:10 BennoLossin

@pvdrz gentle nudge, do you have any thoughts on the proposal here? I think RFL may have a use case

Nov 10 '23 20:11 tgross35

Additional context since I keep forgetting: https://lore.kernel.org/rust-for-linux/[email protected]/ and its parent

Basically we have a C struct that has interior mutability on some fields, so usually it is in an UnsafeCell. Using these bitfield methods mean that you need to create an &self without the UnsafeCell - that is, you have a reference to something with interior mutability but the compiler has no way of knowing it has that interior mutability. So incorrect things can potentially happen quietly.

Nov 18 '23 00:11 tgross35

When you say "since the C side might modify the value at any time.", I'm unclear if you mean other threads will modify the data concurrently?

Otherwise, if this is all within a single thread, then there's not really a problem as far as I can tell.

Nov 24 '23 04:11 Lokathor

In the use case, we have an `UnsafeCell< pointer to a struct with:

Some fields that never change while we have it (i.e. are const or locking is handled by the caller)
Some fields that are mutex-protected and could change if we are preempted while using it

The issue is that it is in general incorrect to take &* reference to the struct to access the const fields because the other could change, but that is required for bitfield access. I think the risk for UB here is exceedingly low if you are not also accessing the locked fields, but it is a bit of an easy footgun telling rustc something that isn't true.

I'm not 100% sure I understand this all myself and the locking story is complicated, so I may be misrepresenting some of it.

Nov 24 '23 04:11 tgross35

Well, unsafecell doesn't protect you from any concurrency problems. If there's supposed to be a lock, then the lock must be held to correctly read or write the available field even if a &foo reference is never actually made, because raw pointers still have to follow data race rules.

Nov 24 '23 05:11 Lokathor

I don't think the parent issue showcases the problem well, it's more like:

#[repr(C)]
struct foo {
    // this is a bitfield
    field: u8,
    mtx: mutex_t,
    // mutex protected
    protected: u8
}

// this is how you reference the type normally
struct FooWrapper(UnsafeCell<foo>);

Then you must construct a &foo via unsafe { &*foo_wrapper.0.get() } to use these bitfield accessors. The problem is with:

Context A creates &foo and access field
Preemption hits, context B locks mtx, changes protected, and unlocks
Context A resumes and &foo points to changed data, rustc has no way of knowing about it
Context A accesses field

This is as far as it goes, and I think the incorrectness is purely semantic meaning this is something miri would flag if it could (I believe), but there is no actual UB.

Nov 24 '23 06:11 tgross35

I guess there could be a theoretical race condition if context A then tries to lock mtx and access protected. Locking the mutex will prevent reordering at that level - but maybe rustc could take the "const & with no UnsafeCell never ever changes" very literally and do some reordering itself. So it could load field and protected at the same time (step 4 above) and then lock mtx - meaning you would be getting an unsynchronized read of protected.

That seems like a stretch but maybe not impossible? Probably need @y86-dev to clarify things

Nov 24 '23 06:11 tgross35

I think I follow now. The quick fix might be to distribute the UnsafeCell over the individual fields of foo, but that's got some ergonomic drawbacks i expect

Nov 24 '23 08:11 Lokathor

I guess there could be a theoretical race condition if context A then tries to lock mtx and access protected. Locking the mutex will prevent reordering at that level - but maybe rustc could take the "const & with no UnsafeCell never ever changes" very literally and do some reordering itself. So it could load field and protected at the same time (step 4 above) and then lock mtx - meaning you would be getting an unsynchronized read of protected.

That seems like a stretch but maybe not impossible? Probably need @y86-dev to clarify things

Your example from above is much better than mine, since it actually has a mutex. But the protected field should also be a bitfield, so

// bingden generates
struct Foo {
    protected: u8, // bitfield
    mtx: mutex,
}

The problem lies with the implementation of mutex, since it contains a wait_list, an intrusive circular doubly linked list that contains all threads waiting for the mutex. This list might change even if we lock the mutex (since at any time another thread might try to lock it, but since we hold the lock, it goes to sleep).

I think I follow now. The quick fix might be to distribute the UnsafeCell over the individual fields of foo, but that's got some ergonomic drawbacks i expect

Since Foo is generated by bingden, I do not know if that is possible.

Nov 24 '23 09:11 BennoLossin

I definitely don't think it's currently possible, someone would have to add that ability to bindgen.

Nov 24 '23 15:11 Lokathor

I would prefer the solution outlined in this issue, since relying on raw pointers also avoids the aliasing problems we would have when having a &mut T instead of a &T.

Nov 24 '23 18:11 BennoLossin

You do lose the ability to use method syntax, which makes the API kinda weird, but otherwise it's also fine, yes.

Nov 24 '23 18:11 Lokathor

Sounds good to add this fwiw.

Feb 02 '24 12:02 emilio