Fix ringbuffer thread safety on ARM. Fix #715 #388
This patch addresses the thread safety problem of jack_ringbuffer_t
mentioned in #715 and #388. The overbound read bug caused by this problem
is impossible to reproduce on x86 due to its strong memory ordering, but
it is a problem on ARM and other weakly ordered architectures.
Basically, the main problem is that, on a weakly ordered architecture,
it is possible that the pointer increment after memcpy becomes visible
to the other thread before memcpy finishes:
memcpy (&(rb->buf[rb->write_ptr]), src, n1);
// vvv can be visible to reading thread before memcpy finishes
rb->write_ptr = (rb->write_ptr + n1) & rb->size_mask;
If this happens, the other thread can read the remaining garbage values
in rb->buf due to be overwritten by the unfinished memcpy.
To fix this, an explicit pair of release/acquire memory fences is
used to ensure the copy on the other thread happens after the memcpy
finishes so no garbage values can be read.
There seems to be some wine issue in the CI scripts. There shouldn't be any problems building on windows.
Nice one. Though since this is C code, we can simply change it to use C11 and use the new functions dedicated to this? Any reason to not go that route?
The main problem is that atomics is an optional component in C11. Notably, MSVC does not support it, even though it is available as a C++ standard library. So to use the portable memory fence, we have to call a C++ function or convert jack_ringbuffer to C++, etc. Since the jack ringbuffer implementation is used elsewhere (pipewire, jack1, etc.), I think the best option is to keep it a self-contained C implementation.
OK, the CI problem seems to be fixed.
Do you think we should remove the volatile qualifier for write_ptr and read_ptr as well? The __atomic_thread_fence intrinsic should already prevent compiler reordering (I'll have to add a compiler barrier for x86 MSVC though).
As I understand, the only case where volatile would matter in presence of __atomic_thread_fence is when being used from a signal handler (in the same thread). Not very likely, and the code doesn't seem robust in that sense anyway, but maybe we should keep them just in case.
FWIW, I did a short review of the changes and they look good to me, I'm not that experienced with memory barriers though. And I also checked that the thread fences are actually used on FreeBSD / Clang. So this is a go from me.
There is actually a small overhead for using volatile. For example, you can see in the assembly generated by gcc and MSVC for jack_ringbuffer_write that every access to write_ptr issues a mov instruction when volatile is present
mov ..., QWORD PTR [rbp+8]
but the write pointer is never going to be updated by the reading thread and it can be cached if we remove volatile.
The volatile keyword in MSVC also has some non-portable semantics that enforces the release-acquire ordering when /volatile:ms is present. It's only enabled by default on x86, not ARM, but when the flag is present when compiling for ARM, the compiler also generates unnecessary ldar and stlr instructions that serves the same purpose as memory fences.
I just checked that removing the volatile keyword in C doesn't seem to break the ABI, so I'll include the change as well.
An additional optimization for read/write space calculation is also added based on the discussion for zix ringbuffer here.
The changes should be ideally cherry-picked to jack1 as well. When it's ready.
I assume this will be merged at some point? 😅
yes of course. sorry for slow / little activity, lots of stuff happening at the same time lately. there is a new incompatibility with new macOS too, still to be fixed. these 2 things warrant a new release, maybe in december.
Apparently the CI is down again for FreeBSD. Not a problem with the code.
FreeBSD CI should be fixed now on the develop branch.