jack2 icon indicating copy to clipboard operation
jack2 copied to clipboard

Thread safety of jack ringbuffer on ARM

Open Krasjet opened this issue 5 years ago • 5 comments

Describe the bug

This is a follow up to #388.

I am able to consistently reproduce an overbound read problem on a Raspberry Pi 4 using the test program in https://github.com/jackaudio/jack2/issues/388#issuecomment-453283106. Here is a sample output

...
---------------- 418756/1000000 ----------------
---------------- 418757/1000000 ----------------
message (0) != expected (82)

It happens almost deterministically for 1000000 iterations, but the exact timing is random. This program always terminates correctly on x86 and it only occurs on ARM.

The basic problem is that the write pointer is advanced before data is written completely. This is probably due to ARM's relaxed memory ordering. I am not sure about Apple Silicon, but it might also suffer from the same issue. A memory barrier here is not optional.

I will reproduce a cleaned up test program here:

//          Copyright Jean Pierre Cimalando 2018.
// Distributed under the Boost Software License, Version 1.0.
//    (See accompanying file LICENSE or copy at
//          http://www.boost.org/LICENSE_1_0.txt)

// g++ -O2 -o test_case test_case.cc -ljack -lpthread

#include <jack/ringbuffer.h>
#include <memory>
#include <thread>
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>

jack_ringbuffer_t *ring_buffer;
size_t capacity = 1024;
size_t message_count = 100;
size_t message_size = 100;

void thread1()
{
  jack_ringbuffer_t *rb = ::ring_buffer;
  const size_t message_size = ::message_size;
  const size_t message_count = ::message_count;
  std::unique_ptr<uint8_t[]> message_buffer(new uint8_t[message_size]);

  for (size_t i = 0; i < message_count;) {
    *(size_t *)message_buffer.get() = i;

    size_t can_write=jack_ringbuffer_write_space(rb);
    size_t count=0;
    if (can_write>=message_size) {
      count = jack_ringbuffer_write(rb, (const char *)message_buffer.get(), message_size);
    }
    if (count == message_size)
      ++i;
    else
      sched_yield();
  }
}

void thread2()
{
  jack_ringbuffer_t *rb = ::ring_buffer;
  const size_t message_size = ::message_size;
  const size_t message_count = ::message_count;
  std::unique_ptr<uint8_t[]> message_buffer(new uint8_t[message_size]);

  for (size_t i = 0; i < message_count;) {
    size_t can_read=jack_ringbuffer_read_space(rb);
    size_t count=0;
    if (can_read>=message_size) {
      count = jack_ringbuffer_read(rb, (char *)message_buffer.get(), message_size);
    }

    if (count == message_size) {
      size_t msg = *(size_t *)message_buffer.get();
      if (msg != i) {
        printf("message (%zu) != expected (%zu)\n", msg, i);
        exit(1);
      }
      ++i;
    }
  }
}

int main()
{
  size_t ntimes = 1000000;
  for (size_t i = 0; i < ntimes; ++i)
  {
    printf("---------------- %4zu/%4zu ----------------\n", i + 1, ntimes);
    jack_ringbuffer_t *rb = ::ring_buffer = jack_ringbuffer_create(::capacity);
    std::thread t1(thread1);
    std::thread t2(thread2);
    t1.join();
    t2.join();
    ::ring_buffer = nullptr;
    delete rb;
  }
  printf("-------------------------------------------\n");
  printf("success!\n");

  return 0;
}

Environment

  • JACK Version: 1.9.12 (ringbuffer.c hasn't been touched for 10 years)
  • Operating System: Linux (Raspberry Pi OS)
  • Installation: package manager

Steps To Reproduce

$ g++ -O2 -o test_case test_case.cc -ljack -lpthread
$ ./test_case

Wait until it fails.

Expected vs. actual behavior

No overbound read should happen and the test program should always exit with a success on ARM.

Krasjet avatar Jan 26 '21 22:01 Krasjet

I need to dedicate some time for this.

Do you mind if I rewrite you test code in C to put it as part of jack tests? The ringbuffer is written in C, and I feel more confident in using raw pointers and not C++ std stuff which might be doing some things behind the scenes.

falkTX avatar Jan 26 '21 22:01 falkTX

The test case was originally from @jpcima. I only removed some unnecessary print statements and checks. I think you should be free to port it as long as the license is kept.

Krasjet avatar Jan 26 '21 22:01 Krasjet

Here is a C version of the test.

/* $ gcc -O2 jack_rb_test.c -ljack -pthread */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#include <jack/ringbuffer.h>

#define RB_SIZE 1024
#define BUF_SIZE 100

static jack_ringbuffer_t *rb;

static void *
th_write(void *x)
{
  char buf[BUF_SIZE] = {0};
  char seq = 0;

  while (1) {
    size_t space;

    /* buf = [seq, 0, 0, 0, ...] */
    buf[0] = seq;
    space = jack_ringbuffer_write_space(rb);

    if (space >= sizeof(buf)) {
      int rc = jack_ringbuffer_write(rb, buf, sizeof(buf));
      if (rc != sizeof(buf)) {
        puts("unexpected write failure");
        exit(1);
      }
      seq++;
    }
  }

  return NULL;
}

static void *
th_read(void *x)
{
  char buf[BUF_SIZE] = {0};
  size_t iter = 1;
  char seq = 0;

  while (1) {
    size_t space;

    /* 256 reads = 1 iteration */
    if (seq == 0)
      printf("iter: %zu\n", iter++);

    space = jack_ringbuffer_read_space(rb);
    if (space >= sizeof(buf)) {
      int rc = jack_ringbuffer_read(rb, buf, sizeof(buf));
      if (rc != sizeof(buf)) {
        puts("unexpected read failure");
        exit(1);
      }

      /* check first element is seq */
      if (buf[0] != seq) {
        printf("reorder: %d (actual) != %d (expected)\n", buf[0], seq);
        exit(1);
      }
      seq++;
    }
  }

  return NULL;
}

int
main(void)
{
  pthread_t t1, t2;

  rb = jack_ringbuffer_create(RB_SIZE);

  pthread_create(&t1, NULL, th_write, NULL);
  pthread_create(&t2, NULL, th_read, NULL);

  pthread_join(t1, NULL);
  pthread_join(t2, NULL);

  jack_ringbuffer_free(rb);
  return 0;
}

Just to show you can reproduce this issue, here is a video:

Krasjet avatar Apr 02 '21 01:04 Krasjet

Finally catching up to this, there was quite some discussion around it..

I am also able to reproduce the issue with the example code from the other ticket, and reusing that same binary with patched jack2 that confirms the fix.

So everything seems all good, thanks for the extra work put into this! One small thing, can you move the usage of macros to always be on its own line instead of appended to one? That is, do not use w = rb->write_ptr; JACK_ACQ_FENCE(); style of code. If you dont have time for that all fine too, just let me know.

falkTX avatar Nov 14 '22 13:11 falkTX

Thanks for testing it out. Yes, I'm quite busy this week so I would prefer you make a small patch to change it.

The rationale to put them on the same line is to keep the "load-acquire" semantic clear, so people know that the acquire fence is for loading the write pointer from another thread. If you don't like the syntax feel free to change it.

Krasjet avatar Nov 14 '22 15:11 Krasjet

One small thing, can you move the usage of macros to always be on its own line instead of appended to one?

OK this is done now

Krasjet avatar Jan 25 '23 16:01 Krasjet

Thanks a lot! Sorry for the slow response, I am again caught up in the middle of a move so dont even have my usual proper setup anymore for a while.

Regarding this issue, I dont think it is 100% fixed yet. I was still able to get a wrong behaviour with the ringbuffer under ARM, but it was not using code that is reusable as single&simple stress-test.

The behaviour is for sure much improved now, so I am merging the PR and closing the issue, but want to mention this because we might need to come back to this later (if I find a way to reliably reproduce more threading issues).

falkTX avatar Jan 28 '23 15:01 falkTX