bladeRF icon indicating copy to clipboard operation
bladeRF copied to clipboard

Two channel RX with BLADERF_FORMAT_SC16_Q11_META: timestamp increment does no match sample count

Open kuehnhammer opened this issue 4 years ago • 12 comments

When receiving samples through bladerf_sync_rx with two RX streams enabled (BLADERF_RX_X2), the timestamp increment in the returned meta headers of the blocks is always one higher than it should be.

In the API documentation, block size is given as 512 samples for USB3, the metadata replaces the first four samples -- which would leave space for 508 samples in each block. With two RX streams, that would be 254 samples per channel -- so the timestamp increments from one block to the next would be expected to be 254.

In the actually received blocks, the timestamps are incremented by 255.

Block Timestamp in Metadata Expected TS
0 3
1 258 257 (3 + 254)
2 513 512 (258 + 254)

This does not happen for single channel RX. The timestamp values increment with the expected steps of 508 there (Block 0: 3, Block 1: 511, Block 2:1019, etc.)

How to reproduce

The SopayBladeRF module uses BLADERF_FORMAT_SC16_Q11_META, so the issue can e.g. be reproduced with SoapySDRUtil:

SoapySDRUtil --rate=1e6 --direction=RX --channels="0,1"

The Sample discontinuity detected messages show the offset:

Starting stream loop, press Ctrl+C to exit...
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:414] sync_rx: Requests 8192 samples.
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:105] dump_buf_states: ----------------________________ (CHECK_WORKER)
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:433] sync_rx: Worker is idle. Going to reset buf mgmt.
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:105] dump_buf_states: ----------------________________ (RESET_BUF_MGMT)
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:454] sync_rx: Reset buf_mgmt consumer index
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:105] dump_buf_states: ----------------________________ (START_WORKER)
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync_worker.c:425] RX worker: Got request to start
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:469] sync_rx: Worker is now running.
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:105] dump_buf_states: ----------------________________ (WAIT_FOR_BUFFER)
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:327] sync_rx: Timed wait for buffer[0] (status: 3).
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync_worker.c:96] RX worker: buf[0] = full, buf[16] = in_flight
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:494] sync_rx: buffer 0 is ready to consume
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:105] dump_buf_states: *----------------_______________ (BUFFER_READY)
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:105] dump_buf_states: o----------------_______________ (USING_BUFFER_META)
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:605] Got header for message 0: t_new=3, t_old=0
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:105] dump_buf_states: o----------------_______________ (USING_BUFFER_META)
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:653] Updated user meta timestamp with: 3
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:669] After copying samples, t=257
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:105] dump_buf_states: o----------------_______________ (USING_BUFFER_META)
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 0, message 1: Expected t=257, got t=258
0[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:414] sync_rx: Requests 8192 samples.
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:105] dump_buf_states: o----------------_______________ (USING_BUFFER_META)
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:653] Updated user meta timestamp with: 258
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:669] After copying samples, t=512
[VERBOSE @ host/libraries/libbladeRF/src/streaming/sync.c:105] dump_buf_states: o----------------_______________ (USING_BUFFER_META)
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 0, message 2: Expected t=512, got t=513

Versions

  bladeRF-cli version:        1.8.0-git-637a6c2d
  libbladeRF version:         2.4.1-git-637a6c2d

  Firmware version:           2.4.0-git-a3d5c55f
  FPGA version:               0.14.0 (configured by USB host)

kuehnhammer avatar Feb 07 '22 10:02 kuehnhammer

This is an experimental branch, https://github.com/Nuand/bladeRF/tree/dev-mimo . This branch is only tended for X2 (MIMO) applications, SISO will not function properly. Please give MIMO timestamped RX data a try and let me know how it goes.

rghilduta avatar Feb 10 '22 03:02 rghilduta

Dear @rghilduta,

I ran into the same problem - the metadata mode makes the dual-channel mode fail (BLADERF_RX_X2) both in sync and async mode. I was able to reproduce the problem with SoapySDR as well. After some time, the hardware seems to stop working and running into timeout continuously (never recovering):

Versions:

  • bladeRF-cli version: 1.8.0-git-4d9b0b87
  • libbladeRF version: 2.4.1-git-4d9b0b87
  • Firmware version: 2.4.0-git-a3d5c55f
  • FPGA version: 0.14.0 (configured from SPI flash)
  • soapy-bladerf-0.4.1-6-g70505a5

Output:

[meri@blackbox build]$ SoapySDRUtil --rate=1e6 --direction=RX --channels="0,1"
######################################################
##     Soapy SDR -- the SDR abstraction library     ##
######################################################

[INFO] bladerf_open_with_devinfo()
[INFO] bladerf_get_serial() = dae59df212ec4f37bbbcf2660dde862b
[INFO] setSampleRate(Rx, 0, 4.000000 MHz), actual = 4.000000 MHz
[INFO] setSampleRate(Tx, 0, 4.000000 MHz), actual = 4.000000 MHz
[INFO] setSampleRate(Rx, 0, 1.000000 MHz), actual = 1.000000 MHz
[INFO] setSampleRate(Rx, 1, 1.000000 MHz), actual = 1.000000 MHz
[VERBOSE @ host/libraries/libbladeRF/src/backend/usb/nios_access.c:429] nios_config_read: Read 0x00010001
[VERBOSE @ host/libraries/libbladeRF/src/backend/usb/nios_access.c:440] nios_config_write: Wrote 0x00010001
[VERBOSE @ host/libraries/libbladeRF/src/board/bladerf2/common.c:343] check_total_sample_rate: active_channels=1, rate_accum=1000000, maximum=80000000
[VERBOSE @ host/libraries/libbladeRF/src/board/bladerf2/common.c:343] check_total_sample_rate: active_channels=2, rate_accum=2000000, maximum=80000000
Stream format: CS16
Num channels: 2
Element size: 4 bytes
Begin RX rate test at 1 Msps
Starting stream loop, press Ctrl+C to exit...
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:433] sync_rx: Worker is idle. Going to reset buf mgmt.
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:454] sync_rx: Reset buf_mgmt consumer index
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:469] sync_rx: Worker is now running.
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 0, message 1: Expected t=257, got t=258
0[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 0, message 2: Expected t=512, got t=513
0[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 0, message 3: Expected t=767, got t=768
0[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 0, message 4: Expected t=1022, got t=1023
0[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 0, message 5: Expected t=1277, got t=1278
0[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 0, message 6: Expected t=1532, got t=1533
0[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 0, message 7: Expected t=1787, got t=1788
0[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 1, message 0: Expected t=2042, got t=2043
0[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 1, message 1: Expected t=2297, got t=2298
0[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:597] Sample discontinuity detected @ buffer 1, message 2: Expected t=2552, got t=2553
...
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/backend/usb/libusb.c:1089] Transfer timed out for buffer 0x55bd61c633a0
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:433] sync_rx: Worker is idle. Going to reset buf mgmt.
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:454] sync_rx: Reset buf_mgmt consumer index
[DEBUG @ host/libraries/libbladeRF/src/streaming/sync.c:469] sync_rx: Worker is now running.
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms
[ERROR @ host/libraries/libbladeRF/src/streaming/sync.c:336] wait_for_buffer: Timed out waiting for buf_ready after 100 ms

meriac avatar Mar 09 '22 18:03 meriac

Dear @rghilduta,

I did more experiments - by instrumenting sync_rx_meta.c I was able to create a simpler piece of code to reproduce the problem.

You can see that the timestamp difference is 255 - which doesn't make sense for 508 samples (the timestamp should be 254?).

[meri@blackbox build]$ ./host/output/libbladeRF_example_sync_rx_meta 
Opening and initializing device...

RX frequency: 910000000 Hz
RX samplerate: 2000000 sps
RX bandwidth: 1500000 Hz
RX gain: 26
TX frequency: 920000000 Hz
TX samplerate: 2000000 sps
TX bandwidth: 1500000
TX gain: -20

Running RX meta "now" example...
RX'd 508 samples at t=00000003 flags=0x01 [OVR]
RX'd 508 samples at t=00000258 flags=0x01 [OVR]
RX'd 508 samples at t=00000513 flags=0x01 [OVR]
RX'd 508 samples at t=00000768 flags=0x01 [OVR]
RX'd 508 samples at t=00001023 flags=0x01 [OVR]
RX'd 508 samples at t=00001278 flags=0x01 [OVR]
RX'd 508 samples at t=00001533 flags=0x01 [OVR]
RX'd 508 samples at t=00001788 flags=0x01 [OVR]
RX'd 508 samples at t=00002043 flags=0x01 [OVR]
RX'd 508 samples at t=00002298 flags=0x01 [OVR]
RX'd 508 samples at t=00002553 flags=0x01 [OVR]
RX'd 508 samples at t=00002808 flags=0x01 [OVR]

meriac avatar Mar 09 '22 18:03 meriac

@rghilduta: For completeness, I can reproduce the problem with https://github.com/Nuand/bladeRF/tree/dev-mimo as well (BLADERF_RX_X2 mode).

meriac avatar Mar 09 '22 18:03 meriac

@rghilduta: I did some packet capture of the problem. I am increasingly sure that the problem is inside the bladeRF firmware/FPGA. In the capture you can see the flawless stream of data - up to a point where the bladeRF chokes up and stops filling its TX buffers from what it seem. Then after a second it runs into the timeout - delivering the its remaining data. The device stops sending after that. bladeRF-metabug

meriac avatar Mar 10 '22 06:03 meriac

There seem to be others facing the same problem receiving metadata in dual-channel mode (keeping in mind that SoapySDR uses meta-flags enabled by default):

  1. BladeRF xA4/SoapySDR: not receiving full number of samples
  2. 2-Channel Receive Error with libbladerf 2021.10 #44

meriac avatar Mar 10 '22 06:03 meriac

@rghilduta: I can confirm that by downgrading the bladeRF FPGA to version v0.11.1 stops the BladeRF from freezing up. The sucessor - v0.12.0 - crashes already the same way. I therefore expect a bug regarding dual-channel meta-data to be introduced in the versions v0.12.0 and following:

[meri@blackbox rf-localizer-BladeRF]$ bladeRF-cli -i
bladeRF> version

  bladeRF-cli version:        1.8.0-git-57316cd9
  libbladeRF version:         2.4.1-git-57316cd9

  Firmware version:           2.4.0-git-a3d5c55f
  FPGA version:               0.11.1 (configured by USB host)

meriac avatar Mar 10 '22 07:03 meriac

@meriac are you sure libbladeRF was compiled with dev-mimo? The git commit hash for that branch is e47902121b8e3f14078c6a509eb6d44c27d73e7c but that does not seem to be reflected in your libbladeRF version tag. Please ensure the dev-mimo branch installation overrides any other libbladeRF installations.

rghilduta avatar Mar 10 '22 07:03 rghilduta

@rghilduta: I've discarded using the dev-mimo branch as it didn't show improvements over the master branch. For your convenience - here are my edits to sync_rx_meta for reproducing the problem. In a nutshell it seems to always happen after the same amount of samples/time - independent of the packet size. This means that for short packets, more packets must be send to reproduce the problem. The resulting sample offset where the problem occurs, seems to be always the same.

bladeRF> version

  bladeRF-cli version:        1.8.0-git-e4790212
  libbladeRF version:         2.4.1-git-e4790212

  Firmware version:           2.4.0-git-a3d5c55f
  FPGA version:               0.14.0 (configured from SPI flash)

commit e47902121b8e3f14078c6a509eb6d44c27d73e7c (HEAD -> dev-mimo, origin/dev-mimo)
Author: Robert Ghilduta <[email protected]>
Date:   Sun Nov 7 12:40:47 2021 -0800

    WIP

And here the resulting error:

[meri@blackbox build]$ ./host/output/libbladeRF_example_sync_rx_meta 
Opening and initializing device...

RX frequency: 910000000 Hz
RX samplerate: 2000000 sps
RX bandwidth: 1500000 Hz
RX gain: 26
RX frequency: 910000000 Hz
RX samplerate: 2000000 sps
RX bandwidth: 1500000 Hz
RX gain: 26
CHBIT = 15

Running RX meta "now" example...
Overrun detected. 508 valid samples were read.
Overrun detected. 508 valid samples were read.
Overrun detected. 508 valid samples were read.
[...]
Overrun detected. 508 valid samples were read.
Overrun detected. 508 valid samples were read.
Overrun detected. 508 valid samples were read.
Overrun detected. 508 valid samples were read.
[ERROR @ host/libraries/libbladeRF/src/backend/usb/libusb.c:1089] Transfer timed out for buffer 0x2278bd0
RX "now" failed: Operation timed out

By downgrading the FPGA to v0.11.1 the software runs without any problems:

[meri@blackbox build]$ ./host/output/libbladeRF_example_sync_rx_meta 
Opening and initializing device...

RX frequency: 910000000 Hz
RX samplerate: 2000000 sps
RX bandwidth: 1500000 Hz
RX gain: 26
RX frequency: 910000000 Hz
RX samplerate: 2000000 sps
RX bandwidth: 1500000 Hz
RX gain: 26
CHBIT = 15

Running RX meta "now" example...
RX'd 4096 samples at t=0x0000000000000003
RX'd 4096 samples at t=0x0000000000000803
RX'd 4096 samples at t=0x0000000000001003
RX'd 4096 samples at t=0x0000000000001803
RX'd 4096 samples at t=0x0000000000002003
RX'd 4096 samples at t=0x0000000000002803
RX'd 4096 samples at t=0x0000000000003003
RX'd 4096 samples at t=0x0000000000003803
[...]

meriac avatar Mar 10 '22 07:03 meriac

I can confirm the reported issues on a bladeRF x40 when trying to use bladerf_sync_rx with the SC16_Q11_META format. Overrun detected. 508 valid samples were read. is reported repeatedly. The issue is indeed resolved for me as well when downgrading the FPGA to v0.11.0.

thasti avatar Apr 02 '22 19:04 thasti

2021-10-04 - v0.14.0 FPGA is still broken for xA4 and xA9 with no later version available - this is super annoying as the latest known version of the FPGA firmware v0.11.1 is outdated and unstable as well. By now my company bought around 10+ bladeRF devices - at this level of support we'll switch to alternatives. The metadata support is severely flawed - and this ticket open since February!

CC: @rghilduta

meriac avatar Jul 11 '22 09:07 meriac

We are currently trying to get a summer release of libbladeRF out and are doing validation testing for several different modes. Meta combined with MIMO mode is one of the modes that is being tested. We however are using libbladeRF, a bladeRF, and Modelsim to carry out these tests. We have not (yet?) gotten to testing all downstream libraries. I would very strongly suggest trying to move to your application momentarily to using libbladeRF directly if you haven't already. Please use libbladeRF and dev-mimo exclusively for MIMO enable meta enabled modes.

I have tested this use case several times at this point, but I am not able to reproduce the issue using just libbladeRF.

As a side node, several things have changed since libbladeRF_example_sync_rx_meta was written. So the unit tests may not all be compatible with the latest dev-mimo (and soon to be summer release but which will contain updated tests).

rghilduta avatar Jul 17 '22 08:07 rghilduta

I can confirm the issue with both current master and dev-mimo. It does work with FPGA image as of commit b02acdd2d3f7601a550f7c1e7f1943db68c9447e and does not work at a39f691b3a04628a54f1ed0f916e7f42e056f235 (the ones in between fail to compile) and any later.

Here is a script that can immediately (in the first received message) reproduce it for me: https://gist.github.com/jenda122/1d657406bdc4ea5cef22459a6b8f2ff7 The script requires Python bindings supporting metadata, for which I have created a pull request https://github.com/Nuand/bladeRF/pull/886

My device is xA4.

jenda122 avatar Oct 22 '22 05:10 jenda122