apollo icon indicating copy to clipboard operation
apollo copied to clipboard

read buffer overflow and drop message when cyber_recorder record

Open xhzzhang opened this issue 2 years ago • 13 comments

Describe the bug Hey, I found when launch camera(1920x1080 yuv), one of serval record files will has drop 2~12 seconds.

To Reproduce Steps to reproduce the behavior:

  1. use orin 32G and installed ubuntu 20.04
  2. use apollo 8.0 source code, and install docker dev-aarch64-20.04-20231024_1054
  3. cyber_launch start module/drivers/camera/launch/front_6mm or other camera launch file.
  4. see data/log/cyber_recorder_***

Expected behavior record works fine and completed.

Screenshots please see the log message.

Desktop (please complete the following information):

OS: [e.g. iOS] ubuntu 20.04
Browser [e.g. chrome, safari] firefox
Version [e.g. 22]

Additional context Below is log message:

Log file created at: 2024/01/07 15:40:50 Running on machine: in-dev-docker Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg I0107 15:40:50.759305 11829 global_data.cc:153] host ip: 192.168.1.156 W0107 15:40:50.760537 11829 scheduler_factory.cc:63] Scheduler conf named /apollo/cyber/conf/cyber_recorder_11829.conf not found, use default. I0107 15:40:50.760813 11831 processor.cc:42] processor_tid: 11831 I0107 15:40:50.777197 11832 processor.cc:42] processor_tid: 11832 I0107 15:40:50.777417 11833 processor.cc:42] processor_tid: 11833 I0107 15:40:50.781282 11834 processor.cc:42] processor_tid: 11834 I0107 15:40:50.785274 11835 processor.cc:42] processor_tid: 11835 I0107 15:40:50.811200 11836 processor.cc:42] processor_tid: 11836 I0107 15:40:50.811486 11837 processor.cc:42] processor_tid: 11837 I0107 15:40:50.813271 11838 processor.cc:42] processor_tid: 11838 I0107 15:40:50.829242 11839 processor.cc:42] processor_tid: 11839 I0107 15:40:50.829638 11840 processor.cc:42] processor_tid: 11840 I0107 15:40:50.829968 11841 processor.cc:42] processor_tid: 11841 I0107 15:40:50.830096 11842 processor.cc:42] processor_tid: 11842 I0107 15:40:50.830209 11843 processor.cc:42] processor_tid: 11843 I0107 15:40:50.830271 11844 processor.cc:42] processor_tid: 11844 I0107 15:40:50.830346 11845 processor.cc:42] processor_tid: 11845 I0107 15:40:50.840992 11846 processor.cc:42] processor_tid: 11846 I0107 15:40:50.841241 11829 init.cc:113] Register exit handle succ. I0107 15:40:51.147177 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/planning I0107 15:40:51.156020 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/livox/lidar_n/CustomMsg I0107 15:40:51.161494 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/monitor I0107 15:40:51.179752 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/tf I0107 15:40:51.184409 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/gnss_status I0107 15:40:51.188345 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/ins_status I0107 15:40:51.192723 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/best_pose I0107 15:40:51.196694 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/corrected_imu I0107 15:40:51.199668 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/ins_stat I0107 15:40:51.202936 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/rtk_eph I0107 15:40:51.207324 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/rtk_obs I0107 15:40:51.212024 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/heading I0107 15:40:51.216632 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/imu I0107 15:40:51.220008 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/odometry I0107 15:40:51.224087 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/stream_status I0107 15:40:51.227181 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/raw_data I0107 15:40:51.229889 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/rtcm_data I0107 15:40:51.233539 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/canbus/chassis_detail I0107 15:40:51.237504 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/canbus/chassis I0107 15:40:51.242409 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/localization/pose I0107 15:40:51.271842 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/perception/obstacles I0107 15:40:51.274420 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/gnss/cust/imu I0107 15:40:51.280233 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/control I0107 15:40:51.283149 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/camera/front_6mm/image I0107 15:40:51.291419 11852 scheduler.cc:55] create croutine: cyber_recorder_record_11829_/apollo/sensor/livox/lidar_360/CustomMsg W0107 15:41:41.821874 11846 channel_buffer.h:70] channel[/apollo/control] read buffer overflow, drop_message[101] pre_index[3669] current_index[3770] W0107 15:42:21.469127 11837 channel_buffer.h:70] channel[/apollo/sensor/camera/front_6mm/image] read buffer overflow, drop_message[138] pre_index[718] current_index[856] W0107 15:42:21.469300 11831 channel_buffer.h:70] channel[/apollo/sensor/gnss/cust/imu] read buffer overflow, drop_message[2904] pre_index[15135] current_index[18039] W0107 15:42:21.469463 11832 channel_buffer.h:70] channel[/apollo/control] read buffer overflow, drop_message[562] pre_index[6707] current_index[7269] W0107 15:42:21.469538 11842 channel_buffer.h:70] channel[/apollo/canbus/chassis] read buffer overflow, drop_message[1451] pre_index[7571] current_index[9022] W0107 15:42:21.469605 11844 channel_buffer.h:70] channel[/apollo/localization/pose] read buffer overflow, drop_message[89] pre_index[463] current_index[552] W0107 15:42:21.469666 11846 channel_buffer.h:70] channel[/apollo/perception/obstacles] read buffer overflow, drop_message[144] pre_index[758] current_index[902] W0107 15:42:21.469736 11833 channel_buffer.h:70] channel[/apollo/planning] read buffer overflow, drop_message[144] pre_index[759] current_index[903] W0107 15:42:21.469765 11836 channel_buffer.h:70] channel[/apollo/sensor/gnss/raw_data] read buffer overflow, drop_message[157] pre_index[831] current_index[988] W0107 15:42:21.469818 11843 channel_buffer.h:70] channel[/apollo/sensor/gnss/odometry] read buffer overflow, drop_message[143] pre_index[759] current_index[902] W0107 15:42:21.469856 11838 channel_buffer.h:70] channel[/tf] read buffer overflow, drop_message[143] pre_index[759] current_index[902] W0107 15:42:21.469930 11845 channel_buffer.h:70] channel[/apollo/sensor/gnss/ins_stat] read buffer overflow, drop_message[157] pre_index[831] current_index[988] W0107 15:42:21.470189 11840 channel_buffer.h:70] channel[/apollo/sensor/livox/lidar_360/CustomMsg] read buffer overflow, drop_message[110] pre_index[628] current_index[738] W0107 15:42:21.473907 11839 channel_buffer.h:70] channel[/apollo/sensor/livox/lidar_n/CustomMsg] read buffer overflow, drop_message[121] pre_index[657] current_index[778] W0107 15:42:47.005381 11844 channel_buffer.h:70] channel[/apollo/control] read buffer overflow, drop_message[137] pre_index[9987] current_index[10124] W0107 15:42:47.005589 11834 channel_buffer.h:70] channel[/apollo/sensor/gnss/cust/imu] read buffer overflow, drop_message[55] pre_index[23091] current_index[23146]

xhzzhang avatar Jan 20 '24 02:01 xhzzhang

Maybe you have reached maximum writing speed on your disk. As I remember, writing 1920x1080 images at 20 FPS generates around 5 x 20 = 100 MB/s. I'd advise you to check speed limits on your disk. Also, you can record compressed images instead of raw ones

IvDmNe avatar Jan 22 '24 09:01 IvDmNe

Thanks for your reply, I write to local 32G eMMC, I test speed has 500MB/s: nvidia@nvidia-desktop:~$ dd if=/dev/zero of=1.iso bs=4096 count=102400 102400+0 records in 102400+0 records out 419430400 bytes (419 MB, 400 MiB) copied, 0.810827 s, 517 MB/s

xhzzhang avatar Jan 23 '24 02:01 xhzzhang

How about your msg frequency? Is there abnormal with cyber_recorder deactivate? My system is Ubuntu1804 on x86_64. Same info "read buffer overflow" is presented in log, but no influence for message frequency (about 10Hz).

kaoya722 avatar Jan 23 '24 05:01 kaoya722

I tested serval times to cyber_record with camera image channel(1080p, 10fps), everytime got 'read buffer overflow', sometimes lost 2 seconds message, somtimes lost 10 seconds message, now I can't record camera image channel.

xhzzhang avatar Jan 23 '24 07:01 xhzzhang

Did you use "build_opt_gpu" option to accelerate when compile?

kaoya722 avatar Jan 23 '24 08:01 kaoya722

In orin, we use build_opt_nvidia to compile, Is it use build_opt_gpu to compile for jetson orin?

xhzzhang avatar Jan 23 '24 09:01 xhzzhang

Maybe increasing log level to Debug will get you some useful info. Also, ensure that no other processes are writing something big to the disk at the same time. I still guess that it can be caused by disk.

And do you see stable fps on camera topic in cyber_monitor while recording? To be sure that driver outputs the desired frequency

IvDmNe avatar Jan 23 '24 10:01 IvDmNe

Thanks for you cliealy reply, later will check it, then will reply the result.

xhzzhang avatar Jan 25 '24 00:01 xhzzhang

I have also encountered the same issue. Have you solved it?

Aaron-116 avatar Jan 30 '24 07:01 Aaron-116

not yet, just some debug message, maybe need apollo team to fix it.

xhzzhang avatar Jan 31 '24 08:01 xhzzhang

According to the error message, the message was not read in time, resulting in message loss. It can be judged that the message was not written in time.

https://github.com/ApolloAuto/apollo/blob/3859bee50e92d81e94f64ec4a3b3d4241978e8f5/cyber/tools/cyber_recorder/recorder.cc#L253

I checked the code writing is asynchronous. Therefore, we need to confirm where the bottleneck is. We also encountered the problem of insufficient write performance.

daohu527 avatar Feb 02 '24 02:02 daohu527

Hello, @daohu527 Is anything update about this issue?

xhzzhang avatar Mar 15 '24 06:03 xhzzhang

你好,我看你录制的内容里面有mid360的数据,想请教一下在apollo中接入mid-360中有没有开源的驱动包,我看了livox官方sdk里面只有livox_hub的,没有livox_lidar的

leizi111 avatar Jul 02 '24 01:07 leizi111