picamera2 icon indicating copy to clipboard operation
picamera2 copied to clipboard

[BUG] picamera2 randomly stops feeding frames while recording (imx519)

Open zelo opened this issue 1 year ago • 34 comments

Describe the bug After starting recording camera starts feeding frames but after some random time (sometimes its 5s the other 90s) camera stops feeding the frames. The process continues to work, just no frames are produced. There is also no possibility to do request capture when this happens.

Event loop is running, when executed in debugger and paused. There are 3 threads looping and doing nothing interesting. V4L2Encoder.thread_poll - events are empty CameraManager.listen - events are empty NullPreview.listen - events are empty

I also tried to look in logs but there was nothing interesting even with echo 3 > /sys/module/videobuf2_common/parameters/debug

Update: seems like lowering buffer_count from 3 to 2 makes this working. Update2: Nope, that still happens, just take more time Update3: Now it somehow works with buffer_count 3 for 2 hours already. Update4: Works with buffer_count 3 for 10 hours already.

  • I have replugged camera
  • put it in metal case with thermopads

I still wonder what is the cause of that problem and why there is no trace in logs. If I would blame temps then without case and thermopads temps on cpu/gpu was reported around 60 celcius, which doesn't seem too big. If connection was the issue then after restarting the script everything worked properly without touching the hardware.

To Reproduce Run this:

import logging
from datetime import datetime
from time import sleep

from picamera2 import Picamera2
from picamera2.encoders import MJPEGEncoder
from picamera2.outputs import Output


class DummyOutput(Output):
    def outputframe(self, frame, keyframe=True, timestamp=None):
        print(f"Got frame: {datetime.now()}")


c = Picamera2()
c.set_logging(level=logging.DEBUG)
cfg = c.create_still_configuration(
    lores={
        "size": (1984, 1500),
        "format": "YUV420",
    },
    raw=None,
    buffer_count=3,
    queue=False,
)
c.configure(cfg)
c.start(show_preview=False)
e = MJPEGEncoder()
o = DummyOutput()
c.start_encoder(
    encoder=e,
    output=o,
    name="lores",
)

try:
    while True:
        sleep(1)
        print(f"Encoder running: {e.running}")
except Exception as e:
    raise e
finally:
    c.stop()

Expected behaviour Infinite time recording

Console Output, Screenshots

[0:35:26.351156055] [1056]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:35:26.389560072] [1059]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:35:26.391296340] [1059]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx519@1a to Unicam device /dev/media4 and ISP device /dev/media1
[0:35:26.391399339] [1059]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
[0:35:26.393692919] [1056]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:35:26.431342107] [1062]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:35:26.433097263] [1062]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx519@1a to Unicam device /dev/media4 and ISP device /dev/media1
[0:35:26.433166966] [1062]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
picamera2.picamera2 DEBUG: Requesting configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 3, 'queue': False, 'main': {'format': 'BGR888', 'size': (4656, 3496), 'stride': 14016, 'framesize': 48999936}, 'lores': {'format': 'YUV420', 'size': (1984, 1500), 'stride': 1984, 'framesize': 4464000}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4656, 3496)}, 'display': None, 'encode': None}
[0:35:26.440347202] [1056]  INFO Camera camera.cpp:1183 configuring streams: (0) 4656x3496-BGR888 (1) 1984x1500-YUV420
[0:35:26.440779089] [1062]  INFO RPI vc4.cpp:611 Sensor: /base/soc/i2c0mux/i2c@1/imx519@1a - Selected sensor format: 4656x3496-SRGGB10_1X10 - Selected unicam format: 4656x3496-pRAA
picamera2.picamera2 INFO: Configuration successful!
picamera2.picamera2 DEBUG: Final configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 3, 'queue': False, 'main': {'format': 'BGR888', 'size': (4656, 3496), 'stride': 14016, 'framesize': 48999936}, 'lores': {'format': 'YUV420', 'size': (1984, 1500), 'stride': 1984, 'framesize': 4464000}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4656, 3496)}, 'display': None, 'encode': None}
picamera2.picamera2 DEBUG: Streams: {'main': <libcamera._libcamera.Stream object at 0x7fac1c8c30>, 'lores': <libcamera._libcamera.Stream object at 0x7f91f20db0>, 'raw': None}
picamera2 DEBUG: Allocated 3 buffers for stream 0 with fds [26, 29, 32]
picamera2 DEBUG: Allocated 3 buffers for stream 1 with fds [35, 38, 41]
picamera2.picamera2 INFO: Camera started
Got frame: 2024-04-07 17:20:11.886097
Encoder running: True
Got frame: 2024-04-07 17:20:11.995518
Got frame: 2024-04-07 17:20:12.105972
Got frame: 2024-04-07 17:20:12.228983
Got frame: 2024-04-07 17:20:12.347904
Got frame: 2024-04-07 17:20:12.468742
Got frame: 2024-04-07 17:20:12.590001
Got frame: 2024-04-07 17:20:12.711057
Got frame: 2024-04-07 17:20:12.833490
[...]
Encoder running: True
Got frame: 2024-04-07 17:20:55.085381
Got frame: 2024-04-07 17:20:55.207543
Got frame: 2024-04-07 17:20:55.325749
Got frame: 2024-04-07 17:20:55.444800
Got frame: 2024-04-07 17:20:55.565147
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
[...]

Hardware :

Raspberry PI 4 Arducam IMX519

Additional context in /boot/firmware/config.txt i have added:

[all]
dtoverlay=imx519
dtoverlay=vc4-kms-v3d,cma-512
apt-cache show python3-picamera2
Package: python3-picamera2
Source: picamera2
Version: 0.3.17-1

output of dmesg with debug enabled

[66084.117396] videobuf2_common: [cap-00000000865065ca] __setup_offsets: buffer 0, plane 0 offset 0x00000000
[66084.125943] videobuf2_common: [cap-00000000865065ca] __setup_offsets: buffer 1, plane 0 offset 0x0136b000
[66084.134110] videobuf2_common: [cap-00000000865065ca] __setup_offsets: buffer 2, plane 0 offset 0x026d6000
[66084.142244] videobuf2_common: [cap-00000000865065ca] __setup_offsets: buffer 3, plane 0 offset 0x03a41000
[66084.142257] videobuf2_common: [cap-00000000865065ca] __vb2_queue_alloc: allocated 4 buffers, 1 plane(s) each
[66084.142355] videobuf2_common: [cap-00000000865065ca] vb2_core_expbuf: buffer 0, plane 0 exported as 45 descriptor
[66084.142387] videobuf2_common: [cap-00000000865065ca] vb2_core_expbuf: buffer 1, plane 0 exported as 46 descriptor
[66084.142408] videobuf2_common: [cap-00000000865065ca] vb2_core_expbuf: buffer 2, plane 0 exported as 47 descriptor
[66084.142426] videobuf2_common: [cap-00000000865065ca] vb2_core_expbuf: buffer 3, plane 0 exported as 48 descriptor
[66084.142441] videobuf2_common: [cap-00000000865065ca] vb2_core_reqbufs: memory in use, orphaning buffers
[66084.142447] videobuf2_common: [cap-00000000865065ca] __vb2_buf_mem_free: freed plane 0 of buffer 0
[66084.142450] videobuf2_common: [cap-00000000865065ca] __vb2_buf_mem_free: freed plane 0 of buffer 1
[66084.142453] videobuf2_common: [cap-00000000865065ca] __vb2_buf_mem_free: freed plane 0 of buffer 2
[66084.142456] videobuf2_common: [cap-00000000865065ca] __vb2_buf_mem_free: freed plane 0 of buffer 3
[66084.142501] videobuf2_common: [cap-00000000865065ca] __vb2_queue_alloc: allocated 32 buffers, 1 plane(s) each
[66084.142695] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 0, plane 0 offset 0x00000000
[66084.142753] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 1, plane 0 offset 0x00005000
[66084.142806] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 2, plane 0 offset 0x0000a000
[66084.142857] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 3, plane 0 offset 0x0000f000
[66084.142908] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 4, plane 0 offset 0x00014000
[66084.142958] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 5, plane 0 offset 0x00019000
[66084.143010] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 6, plane 0 offset 0x0001e000
[66084.143062] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 7, plane 0 offset 0x00023000
[66084.143113] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 8, plane 0 offset 0x00028000
[66084.143163] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 9, plane 0 offset 0x0002d000
[66084.143214] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 10, plane 0 offset 0x00032000
[66084.143268] videobuf2_common: [cap-0000000089aff5d5] __setup_offsets: buffer 11, plane 0 offset 0x00037000
[...]
[66091.780551] videobuf2_common: [out-00000000f55d31ce] __prepare_dmabuf: buffer for plane 0 changed
[66091.781056] videobuf2_common: [out-00000000f55d31ce] vb2_core_qbuf: qbuf of buffer 1 succeeded
[66091.800610] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: returning done buffer
[66091.800626] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: dqbuf of buffer 2, state: dequeued
[66091.800803] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66091.800812] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 6, state: dequeued
[66091.911648] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: returning done buffer
[66091.911666] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: dqbuf of buffer 3, state: dequeued
[66091.911812] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66091.911822] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 7, state: dequeued
[66092.022738] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: returning done buffer
[66092.022757] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: dqbuf of buffer 0, state: dequeued
[66092.022909] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.022918] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 8, state: dequeued
[66092.133809] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: returning done buffer
[66092.133824] videobuf2_common: [cap-00000000865065ca] vb2_core_dqbuf: dqbuf of buffer 1, state: dequeued
[66092.133952] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.133962] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 9, state: dequeued
[66092.244881] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.244895] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 10, state: dequeued
[66092.355953] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.355969] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 11, state: dequeued
[66092.467037] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.467053] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 0, state: dequeued
[66092.578112] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.578127] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 1, state: dequeued
[66092.689190] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.689205] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 2, state: dequeued
[66092.800264] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.800280] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 3, state: dequeued
[66092.911338] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66092.911354] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 4, state: dequeued
[66093.022420] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: returning done buffer
[66093.022439] videobuf2_common: [cap-0000000089aff5d5] vb2_core_dqbuf: dqbuf of buffer 5, state: dequeued

zelo avatar Apr 07 '24 13:04 zelo

Hi, thanks for the report, and for including a straightforward script that demonstrates the problem.

I'll try that on a Pi 4 and see if I get the same problems. Could you confirm that you're running the latest OS (kernel should be 6.6) just so that we can be sure we're running the same thing? I also don't have access to an imx519. Can you also confirm that you're running the standard versions of libcamera and Picamera2 distributed by Raspberry Pi (sometimes 3rd party cameras use custom software which we can't run). Finally, do you have an official Raspberry Pi camera that you know shows the same problem, then I can use that instead? Thanks.

davidplowman avatar Apr 08 '24 09:04 davidplowman

Thanks for your time. yes, I'm running standard version. Deliberately haven't installed arducam software.

My software stack is:

  • 2024-03-15-raspios-bookworm-arm64-lite.img.xz
  • python3-picamera2: 0.3.17-1 (installed trough apt from official sources)
  • python3-libcamera: 0.2.0+rpt20240215-1 (installed trough apt from official sources)

system is running headless, there are no addidtional processes except ssh sessions and python script

zelo@pi:~ $ uname -a
Linux pi 6.6.20+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux
zelo@pi:~ $ vcgencmd bootloader_version
2024/01/22 10:41:21
version 51ed67b03b3dde4e76b345370f312d07aabf45b8 (release)
timestamp 1705920081
update-time 1707078372
capabilities 0x0000007f

I believe that putting it in case with heatsink and replugging camera in the process somehow fixed the problem. Weird thing is that it just stopped producing frames without any trace I could notice and after restarting camera stack it worked for a while as it should. I also tried restarting encoder only but that did not make any difference.

Another thought i have is that maybe it's caused by some sort of interference. Temps looked OK without case, except hw mpeg encoder and memory there was no load. the case is metalic and may act as a shield.

Finally, do you have an official Raspberry Pi camera that you know shows the same problem, then I can use that instead

I will try to reproduce the issue on one of official ones later this week

zelo avatar Apr 08 '24 10:04 zelo

I'm running your script now and will leave it for a day or so. I'll let you know if anything happens!

davidplowman avatar Apr 08 '24 11:04 davidplowman

Actually the problem was not fixed. For test i tried with dummy encoder and had this in code and this runned entire night 🤦

# e = MJPEGEncoder()
e = Encoder()
o = DummyOutput()
c.start_encoder(
    encoder=e,
    output=o,
    name="lores",
)

when i went back to mjpeg encoder it occured again

zelo avatar Apr 08 '24 11:04 zelo

I've left it running overnight but it still seems quite happy. I'll try again with the latest Raspberry Pi OS Lite. Can you say what Raspberry Pi camera you might be able to test with yourself? Then I can use the same one. Thanks!

davidplowman avatar Apr 09 '24 08:04 davidplowman

@davidplowman I connected Pi Camera HD v3 12MPx and will let it run for some time. Will report tomorrow.

Edit: funny it just stopped after 2 minutes one thing, i changed buffer count to make capture faster since its lower res. It hanged with bc=1/3/7/9 although with 1 (queue=true) it took 10 minutes, 1 (queue=false) 1 hour, with 3/7/9 (queue=true) around 2/3 minutes.

I was able to reproduce that behaviour on IMX519 (Arducam 16Mpx), Arducam Hawkeye, and Rpi Camera V3. I Have used 3 different ribbon cables as well. I also tried switching power supply and even used lab one but that haven't changed my situation. Is there possiblity that my pi is somehow damaged?

My setup looks like this: IMG_20240412_231502_1

zelo@pi:~ $ python perun/isolate.py
[0:03:20.754635954] [1105]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:03:20.862707486] [1108]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:03:20.864842712] [1108]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx708@1a to Unicam device /dev/media4 and ISP device /dev/media0
[0:03:20.864950648] [1108]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
[0:03:20.867302172] [1105]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:03:21.006124476] [1111]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:03:21.008093648] [1111]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx708@1a to Unicam device /dev/media4 and ISP device /dev/media0
[0:03:21.008171730] [1111]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
picamera2.picamera2 DEBUG: Requesting configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 7, 'queue': True, 'main': {'format': 'BGR888', 'size': (4608, 2592), 'stride': 13824, 'framesize': 35831808}, 'lores': {'format': 'YUV420', 'size': (1984, 1506), 'stride': 1984, 'framesize': 4481856}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4608, 2592)}, 'display': None, 'encode': None}
[0:03:21.015135774] [1105]  INFO Camera camera.cpp:1183 configuring streams: (0) 4608x2592-BGR888 (1) 1984x1506-YUV420
[0:03:21.015548203] [1111]  INFO RPI vc4.cpp:611 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 4608x2592-SBGGR10_1X10 - Selected unicam format: 4608x2592-pBAA
picamera2.picamera2 INFO: Configuration successful!
picamera2.picamera2 DEBUG: Final configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 7, 'queue': True, 'main': {'format': 'BGR888', 'size': (4608, 2592), 'stride': 13824, 'framesize': 35831808}, 'lores': {'format': 'YUV420', 'size': (1984, 1506), 'stride': 1984, 'framesize': 4481856}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4608, 2592)}, 'display': None, 'encode': None}
picamera2.picamera2 DEBUG: Streams: {'main': <libcamera._libcamera.Stream object at 0x7f76d9d270>, 'lores': <libcamera._libcamera.Stream object at 0x7f76d9d2b0>, 'raw': None}
picamera2 DEBUG: Allocated 7 buffers for stream 0 with fds [26, 29, 32, 35, 38, 41, 44]
picamera2 DEBUG: Allocated 7 buffers for stream 1 with fds [47, 50, 53, 56, 59, 62, 65]
picamera2.picamera2 INFO: Camera started
Got frame: 2024-04-12 23:05:51.852856
Got frame: 2024-04-12 23:05:51.922536
Got frame: 2024-04-12 23:05:51.992129
Got frame: 2024-04-12 23:05:52.060676
Got frame: 2024-04-12 23:05:52.131400
Got frame: 2024-04-12 23:05:52.199579
picamera2.picamera2 DEBUG: Execute job: <picamera2.job.Job object at 0x7f76db0b10>
[...]
picamera2.picamera2 DEBUG: Execute job: <picamera2.job.Job object at 0x7f76db12d0>
Got frame: 2024-04-12 23:07:32.051924
Got frame: 2024-04-12 23:07:32.062610
Got frame: 2024-04-12 23:07:32.206700
Got frame: 2024-04-12 23:07:32.211447
Got frame: 2024-04-12 23:07:32.284496
Got frame: 2024-04-12 23:07:32.325323
Got frame: 2024-04-12 23:07:32.391842
Got frame: 2024-04-12 23:07:32.462790
picamera2.picamera2 DEBUG: Execute job: <picamera2.job.Job object at 0x7f76db0c90>

zelo avatar Apr 12 '24 21:04 zelo

I also tried pi V3 camera on pi zero w and it crashed twice with different errors:

  • first with Unable to handle kernel paging request at virtual address after 2 hours 11 minutes, script was still running after crash.
  • Second with Internal error: Oops - undefined instruction: 0 [#1] ARM run failed after 1 hour 53 minutes.

I have orderered pi 5 hopefully that will work

    buffer_count=3,
    queue=True,
zelo@pi:~ $ uname -a
Linux pi 6.1.21+ #1642 Mon Apr  3 17:19:14 BST 2023 armv6l GNU/Linux

zelo@pi:~ $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 11 (bullseye)"
NAME="Raspbian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

First run error

[ 1021.766654] bcm2835-codec bcm2835-codec: Failed setting ctrl 00990a67, ret -3
[ 1048.067692] bcm2835-codec bcm2835-codec: Failed setting ctrl 00990a67, ret -3
[ 1066.975680] bcm2835-codec bcm2835-codec: Failed setting ctrl 00990a67, ret -3
[ 8942.369300] 8<--- cut here ---
[ 8942.369383] Unable to handle kernel paging request at virtual address bfa89b9c
[ 8942.369422] [bfa89b9c] *pgd=00000000
[ 8942.369463] Internal error: Oops: 80000005 [#1] ARM
[ 8942.369494] Modules linked in: cmac algif_hash aes_arm aes_generic ecb algif_skcipher af_alg bnep imx708 dw9807_vcm hci_uart btbcm bluetooth ecdh_generic ecc libaes brcmfmac vc4 8021q garp stp llc brcmutil snd_soc_hdmi_codec sha256_generic libsha256 drm_display_helper cfg80211 cec drm_dma_helper drm_kms_helper snd_soc_core bcm2835_unicam bcm2835_codec(C) snd_compress v4l2_dv_timings snd_pcm_dmaengine bcm2835_v4l2(C) bcm2835_isp(C) syscopyarea sysfillrect v4l2_fwnode v4l2_mem2mem raspberrypi_hwmon i2c_mux_pinctrl sysimgblt i2c_mux v4l2_async bcm2835_mmal_vchiq(C) fb_sys_fops rfkill videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common snd_bcm2835(C) videodev snd_pcm snd_timer i2c_bcm2835 snd mc vc_sm_cma(C) uio_pdrv_genirq fixed uio drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[ 8942.369986] CPU: 0 PID: 1906 Comm: python Tainted: G         C         6.1.21+ #1642
[ 8942.370025] Hardware name: BCM2835
[ 8942.370045] PC is at 0xbfa89b9c
[ 8942.370078] LR is at video_usercopy+0x384/0x574 [videodev]
[ 8942.370834] pc : [<bfa89b9c>]    lr : [<bf24db90>]    psr: 60000013
[ 8942.370861] sp : dc905e50  ip : c6fc8000  fp : c0c2f02c
[ 8942.370884] r10: b07fc4ec  r9 : 00000000  r8 : c511e200
[ 8942.370907] r7 : dc905e74  r6 : 00000000  r5 : c044560f  r4 : c050560f
[ 8942.370932] r3 : 0000003c  r2 : bf26f240  r1 : e2bb8a4f  r0 : c3344c78
[ 8942.370956] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 8942.370986] Control: 00c5387d  Table: 0248c008  DAC: 00000055
[ 8942.371008] Register r0 information: slab kmalloc-1k start c3344c00 pointer offset 120 size 1024
[ 8942.371072] Register r1 information: non-paged memory
[ 8942.371103] Register r2 information: 59-page vmalloc region starting at 0xbf246000 allocated at load_module+0x888/0x1d04
[ 8942.371162] Register r3 information: non-paged memory
[ 8942.371191] Register r4 information: non-slab/vmalloc memory
[ 8942.371220] Register r5 information: non-slab/vmalloc memory
[ 8942.371248] Register r6 information: NULL pointer
[ 8942.371273] Register r7 information: 2-page vmalloc region starting at 0xdc904000 allocated at kernel_clone+0xac/0x31c
[ 8942.371337] Register r8 information: slab kmalloc-64 start c511e200 pointer offset 0 size 64
[ 8942.371391] Register r9 information: NULL pointer
[ 8942.371418] Register r10 information: non-paged memory
[ 8942.371445] Register r11 information: non-slab/vmalloc memory
[ 8942.371473] Register r12 information: slab task_struct start c6fc8000 pointer offset 0
[ 8942.371523] Process python (pid: 1906, stack limit = 0x440e1ba1)
[ 8942.371560] Stack: (0xdc905e50 to 0xdc906000)
[ 8942.371590] 5e40:                                     dc905e73 c0052d60 c48fc000 0000003c
[ 8942.371625] 5e60: b5063d40 dc905eb4 00000001 0000003c 00006590 00000002 00000009 00000000
[ 8942.371663] 5e80: 00004003 00000001 00000000 00001ec2 00000000 00032bc8 00000000 00000000
[ 8942.371698] 5ea0: 00000000 00000000 00000000 00000000 00000001 c511e200 00000001 00000000
[ 8942.371736] 5ec0: 00000000 c1ba9040 c4942a80 00000000 dc905f2c c08c6378 00000026 00000000
[ 8942.371772] 5ee0: c0c2f02c dc905f78 0eb14990 00000822 00000000 e2bb8a4f dc905f80 c044560f
[ 8942.371807] 5f00: c48fc001 c48fc000 c0c2f02c b07fc4ec 00000000 00000033 c3347770 c022c198
[ 8942.371844] 5f20: fffffe30 5ac3c35a dc905f3c c08c6a24 00000002 dc905fb0 c0c2f02c c001272c
[ 8942.371879] 5f40: 00000000 00000000 00000000 e2bb8a4f 00000001 c6d6f961 c6d6f960 c0c2f02c
[ 8942.371914] 5f60: 016eb9d0 00000026 c6fc8000 c0216e88 c6d6f961 c0216e88 00000000 e2bb8a4f
[ 8942.371952] 5f80: 001e73e8 00000044 01a60520 00000033 00000036 c0008408 c6fc8000 00000000
[ 8942.371989] 5fa0: 01a60520 c0008260 00000044 01a60520 00000033 c044560f b07fc4ec 00000000
[ 8942.372025] 5fc0: 00000044 01a60520 00000033 00000036 00000001 b3da2188 00000000 01a60520
[ 8942.372062] 5fe0: 00479320 b07fc4ac 000736a0 b6d3056c 60000010 00000033 00000000 00000000
[ 8942.372116]  video_usercopy [videodev] from sys_ioctl+0x10c/0x90c
[ 8942.372732]  sys_ioctl from ret_fast_syscall+0x0/0x1c
[ 8942.372787] Exception stack(0xdc905fa8 to 0xdc905ff0)
[ 8942.372824] 5fa0:                   00000044 01a60520 00000033 c044560f b07fc4ec 00000000
[ 8942.372864] 5fc0: 00000044 01a60520 00000033 00000036 00000001 b3da2188 00000000 01a60520
[ 8942.372898] 5fe0: 00479320 b07fc4ac 000736a0 b6d3056c
[ 8942.372934] Code: bad PC value
[ 8942.372962] ---[ end trace 0000000000000000 ]---
[ 8942.372985] note: python[1906] exited with irqs disabled

Second run error

[ 7195.713400] Internal error: Oops - undefined instruction: 0 [#1] ARM
[ 7195.713519] Modules linked in: cmac algif_hash aes_arm aes_generic ecb algif_skcipher af_alg bnep imx708 dw9807_vcm hci_uart btbcm bluetooth brcmfmac ecdh_generic ecc libaes vc4 brcmutil sha256_generic libsha256 snd_soc_hdmi_codec 8021q garp stp drm_display_helper llc cec cfg80211 drm_dma_helper drm_kms_helper snd_soc_core bcm2835_unicam snd_compress v4l2_dv_timings snd_pcm_dmaengine bcm2835_codec(C) syscopyarea sysfillrect v4l2_fwnode sysimgblt fb_sys_fops i2c_mux_pinctrl v4l2_async bcm2835_isp(C) v4l2_mem2mem bcm2835_v4l2(C) i2c_mux raspberrypi_hwmon bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops rfkill videobuf2_v4l2 videobuf2_common snd_bcm2835(C) videodev snd_pcm snd_timer i2c_bcm2835 snd mc vc_sm_cma(C) uio_pdrv_genirq fixed uio drm fuse drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[ 7195.714019] CPU: 0 PID: 753 Comm: python Tainted: G         C         6.1.21+ #1642
[ 7195.714060] Hardware name: BCM2835
[ 7195.714081] PC is at linux_banner+0xc7a78/0x1dd784
[ 7195.714146] LR is at handle_mm_fault+0x67c/0xd94
[ 7195.714204] pc : [<c0a0a1f4>]    lr : [<c01ce4f8>]    psr: a0000113
[ 7195.714228] sp : dc905ec8  ip : c001cf7c  fp : cb96267c
[ 7195.714252] r10: 00000000  r9 : dc905fb0  r8 : c0c2f02c
[ 7195.714273] r7 : 00000000  r6 : 00000000  r5 : ab7d0000  r4 : 00000000
[ 7195.714297] r3 : 022e538c  r2 : c52b2080  r1 : 00000000  r0 : c52c7f40
[ 7195.714321] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 7195.714350] Control: 00c5387d  Table: 027ac008  DAC: 00000055
[ 7195.714373] Register r0 information: non-slab/vmalloc memory
[ 7195.714408] Register r1 information: NULL pointer
[ 7195.714433] Register r2 information: slab task_struct start c52b2080 pointer offset 0
[ 7195.714487] Register r3 information: non-paged memory
[ 7195.714514] Register r4 information: NULL pointer
[ 7195.714539] Register r5 information: non-paged memory
[ 7195.714564] Register r6 information: NULL pointer
[ 7195.714590] Register r7 information: NULL pointer
[ 7195.714615] Register r8 information: non-slab/vmalloc memory
[ 7195.714643] Register r9 information: 2-page vmalloc region starting at 0xdc904000 allocated at kernel_clone+0xac/0x31c
[ 7195.714709] Register r10 information: NULL pointer
[ 7195.714736] Register r11 information: non-slab/vmalloc memory
[ 7195.714765] Register r12 information: non-slab/vmalloc memory
[ 7195.714793] Process python (pid: 753, stack limit = 0x2fa239ec)
[ 7195.714829] Stack: (0xdc905ec8 to 0xdc906000)
[ 7195.714861] 5ec0:                   c6fe5c40 c336f8c0 dc905f2c c08c6378 00000000 0966734f
[ 7195.714900] 5ee0: ab7d0000 cb96267c ffffffff c113f750 00000cc0 000ab7d0 ab7d0000 ab7d0000
[ 7195.714935] 5f00: 00000a55 c27aead8 c27aead8 00000000 00000000 00000000 c52c7740 c6fe5c70
[ 7195.714971] 5f20: 00000000 541df672 00000817 dc905fb0 c6fe5c40 ab7d0000 00000817 00000255
[ 7195.715007] 5f40: c113f750 00000002 c6fe5c70 c08cd358 dc905f74 c01b2f4c 00000022 000ab3a4
[ 7195.715044] 5f60: dc905f70 c0c34bfc 00000817 c08cd220 ab7d0000 dc905fb0 5b59515b b554aa40
[ 7195.715080] 5f80: b1a49fd0 c00189fc ab3a4008 b6f88b44 20000010 ffffffff b6f88b44 20000010
[ 7195.715117] 5fa0: ffffffff c52b2080 00c5387d c0008f94 ab7d0000 ad9fd008 01dfff78 5b58515b
[ 7195.715156] 5fc0: 515b5851 57515b57 5c57515b 515b5851 58515b58 5b59515b b554aa40 b1a49fd0
[ 7195.715194] 5fe0: 515c5951 b18fdbec 00000058 b6f88b44 20000010 ffffffff 00000000 00000000
[ 7195.715243]  handle_mm_fault from do_page_fault+0x138/0x3a8
[ 7195.715332]  do_page_fault from do_DataAbort+0x40/0xb4
[ 7195.715398]  do_DataAbort from __dabt_usr+0x54/0x60
[ 7195.715445] Exception stack(0xdc905fb0 to 0xdc905ff8)
[ 7195.715476] 5fa0:                                     ab7d0000 ad9fd008 01dfff78 5b58515b
[ 7195.715513] 5fc0: 515b5851 57515b57 5c57515b 515b5851 58515b58 5b59515b b554aa40 b1a49fd0
[ 7195.715548] 5fe0: 515c5951 b18fdbec 00000058 b6f88b44 20000010 ffffffff
[ 7195.715583] Code: 73c5cb76 68737213 70f86db3 5fe469a4 (f6c517fa)
[ 7195.715613] ---[ end trace 0000000000000000 ]---
[ 7195.715636] note: python[753] exited with irqs disabled

zelo avatar Apr 13 '24 16:04 zelo

Can you confirm that a fresh and unchanged install of Raspberry Pi OS Lite (with all updates applied) produces an error from the script posted above? If so, we can try to reproduce that.

davidplowman avatar Apr 15 '24 08:04 davidplowman

@davidplowman Yes, I just did that

Below are ALL steps i did

  • Raspberry Pi Imager (Pi OS Lite (64 bit)) with following customizations applied: user/password, wifi, ssh
  • boot
  • ssh to pi
  • sudo apt-get update && sudo apt-get install python3-picamera2
  • python isolate.py (copy paste from my first comment here)
zelo@pi:~ $ libcamera-hello --list-cameras
Available cameras
-----------------
0 : imx708 [4608x2592 10-bit RGGB] (/base/soc/i2c0mux/i2c@1/imx708@1a)
    Modes: 'SRGGB10_CSI2P' : 1536x864 [120.13 fps - (768, 432)/3072x1728 crop]
                             2304x1296 [56.03 fps - (0, 0)/4608x2592 crop]
                             4608x2592 [14.35 fps - (0, 0)/4608x2592 crop]
zelo@pi:~ $ rpi-eeprom-update
BOOTLOADER: up to date
   CURRENT: Mon 22 Jan 10:41:21 UTC 2024 (1705920081)
    LATEST: Wed 11 Jan 17:40:52 UTC 2023 (1673458852)
   RELEASE: default (/lib/firmware/raspberrypi/bootloader-2711/default)
            Use raspi-config to change the release.

  VL805_FW: Using bootloader EEPROM
     VL805: version unknown. Try sudo rpi-eeprom-update
   CURRENT:
    LATEST:
zelo@pi:~ $ vcgencmd version
Feb 29 2024 12:24:53
Copyright (c) 2012 Broadcom
version f4e2138c2adc8f3a92a3a65939e458f11d7298ba (clean) (release) (start)
zelo@pi:~ $ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
zelo@pi:~ $ python isolate.py
[0:04:05.064977470] [4510]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:04:05.188209832] [4515]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:04:05.190341572] [4515]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx708@1a to Unicam device /dev/media4 and ISP device /dev/media2
[0:04:05.190458827] [4515]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
[0:04:05.193241396] [4510]  INFO Camera camera_manager.cpp:284 libcamera v0.2.0+46-075b54d5
[0:04:05.316364799] [4518]  WARN RPiSdn sdn.cpp:39 Using legacy SDN tuning - please consider moving SDN inside rpi.denoise
[0:04:05.318255010] [4518]  INFO RPI vc4.cpp:447 Registered camera /base/soc/i2c0mux/i2c@1/imx708@1a to Unicam device /dev/media4 and ISP device /dev/media2
[0:04:05.318330266] [4518]  INFO RPI pipeline_base.cpp:1144 Using configuration file '/usr/share/libcamera/pipeline/rpi/vc4/rpi_apps.yaml'
picamera2.picamera2 DEBUG: Requesting configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 3, 'queue': False, 'main': {'format': 'BGR888', 'size': (4608, 2592), 'stride': 13824, 'framesize': 35831808}, 'lores': {'format': 'YUV420', 'size': (1984, 1500), 'stride': 1984, 'framesize': 4464000}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4608, 2592)}, 'display': None, 'encode': None}
[0:04:05.325661008] [4510]  INFO Camera camera.cpp:1183 configuring streams: (0) 4608x2592-BGR888 (1) 1984x1500-YUV420
[0:04:05.326110511] [4518]  INFO RPI vc4.cpp:611 Sensor: /base/soc/i2c0mux/i2c@1/imx708@1a - Selected sensor format: 4608x2592-SBGGR10_1X10 - Selected unicam format: 4608x2592-pBAA
picamera2.picamera2 INFO: Configuration successful!
picamera2.picamera2 DEBUG: Final configuration: {'use_case': 'still', 'transform': <libcamera.Transform 'identity'>, 'colour_space': <libcamera.ColorSpace 'sYCC'>, 'buffer_count': 3, 'queue': False, 'main': {'format': 'BGR888', 'size': (4608, 2592), 'stride': 13824, 'framesize': 35831808}, 'lores': {'format': 'YUV420', 'size': (1984, 1500), 'stride': 1984, 'framesize': 4464000}, 'raw': None, 'controls': {'NoiseReductionMode': <NoiseReductionModeEnum.HighQuality: 2>, 'FrameDurationLimits': (100, 1000000000)}, 'sensor': {'bit_depth': 10, 'output_size': (4608, 2592)}, 'display': None, 'encode': None}
picamera2.picamera2 DEBUG: Streams: {'main': <libcamera._libcamera.Stream object at 0x7f71675130>, 'lores': <libcamera._libcamera.Stream object at 0x7f71674430>, 'raw': None}
picamera2 DEBUG: Allocated 3 buffers for stream 0 with fds [26, 29, 32]
picamera2 DEBUG: Allocated 3 buffers for stream 1 with fds [35, 38, 41]
picamera2.picamera2 INFO: Camera started
Got frame: 2024-04-15 11:16:07.553351
Got frame: 2024-04-15 11:16:07.623207
Got frame: 2024-04-15 11:16:07.690469
Got frame: 2024-04-15 11:16:07.763351
Got frame: 2024-04-15 11:16:07.830098
Got frame: 2024-04-15 11:16:07.899900
[...]
Encoder running: True
Got frame: 2024-04-15 11:16:43.976702
Got frame: 2024-04-15 11:16:44.136477
Got frame: 2024-04-15 11:16:44.151790
Got frame: 2024-04-15 11:16:44.323555
Got frame: 2024-04-15 11:16:44.333777
Got frame: 2024-04-15 11:16:44.500120
Got frame: 2024-04-15 11:16:44.515397
Got frame: 2024-04-15 11:16:44.680871
Got frame: 2024-04-15 11:16:44.696186
Got frame: 2024-04-15 11:16:44.851975
Got frame: 2024-04-15 11:16:44.866871
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
Encoder running: True
^Cpicamera2.picamera2 DEBUG: Execute job: <picamera2.job.Job object at 0x7f716779d0>
picamera2.picamera2 INFO: Camera stopped
Traceback (most recent call last):
  File "/home/zelo/isolate.py", line 38, in <module>
    sleep(1)
KeyboardInterrupt
picamera2.picamera2 DEBUG: Camera was not started
picamera2.picamera2 INFO: Camera closed successfully.

zelo avatar Apr 15 '24 09:04 zelo

Thanks. Just so that I'm doing the right thing, is that running on a Pi 4? So it's Pi 4B + latest RPi OS Lite 64-bit + v3 camera + script you initially posted.

davidplowman avatar Apr 15 '24 09:04 davidplowman

@davidplowman Yes

  • pi 4b
  • latest 64 bit os lite (no apt upgrade, or dist-upgrade done)
  • v3 camera
  • short ribbon cable
  • script from first post
  • nothing connected to any other port
  • occurs in multiple locations (i moved the setup around appartment to check if its not some local interference)
  • I run the script remotely trough ssh by wifi
  • new sdcard SanDisk Extreme Pro microSD 64GB U3 A2
Revision        : b03114
Serial          : 10000000f9a62d1e
Model           : Raspberry Pi 4 Model B Rev 1.4

zelo avatar Apr 15 '24 09:04 zelo

I'll try that. Note that I will do an apt update and full-upgrade after installing because that's the recommended procedure to get up-to-date software. From the output you've posted, can I assume that it typically fails within a few minutes?

davidplowman avatar Apr 15 '24 12:04 davidplowman

Yes it usually fails pretty fast

zelo avatar Apr 15 '24 12:04 zelo

Yes I'm seeing it too now. Normally fails in ~20ish seconds. As far as I can tell, we're queuing buffers to the encoder on the GPU but at some point it just stops returning them to us. No idea why.

In order to try and get some more debug, I put start_debug=1 into my /boot/firmware/config.txt (and rebooted) and now it just won't fail. And if I change it back to start_debug=0 it fails really quickly again. So all a bit baffling ATM...

davidplowman avatar Apr 15 '24 15:04 davidplowman

i will try with start_debug=1

Update: For sure it's slower in obtaining frames. I'm also pretty sure i tried with that flag before. I see a lot of assert( vc_image_helper_get_sdram_page_size() == 4096 ) failed; ../../../../../helpers/vc_image/vc_image_helper.c::calculate_pitch line 869 in vclog not sure if that's a problem at all

zelo avatar Apr 15 '24 15:04 zelo

it's great u were able to reproduce the issue

Almost 4 hours without hang (start_debug=1), although slower.

in that time in vclog thousands of

assert( vc_image_helper_get_sdram_page_size() == 4096 ) failed; ../../../../../helpers/vc_image/vc_image_helper.c::calculate_pitch line 869

and few of

1241852.152: sdram: sdram refresh 1562->3124 (2)

and this in dmesg

[14495.963097] bcm2835-codec bcm2835-codec: Failed setting ctrl 00990a67, ret -3
[14495.963699] cma_alloc: 5 callbacks suppressed
[14495.963703] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.964023] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.964338] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.964651] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.964966] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.965329] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.965664] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.965978] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.966293] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12
[14495.966609] cma: cma_alloc: linux,cma: alloc failed, req-size: 192 pages, ret: -12

zelo avatar Apr 15 '24 19:04 zelo

I left it going all night and it was still OK in the morning. And then, as I sat and watched it for just a moment, it failed. A few of those page size asserts, but I think those are spurious. Absolutely nothing else. I will spend some more time digging today.

davidplowman avatar Apr 16 '24 08:04 davidplowman

I've been going back through older GPU firmware versions. As the failure is somewhat intermittent it's hard to be completely sure, but I believe the 6.6.18 kernel may be the last one that worked properly. You can downgrade yourself back to this kernel with

sudo rpi-update a31776f0abb4bee5b79a3334bf748c99a813c084

NOTE: always back up critical files before using rpi-update in case your system no longer boots afterwards!

If you are able to do so, it would be interesting to see if you agree that 6.6.18 is OK. In the meantime, I'll see if I can discover what might have gone wrong after that release.

davidplowman avatar Apr 16 '24 16:04 davidplowman

Running test after kernel downgrade.

I also noticed that firmware updated to new version during command you provided

CURRENT: Mon 15 Apr 13:12:14 UTC 2024 (1713186734)

zelo avatar Apr 16 '24 17:04 zelo

@davidplowman 15 hours without issue and keeps running

zelo avatar Apr 17 '24 08:04 zelo

Mine ran all night too. I think I may have found the change in the GPU firmware that's starting causing this, only it's a system change that doesn't have anything to do with the MJPEG encoder. So there may be some subtle side-effects at play that need to be understood.

davidplowman avatar Apr 17 '24 09:04 davidplowman

I think I've finally understood what's going on. In the end it's mainly down to a bug being introduced and released, and having more widespread effects than people perhaps realised. Whilst the bug has been fixed for a while, our "latest" images for download still don't have the fix.

As I said, kernel 6.6.18 with the associated GPU firmware is OK, the bug was introduced just after that. The first version that has the fix is 6.6.23 with its associated GPU firmware (the bug and fix are actually in the firmware, not the Linux kernel). Unfortunately our "latest" images, even after updating, are still shipping with 6.6.20 at this time.

So updating to any kernel/firmware from 6.6.23 onwards will work, and specifically the latest available version (which you can get with just sudo rpi-update) should be fine. (Same caveats apply as before when using rpi-update.)

I'm told our release images are likely to be updated in the next few days, and we'll make sure the working firmware is included. Hopefully that will resolve this issue for everyone!

One other unrelated thing I noticed about your script. I found the behaviour could be very "choppy", that is, frames weren't being processed steadily and smoothly. This can be caused by the GPU clocks not being increased enough when the Arm side of the processor doesn't think it's very busy. As a workaround, I found that adding isp_freq_min=500 into my /boot/firmware/config.txt make the situation significantly better.

davidplowman avatar Apr 17 '24 12:04 davidplowman

That's great :) Thanks for your time diving in to the problem.

Just to recap, for anyone looking for solution. Either go back to kernel 6.6.18 (sudo rpi-update a31776f0abb4bee5b79a3334bf748c99a813c084) or wait for 6.6.23+ to be shipped and available for update

I guess this can be closed then.

One other unrelated thing I noticed about your script. I found the behaviour could be very "choppy", that is, frames weren't being processed steadily and smoothly. This can be caused by the GPU clocks not being increased enough when the Arm side of the processor doesn't think it's very busy. As a workaround, I found that adding isp_freq_min=500 into my /boot/firmware/config.txt make the situation significantly better.

Nice hint.

zelo avatar Apr 17 '24 15:04 zelo

I encountered a similar problem which is probably caused by the same bug. After stripping down my code to a minimal example I also noticed the CameraManager and preview event loops only returning empty events after some seconds to minutes of runtime until restarting my python script. Unluckily the more unrelated code I removed from my script, the longer it took until it stopped receiving frames, making debugging slower.

Unfortunately I do not run my code on Raspberry Pi OS but on a custom build yocto system, which makes it impossible for others to reproduce, so I am quite happy to find this issue here, thank you! I am currently in the process of verifying different firmware and kernel versions and the problem seems to be gone (long time testing currently in progress) with a 6.1.x kernel and firmware from 2023-10.

I would now like to update to a 6.6.x kernel, but I am uncertain how to identify since which firmware version the bug is fixed. The firmware versioning is a little confusing for me.

The firmware is integrated in my system using the meta-raspberry yocto layer https://github.com/agherzan/meta-raspberrypi/blob/master/recipes-bsp/bootfiles/rpi-bootfiles.bb, which downloads the firmware from [1]. The rpi-update hashes refer to [2], but the hashes in the Raspberry Pi OS release notes refer to [1]. vcgencmd also lists a hash for the version, but this one seems to belong neither to [1] or [2].

A concrete example: I integrated firmware 12af703dd07d7721c1f2f58c6f71d9fe66270838 (dated 2023-10-24) from [1]. After booting, I get

:~# dmesg | grep firmware
[    0.052106] raspberrypi-firmware soc:firmware: Attached to firmware from 2023-10-17T15:39:16, variant start
[    0.056116] raspberrypi-firmware soc:firmware: Firmware hash is 30f0c5e4d076da3ab4f341d88e7d505760b93ad7

Date is different, hash does neither belong to [1] or [2]. How can I check if this is really the firmware from [1] I chose?

  1. https://github.com/raspberrypi/firmware
  2. https://github.com/raspberrypi/rpi-firmware

@davidplowman Can you help me here? My main question is:

  • From which commit in [1] on is the bug fixed?

What I would like to know but didn't find documented:

  • Is there a bugtracker and/or changelog for the firmware?
  • What is the connection between [1], [2] and the hashes and dates printed by vcgencmd or dmesg | grep firmware?

nzottmann avatar Apr 23 '24 14:04 nzottmann

Hi, I'm afraid I'm not too clear myself on these questions. @popcornmix Could you perhaps help us here?

As regards this particular bug, I believe it got broken in https://github.com/raspberrypi/rpi-firmware/commit/545545d44d2f33cae473889b6e94687d578a89e9 and was fixed in https://github.com/raspberrypi/rpi-firmware/commit/285539f763ef2d0612792935cf79bde07f3b7b38.

In the other repository, these look to correspond to https://github.com/raspberrypi/firmware/commit/dc94391863445ab867782d25ca6ae1e88579df5d and https://github.com/raspberrypi/firmware/commit/45319db29eb5e4f67feab5c4194bc1f28c574ed0 respoectively, though I'm not entirely clear how they relate to one another.

davidplowman avatar Apr 23 '24 14:04 davidplowman

rpi-firmware is just a slightly cutdown and reorganised version of firmware repo, used by rpi-update (it reduces update time a little by excluding some files).

Any common files in the two will be binary identical (they are just copied and then committed). Just look for a matching commits message and you should see identical firmware (the start*.elf/fixup*.dat files).

popcornmix avatar Apr 24 '24 14:04 popcornmix

Some test reports:

  • ✅ Kernel 6.1.77, https://github.com/raspberrypi/firmware/commit/12af703dd07d7721c1f2f58c6f71d9fe66270838 2023-10-24
  • ❌ Kernel 6.6.22, https://github.com/raspberrypi/firmware/commit/9f24f4bc2bdd07ffd158cfbb4bce88a2efc4c1f5 2024-03-19 <-- here I encountered the issue
  • ✅ Kernel 6.6.23, https://github.com/raspberrypi/firmware/commit/45319db29eb5e4f67feab5c4194bc1f28c574ed0 2024-03-21

The firmware in the affected build matches the range mentioned, the others are before and after the range. This seems like a good indication I am really affected by the same bug. The affected build stops outputting frames after a few minutes, the others ran for ~24h.

What I am still a little concerned about is that reproducing the bug is too unpredictable. The time until fail differs in each run and is greatly affected by the rest of the python code, in my case the time increases from a few minutes to up to 1-2h the more code I remove. Thus I struggle a bit with determining the testing time to be sure the bug is gone.

@davidplowman With your insights on this bug, can you think of any method to reproduce the fail in a faster and reproducible way?

nzottmann avatar Apr 25 '24 10:04 nzottmann

Hi guys, I am running several Pi 4b's with raspicam v1 and v3 and have started noticing this issue as well.

Unfortunately, upgrading to the latest version of the kernel and firmware has not resolved the issue for me. I am seeing frequent lockups when trying to capture an array after an unknown amount of time. Sometimes the lockups happen after only a few minutes, other times it is over days.

I have over 50 Pi 3B+ running exactly the same application code with no issues. It is only happening on the 4Bs

I see the same thing as @nzottmann -- the more application code I remove, the slower it is to happen.

It happens on both raspicam v1 and v3 (official)

One way I can think of that may accelerate things is to reduce the amount of RAM allocated to the gpu in /boot/firmware/config.txt.

In my testing, it seems that increasing the buffer_count causes things to fail more frequently.

Linux raspberrypi 6.6.39-v8+ #1783 SMP PREEMPT Thu Jul 11 18:56:16 BST 2024 aarch64 GNU/Linux
apt-cache show python3-picamera2
Package: python3-picamera2
Source: picamera2
Version: 0.3.19-1

cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Minimal script, however it has not happened yet with this script:

import time
from picamera2 import Picamera2
import threading

import logging

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class CameraManager:
    def __init__(self):

        self.picam2 = Picamera2()
        capture_config = self.picam2.create_video_configuration(
            main={"size": (640, 480), "format": "RGB888"},
            raw={"size": self.picam2.sensor_resolution},
            buffer_count=6,
            queue=True
        )
        self.picam2.configure(capture_config)
        self.picam2.start(show_preview=False)

        self.heatbeat_lock = threading.Lock()
        self.last_heartbeat = time.time()

        self.watchdog_thread = threading.Thread(target=self._watchdog)
        self.watchdog_thread.start()

        self.camera_thread = threading.Thread(target=self._capture_continuous)
        self.camera_thread.start()

    def _watchdog(self, timeout=10):  # Timeout in seconds
        while True:
            with self.heatbeat_lock:
                last_heartbeat = self.last_heartbeat
            if time.time() - last_heartbeat > timeout:
                logger.error("Camera capture thread unresponsive, attempting to reset.")
            time.sleep(timeout)

    def _capture_continuous(self):
        while True:
            try:
                frame = self.picam2.capture_array()
                self.signal_heartbeat()  # Signal that the thread is alive
            except Exception as e:
                logger.error(f"Failed to capture frame: {e}")
                break

            time.sleep(1 / 25)

    def signal_heartbeat(self):
        with self.heatbeat_lock:
            self.last_heartbeat = time.time()


if __name__ == "__main__":
    camera_manager = CameraManager()
    while True:
        time.sleep(1)

Current boot config for testing:

cat /boot/firmware/config.txt 
# For more options and information see
# http://rptl.io/configtxt
# Some settings may impact device functionality. See link above for details

# Uncomment some or all of these to enable the optional hardware interfaces
#dtparam=i2c_arm=on
#dtparam=i2s=on
#dtparam=spi=on

# Enable audio (loads snd_bcm2835)
dtparam=audio=on

# Additional overlays and parameters are documented
# /boot/firmware/overlays/README

# Automatically load overlays for detected cameras
camera_auto_detect=1

# Automatically load overlays for detected DSI displays
display_auto_detect=1

# Automatically load initramfs files, if found
auto_initramfs=1

# Enable DRM VC4 V3D driver
dtoverlay=vc4-kms-v3d
max_framebuffers=2

# Don't have the firmware create an initial video= setting in cmdline.txt.
# Use the kernel's default instead.
disable_fw_kms_setup=1

# Run in 64-bit mode
arm_64bit=1

# Disable compensation for displays with overscan
disable_overscan=1

# Run as fast as firmware / board allows
arm_boost=1

[cm4]
# Enable host mode on the 2711 built-in XHCI USB controller.
# This line should be removed if the legacy DWC2 controller is required
# (e.g. for USB device mode) or if USB support is not required.
otg_mode=1

[all]
gpu_mem=32

djhanove avatar Jul 15 '24 08:07 djhanove

@djhanove While testing the new firmware, I discovered a different issue, which looks similar at a first glance, but is different.

With the affected firmware, my complex application stopped after ~10min and a minimal example after 1-2h. With a newer or later firmware, my application and the minimal example run a much longer time, from 1-2 days up to 5 days. Thus, the was a clear difference with the affected firmware, which is why I assume a second, different issue. Obviously, the new issue did never occur with the affected firmware because it stopped the script earlier.

I would recommend you to test if there is a difference with and without the affected firmware.

I did not open a bug report for the second issue yet because I could not reproduce it reliably with Raspberry Pi OS. I have my own yocto based distribution, so I have to do separate tests on Raspberry Pi OS, which take a long time. My last test is running since 28 days with my complex application on a CM4 IO Board, Camera v3, Raspberry Pi OS Bookworm, 6.6.31+rpt-rpi-v8, firmware 2024-05-24T15:30:04 4942b7633c0ff1af1ee95a51a33b56a9dae47529.

I therefore currently think that it is not a firmware or kernel error, but that the problem lies elsewhere.

nzottmann avatar Jul 15 '24 12:07 nzottmann

Hi @nzottmann, are you seeing different end symptoms? For me the capture_array() call just hangs forever regardless of the versions I am on. This happens seemingly randomly. Could be running for days, could be running for 45 seconds. I cannot replicate it consistently.

So far, symptoms were present for 6.6.39-v8+, picam2 3.19-1, firmware 4942b7633c0ff1af1ee95a51a33b56a9dae47529

and 6.6.31+rpt-rpi-v8, picam2 3.18-1, firmware 4942b7633c0ff1af1ee95a51a33b56a9dae47529

Identical hardware, identical code. Running across 5 different Pi4Bs in my office inside a metal case, powered via PoE.

djhanove avatar Jul 15 '24 12:07 djhanove