Softlock CPU Stuck Renders My Device Unresponsive

Open louisefindlay23 opened this issue 1 year ago • 5 comments

Creating a bug report/issue

  • [x] I have searched the existing open and closed issues

Required Information

  • DietPi version | 9.6.1
  • Distro version | echo $G_DISTRO_NAME $G_RASPBIAN didn't output anything
  • Kernel version | Linux RegencyServer 6.1.0-23-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15) x86_64 GNU/Linux
  • SBC model | x86_64 BIOS CSM - Ryzen 7 1700 - NVIDIA RTX 3050
  • Power supply used | Corsair CX500
  • SD card used | Crucial P5 1TB

Additional Information (if applicable)

  • Software title |
  • I only have Docker and Docker Compose installed with Plex Media Server, Homebridge, Home Assistant and other containers. I have a manually installed version of the NVIDIA GPU driver in order to get GPU transcoding working with Plex.
  • Bug report ID | d6e9ed8b-db51-415f-a36b-954c614173e4

Steps to reproduce

  1. Leave the PC running with the display off and eventually the error occurs.

Expected behaviour

  • The error messages shouldn't appear and the PC shouldn't drop off the network.

Actual behaviour

  • I leave my PC turned on, and eventually the error messages below spam the terminal and the PC drops off the network: I can't SSH in or access any of the web interfaces of my Docker containers. The PC doesn't respond to Ctrl+C, so I have to hard power off and power on again to recover. Eventually the error comes back.

[Image: IMG_1198]

louisefindlay23 avatar Aug 05 '24 19:08 louisefindlay23

Please enable persistent system logs to check where those soft locks start:

dietpi-software uninstall 103
mkdir /var/log/journal
reboot
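
As a rough sanity check after the reboot: with journald's default Storage=auto setting, logs are only kept across boots when /var/log/journal exists. The helper below is a minimal sketch of that check; the function name journal_is_persistent is made up for illustration and is not a DietPi or systemd command.

```shell
# Hypothetical helper: succeeds if the on-disk journal directory exists.
# With the default Storage=auto, journald only persists logs when it does.
journal_is_persistent() {
  [ -d "${1:-/var/log/journal}" ]
}

if journal_is_persistent; then
  echo "journal directory found: logs should survive a reboot"
else
  echo "no /var/log/journal: logs are volatile and lost on reboot"
fi
```

If it reports the volatile case after following the steps above, the directory was probably not created or the reboot has not happened yet.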

When it happens again, browse the logs from previous boot session:

journalctl

MichaIng avatar Aug 06 '24 17:08 MichaIng

Sure, @MichaIng. Here is the log https://paste.debian.net/hidden/262b91b4/

Also included a slightly different picture in case that helps.

[Image]

louisefindlay23 avatar Sep 12 '24 19:09 louisefindlay23

@louisefindlay23 The journalctl command should show a long list of lines, including those from the last boot session before the lockup, as well as the same watchdog and rcu errors from your screenshot. Can you check again and paste the journalctl lines from before those kernel errors, so we can see what might have caused them?

Or did the stall happen right at the time when you started that Docker container with sudo docker compose up -d?
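
One possible way to pull out just the lines leading up to the first kernel error is sketched below. It assumes the messages contain the usual "soft lockup" or RCU "stall" wording; the function name first_lockup_context and the default of 50 context lines are made up for illustration.

```shell
# Hypothetical helper: read log lines on stdin and print the first line that
# mentions a soft lockup or RCU stall, plus the N lines before it (default 50).
first_lockup_context() {
  grep -m 1 -B "${1:-50}" -E 'soft lockup|rcu.*stall'
}

# Example use against the previous boot session (needs persistent logs):
#   sudo journalctl -b -1 --no-pager | first_lockup_context 50
```

Here -b -1 selects the previous boot and --no-pager lets the output be piped. If the function prints nothing, the pattern did not match and probably needs adjusting to the exact wording in the log.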

MichaIng avatar Sep 12 '24 20:09 MichaIng

@MichaIng, sure. I tried journalctl and also journalctl --list-boots, but it only shows one entry even after the issue occurs again, and there's very little detail for some strange reason. I even tried setting storage to persistent.

The error doesn't occur immediately after the docker start command but some time after the containers start, and never after the same amount of time. Sometimes it's an hour or two, sometimes several days.

louisefindlay23 avatar Sep 12 '24 20:09 louisefindlay23

> @louisefindlay23
>
> The journalctl command should show a long list of lines, including those from the last boot session before lockup, and also those very same watchdog and rcu errors from your screenshot. Can you check and paste again the journalctl lines from before those kernel errors? So we see what might have caused it.
>
> Or did the stall happen right at the time when you started that Docker container with sudo docker compose up -d?

Turns out it was user error. I didn't realise I needed to use sudo. 🤦‍♀️

Hopefully this should help, @MichaIng

https://pastebin.com/7UywwLCA
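
For anyone hitting the same sudo mix-up: journalctl only shows the full system journal to root or to members of the systemd-journal, adm or wheel groups, and other users see just their own entries. A small sketch of that check follows; the function name can_read_system_journal is hypothetical.

```shell
# Hypothetical helper: $1 is a numeric user ID, $2 a space-separated group list.
# Succeeds if that user should be able to read the full system journal.
can_read_system_journal() {
  [ "$1" -eq 0 ] && return 0
  for g in $2; do
    case "$g" in
      systemd-journal|adm|wheel) return 0 ;;
    esac
  done
  return 1
}

if can_read_system_journal "$(id -u)" "$(id -nG)"; then
  echo "full journal readable as this user"
else
  echo "limited view: run journalctl with sudo"
fi
```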

louisefindlay23 avatar Sep 13 '24 17:09 louisefindlay23

Sorry for the late reply. If the issue persists, can you paste a new log please?

MichaIng avatar Oct 18 '24 14:10 MichaIng

Closing this issue. Feel free to reopen if the problem persists.

MichaIng avatar Jun 02 '25 20:06 MichaIng