Softlock CPU Stuck Renders My Device Unresponsive
Creating a bug report/issue
- [x] I have searched the existing open and closed issues
Required Information
- DietPi version | 9.6.1
- Distro version | echo $G_DISTRO_NAME $G_RASPBIAN didn't output anything
- Kernel version | Linux RegencyServer 6.1.0-23-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15) x86_64 GNU/Linux
- SBC model | x86_64 BIOS CSM - Ryzen 7 1700 - NVIDIA RTX 3050
- Power supply used | Corsair CX500
- SD card used | Crucial P5 1TB
Additional Information (if applicable)
- Software title | I only have Docker and Docker Compose installed with Plex Media Server, Homebridge, Home Assistant and other containers. I have a manually installed version of the NVIDIA GPU driver in order to get GPU transcoding working with Plex.
- Bug report ID | d6e9ed8b-db51-415f-a36b-954c614173e4
Steps to reproduce
- Leave the PC running with the display off and eventually the error occurs.
Expected behaviour
- The error messages shouldn't appear and the PC shouldn't drop off the network.
Actual behaviour
- I leave my PC turned on and eventually the error messages below spam the terminal and my PC drops off the network: I can't SSH in or access any of the web interfaces of my Docker containers. The PC is unresponsive to CTRL + C, so I have to hard power off and power on again to resolve the issue. Eventually the error repeats itself.
Please enable persistent system logs to check where those soft locks start:
dietpi-software uninstall 103
mkdir /var/log/journal
reboot
When it happens again, browse the logs from previous boot session:
journalctl
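As a rough sketch of how to confirm that the steps above took effect: the helper below only checks that the journal directory exists, which is what makes journald keep logs across reboots, assuming its Storage setting is left at the default auto. The function name has_persistent_journal is made up for illustration and is not part of DietPi.

```shell
# Hypothetical helper (not a DietPi command): succeeds if the persistent
# journal directory exists. With journald's default Storage=auto, logs
# survive a reboot only when this directory is present.
has_persistent_journal() {
    [ -d "${1:-/var/log/journal}" ]
}

# Usage, left commented so the block has no side effects.
# Run as root or with sudo, otherwise the system journal may not be shown:
#   has_persistent_journal && journalctl --list-boots
#   journalctl -b -1 -e    # previous boot session, jump to the end
```

If journalctl --list-boots still shows a single entry after a lockup and reboot, the journal is most likely not being written to disk, or it is being read without sufficient permissions.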
Sure, @MichaIng. Here is the log https://paste.debian.net/hidden/262b91b4/
Also included a slightly different picture in case that helps.
@louisefindlay23
The journalctl command should show a long list of lines, including those from the last boot session before lockup, and also those very same watchdog and rcu errors from your screenshot. Can you check and paste again the journalctl lines from before those kernel errors? So we see what might have caused it.
Or did the stall happen right at the time when you started that Docker container with sudo docker compose up -d?
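One possible way to pull out just the lines leading up to the first of those errors is sketched below. The function name lockup_context is hypothetical, and the search pattern is an assumption about how the watchdog and rcu messages are worded, so it may need adjusting to match the screenshot.

```shell
# Hypothetical filter: reads log text on stdin and prints the first line that
# looks like a soft lockup or rcu stall, plus the N lines before it.
# The pattern is an assumption about the message wording, not a fixed format.
lockup_context() {
    lines_before="${1:-50}"
    # -m1 stops at the first match; -B prints the preceding context lines
    grep -i -m1 -B "$lines_before" -E 'soft lockup|rcu.*stall'
}

# Usage, left commented so the block has no side effects:
#   sudo journalctl -b -1 --no-pager | lockup_context 100
```

Stopping at the first match keeps the output short, since the later errors are usually repeats and the interesting part is whatever was logged just before them.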
@MichaIng, sure. I tried journalctl and also journalctl --list-boots, but it only shows one entry even after the issue occurs again, and there's no further detail for some strange reason. I even tried setting storage to persistent.
The error doesn't occur immediately after the docker start command but some time after the containers start, and never after the same amount of time. Sometimes it's an hour or two, sometimes several days.
Turns out it was user error. I didn't realise I needed to use sudo. 🤦♀️
Hopefully this should help, @MichaIng
https://pastebin.com/7UywwLCA
Sorry for the late reply. If the issue persists, can you paste a new log please?
Closing this issue. Feel free to reopen if the problem persists.