zigbee2mqtt icon indicating copy to clipboard operation
zigbee2mqtt copied to clipboard

1.39.0 random crashes

Open Ricc68 opened this issue 1 year ago • 29 comments

What happened?

Everything was working fine on 1.38.0-dev but since I switched to 1.39.0, z2m started crashing randomly every couple of hours. The log shows a random crash started with a transaction failed error followed by a crash related to herdsman followed by a tentative restart that crashed again on herdsman.

What did you expect to happen?

No crashes.

How to reproduce it (minimal and precise)

It started happening with ZBDongle-E, fw 7.4.3 and z2m 1.39.0 on ember.

Zigbee2MQTT version

1.39.0

Adapter firmware version

7.4.3

Adapter

zbdongle-e

Setup

HAOS in vm

Debug log

2024-07-06 16:58:15] info: zh:ember: [NCP COUNTERS] 418,478,137,24,0,0,14,14,96,6,0,0,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 [2024-07-06 16:58:15] info: zh:ember: [ASH COUNTERS] 219,153,7,146,0,0,0,0,4545,146,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 [2024-07-06 17:58:30] error: zh:ember:ezsp: ERROR Transaction failure; status=undefined. Last: [FRAME: ID=101:"READ_AND_CLEAR_COUNTERS" Seq=228 Len=5]. [2024-07-06 17:58:30] error: zh:ember: !!! NCP FATAL ERROR reason=undefined. ATTEMPTING RESET... !!! [2024-07-06 17:58:30] info: zh:ember:queue: Request dispatching stopped; queue=1 priorityQueue=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: ASH COUNTERS since last clear: [2024-07-06 17:58:30] info: zh:ember:uart:ash: Total frames: RX=23, TX=25 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: DATA frames : RX=23, TX=2 [2024-07-06 17:58:30] info: zh:ember:uart:ash: DATA bytes : RX=734, TX=32 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: ACK frames : RX=0, TX=23 [2024-07-06 17:58:30] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: CRC errors : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Comm errors : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Length < minimum: RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Length > maximum: RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Bad controls : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Bad lengths : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Bad ACK numbers : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Out of buffers : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Retry dupes : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Out of sequence : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: ACK timeouts : RX=0 Error: at Ezsp.ezspReadAndClearCounters (/app/node_modules/zigbee-herdsman/src/adapter/ember/ezsp/ezsp.ts:2012:19) at Object.func (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:844:42) at EmberRequestQueue.dispatch (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/requestQueue.ts:117:46) [21:18:30] INFO: Preparing to start... [21:18:30] INFO: Socat not enabled [21:18:32] INFO: Starting Zigbee2MQTT... Starting Zigbee2MQTT without watchdog. [2024-07-07 21:18:40] info: z2m: Logging to console, file (filename: log.log) [2024-07-07 21:18:40] info: z2m: Starting Zigbee2MQTT version 1.39.0 (commit #unknown) [2024-07-07 21:18:40] info: z2m: Starting zigbee-herdsman (0.50.1) [2024-07-07 21:18:41] info: zh:ember: Using default stack config. [2024-07-07 21:18:41] info: zh:ember: ======== Ember Adapter Starting ======== [2024-07-07 21:18:41] info: zh:ember:ezsp: ======== EZSP starting ======== [2024-07-07 21:18:41] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-07 21:18:41] info: zh:ember:uart:ash: RTS/CTS config is off, enabling software flow control. [2024-07-07 21:18:41] info: zh:ember:uart:ash: Serial port opened [2024-07-07 21:18:41] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-07 21:18:44] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-07 21:18:44] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-07 21:18:46] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-07 21:18:46] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-07 21:18:49] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-07 21:18:49] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-07 21:18:51] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-07 21:18:51] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-07 21:18:54] error: z2m: Error while starting zigbee-herdsman [2024-07-07 21:18:54] error: z2m: Failed to start zigbee [2024-07-07 21:18:54] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions [2024-07-07 21:18:54] error: z2m: Exiting... [2024-07-07 21:18:54] error: z2m: Error: Failed to start EZSP layer with status=HOST_FATAL_ERROR. at EmberAdapter.initEzsp (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:889:19) at EmberAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:2722:24) at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:129:29) at Zigbee.start (/app/lib/zigbee.ts:63:27) at Controller.start (/app/lib/controller.ts:139:27) at start (/app/index.js:154:5)

Ricc68 avatar Jul 07 '24 19:07 Ricc68

Update: after a restart of the VM, z2m started properly. I have now enabled the debug log to catch more info about the issue.

Ricc68 avatar Jul 07 '24 20:07 Ricc68

I had similar issue but on SLZB-06M, latest firmware (updated from github repo to 7..4.3 instead from controller console which is probably 7.4.1, on Z2M 1.38.0). After that I got random crashes especially on LXC Z2M reboot. Moving to 20240507 FW (7.4.1) seems more stable.

maciekdnd avatar Jul 12 '24 07:07 maciekdnd

@Ricc68 Did the restart definitely fix this problem? If not, can you check your last log from running 1.38.0-dev for the commit hash? Example: z2m: Starting Zigbee2MQTT version 1.39.0-dev (commit #cda8927d) It will point us to one of these commits to see what's changed since.

As you can see from June 1st to June 30th in that list, 1.39.0 had only minimal changes for ember. August release (current dev) will be a big one however, it has a lot of fixes from adding support for v8, and the implementation of concurrency (should smooth things out for users with misbehaving devices).

Nerivec avatar Jul 21 '24 20:07 Nerivec

@Ricc68 Did the restart definitely fix this problem? If not, can you check your last log from running 1.38.0-dev for the commit hash? Example: z2m: Starting Zigbee2MQTT version 1.39.0-dev (commit #cda8927d) It will point us to one of these commits to see what's changed since.

As you can see from June 1st to June 30th in that list, 1.39.0 had only minimal changes for ember. August release (current dev) will be a big one however, it has a lot of fixes from adding support for v8, and the implementation of concurrency (should smooth things out for users with misbehaving devices).

@Nerivec Since I have restarted and enabled the debug log, the problem disappeared. Now I have disabled the debug: let's see if it appears again. My feeling is that it was a malfunction of the serial communication with the dongle because a simple restart without power down of HAOS was fixing it.

Ricc68 avatar Jul 22 '24 05:07 Ricc68

I am having a similar problem. While my devices are quickly available after a reboot by mqtt, the front end can take up to 1 minute to load. I just downgraded to 1.37 and the front end load withing 5-10 seconds of starting.

I have 2 instances in docker with POE coordinators, about 100 devices on each. Same problem with both instances.

swissjon avatar Jul 22 '24 15:07 swissjon

@swissjon That issue should already be fixed in dev/edge (caused by a quirk in Availability unearthed after fixing the startup flow in last release).

Nerivec avatar Jul 22 '24 19:07 Nerivec

@Nerivec With log at info level, z2m crashed in less than 24h:

[2024-07-22 21:38:53] error: zh:ember:ezsp: ERROR Transaction failure; status=undefined. Last: [FRAME: ID=40:"GET_NETWORK_PARAMETERS" Seq=120 Len=5]. [2024-07-22 21:38:53] error: zh:ember: !!! NCP FATAL ERROR reason=undefined. ATTEMPTING RESET... !!! [2024-07-22 21:38:53] info: zh:ember:queue: Request dispatching stopped; queue=1 priorityQueue=1 [2024-07-22 21:38:53] info: zh:ember:uart:ash: ASH COUNTERS since last clear: [2024-07-22 21:38:53] info: zh:ember:uart:ash: Total frames: RX=204, TX=249 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: DATA frames : RX=204, TX=46 [2024-07-22 21:38:53] info: zh:ember:uart:ash: DATA bytes : RX=5137, TX=1225 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: ACK frames : RX=0, TX=203 [2024-07-22 21:38:53] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: CRC errors : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Comm errors : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Length < minimum: RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Length > maximum: RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Bad controls : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Bad lengths : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Bad ACK numbers : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Out of buffers : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Retry dupes : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Out of sequence : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: ACK timeouts : RX=0 Error: at Ezsp.ezspGetNetworkParameters (/app/node_modules/zigbee-herdsman/src/adapter/ember/ezsp/ezsp.ts:2659:19) at Object.func (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:2791:55) at EmberRequestQueue.dispatch (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/requestQueue.ts:117:46)

Attempting to restart z2m results again in a crash as if the communication with the dongle is somehow stuck:

[2024-07-23 21:49:10] error: z2m: Error while starting zigbee-herdsman [2024-07-23 21:49:10] error: z2m: Failed to start zigbee [2024-07-23 21:49:10] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions [2024-07-23 21:49:10] error: z2m: Exiting... [2024-07-23 21:49:11] error: z2m: Error: Failed to start EZSP layer with status=HOST_FATAL_ERROR. at EmberAdapter.initEzsp (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:889:19) at EmberAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:2722:24) at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:129:29) at Zigbee.start (/app/lib/zigbee.ts:63:27) at Controller.start (/app/lib/controller.ts:139:27) at start (/app/index.js:154:5)

A soft reboot of the HAOS VM, without removing power to the dongle, fixed the issue.

Ricc68 avatar Jul 23 '24 20:07 Ricc68

Would need debug logs to confirm, but from what I'm seeing, I'd say the adapter is disconnecting abruptly (or at least that's what Z2M seems to perceive). Can you check dmesg to see if anything weird is happening with the USBs?

Nerivec avatar Jul 23 '24 21:07 Nerivec

@swissjon That issue should already be fixed in dev/edge (caused by a quirk in Availability unearthed after fixing the startup flow in last release).

Thanks. Now that my setup is back to stable, I will probably just wait for the next stable release.

swissjon avatar Jul 24 '24 05:07 swissjon

@Nerivec I have checked the host dmesg and there is no sign of usb problems at the time of the crash. (Two days ago) I have only found two usb resets compatible with the first restart attempt and the second successful VM restart of yesterday.

Would need debug logs to confirm, but from what I'm seeing, I'd say the adapter is disconnecting abruptly (or at least that's what Z2M seems to perceive). Can you check dmesg to see if anything weird is happening with the USBs?

@Nerivec I have checked the host dmesg and there is no sign of usb problems at the time of the crash. (Two days ago) I have only found two usb resets compatible with the first restart attempt and the second successful VM restart of yesterday.

Ricc68 avatar Jul 24 '24 05:07 Ricc68

You may want to check other available logs too, like the HA ones. In case something interesting pops up around the time of the crash.

You switched back from edge, so, no chance the edge instance is trying to start again (watchdog) or anything along these lines, that could interfere with the regular one? It's a bit far-fetched, and shouldn't, but just in case...

Make sure to set the Z2M log level to debug in case it happens again, so we can get more details about the error (the stuff that happens right before what you posted above).

You could also try the edge add-on again (to update it, just uninstall it, go back to the store, check for updates -top right corner dropdown-, then install it again). See if it happens there too.

Nerivec avatar Jul 24 '24 10:07 Nerivec

@Nerivec it happened again. I have checked host and VM dmesg and HA logs but nothing happened to the usb stick, the logs are not showing anything related. No, the edge z2m was uninstalled before installing the release version. I don't enable the watchdog. I will re enable the debug log of z2m and see if it happens but in my previous try with debug log enabled it did not happen for weeks and with debug log disabled it happens in less than 24 hours so my guess is that maybe is a timing issue that is mitigated by the debug log.

Ricc68 avatar Jul 25 '24 20:07 Ricc68

@Nerivec I've got the crash with debug enabled finally.

[2024-07-28 23:53:00] debug: zh:controller:database: Writing database to '/config/zigbee2mqtt/database.db' [2024-07-28 23:53:00] debug: zh:ember:ezsp: ===> [FRAME: ID=101:"READ_AND_CLEAR_COUNTERS" Seq=76 Len=5] [2024-07-28 23:53:00] debug: zh:ember:uart:ash: ---> [FRAME type=DATA frmTx=4 frmRx=1] [2024-07-28 23:53:14] debug: zh:ember:ezsp: =x=> [FRAME: ID=101:"READ_AND_CLEAR_COUNTERS" Seq=76 Len=5] Error: Error: timed out after 14400ms [2024-07-28 23:53:14] error: zh:ember:ezsp: ERROR Transaction failure; status=undefined. Last: [FRAME: ID=101:"READ_AND_CLEAR_COUNTERS" Seq=76 Len=5]. [2024-07-28 23:53:14] error: zh:ember: !!! NCP FATAL ERROR reason=undefined. ATTEMPTING RESET... !!! [2024-07-28 23:53:14] info: zh:ember:queue: Request dispatching stopped; queue=1 priorityQueue=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: ASH COUNTERS since last clear: [2024-07-28 23:53:14] info: zh:ember:uart:ash: Total frames: RX=38, TX=46 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: DATA frames : RX=38, TX=8 [2024-07-28 23:53:14] info: zh:ember:uart:ash: DATA bytes : RX=1047, TX=187 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: ACK frames : RX=0, TX=38 [2024-07-28 23:53:14] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: CRC errors : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Comm errors : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Length < minimum: RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Length > maximum: RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Bad controls : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Bad lengths : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Bad ACK numbers : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Out of buffers : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Retry dupes : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Out of sequence : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: ACK timeouts : RX=0 Error: at Ezsp.ezspReadAndClearCounters (/app/node_modules/zigbee-herdsman/src/adapter/ember/ezsp/ezsp.ts:2012:19) at Object.func (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:844:42) at EmberRequestQueue.dispatch (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/requestQueue.ts:117:46)

Ricc68 avatar Jul 29 '24 20:07 Ricc68

@Nerivec trying to restart, CPU goes to 115% shown on the add-on dashboard, and gives another crash:

[2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1200/2500 [2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1300/2500 [2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1400/2500 [2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1500/2500 [2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1600/2500 [2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1700/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 1800/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 1900/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 2000/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 2100/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 2200/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 2300/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 2400/2500 [2024-07-29 22:22:49] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-29 22:22:49] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-29 22:22:49] debug: zh:ember:uart:ash: ---> [FRAME type=RST] [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 0/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 100/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 200/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 300/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 400/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 500/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 600/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 700/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 800/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 900/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 1000/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 1100/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1200/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1300/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1400/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1500/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1600/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1700/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1800/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1900/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 2000/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 2100/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 2200/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 2300/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 2400/2500 [2024-07-29 22:22:52] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-29 22:22:52] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-29 22:22:52] debug: zh:ember:uart:ash: ---> [FRAME type=RST] [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 0/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 100/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 200/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 300/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 400/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 500/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 600/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 700/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 800/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 900/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 1000/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 1100/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 1200/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 1300/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 1400/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 1500/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 1600/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 1700/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 1800/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 1900/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 2000/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 2100/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 2200/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 2300/2500 [2024-07-29 22:22:54] error: zh:ember:uart:ash: ASH disconnected: ASH_ERROR_RESET_FAIL | NCP status: NO_ERROR [2024-07-29 22:22:54] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-29 22:22:54] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-29 22:22:54] debug: zh:ember:uart:ash: ---> [FRAME type=RST] [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 0/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 100/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 200/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 300/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 400/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 500/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 600/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 700/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 800/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 900/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 1000/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1100/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1200/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1300/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1400/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1500/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1600/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1700/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1800/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1900/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 2000/2500 [2024-07-29 22:22:57] debug: zh:ember:uart:ash: Waiting for RSTACK... 2100/2500 [2024-07-29 22:22:57] debug: zh:ember:uart:ash: Waiting for RSTACK... 2200/2500 [2024-07-29 22:22:57] debug: zh:ember:uart:ash: Waiting for RSTACK... 2300/2500 [2024-07-29 22:22:57] debug: zh:ember:uart:ash: Waiting for RSTACK... 2400/2500 [2024-07-29 22:22:57] error: z2m: Error while starting zigbee-herdsman [2024-07-29 22:22:57] error: z2m: Failed to start zigbee [2024-07-29 22:22:57] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions [2024-07-29 22:22:57] error: z2m: Exiting...

Ricc68 avatar Jul 29 '24 20:07 Ricc68

Do you have the rest of the logs (from this post)?

Nerivec avatar Jul 29 '24 22:07 Nerivec

I have problems since 1.39.0: my guess: I can send commands to devices (turn light on/off) but do not receive messages back (no online/offline, no interviews, etc.). Did reboots and reinstallations but no success. I just reverted back to 1.38.0 and see what happens. First signs are ok: communication is restored.

Bolukan avatar Jul 30 '24 21:07 Bolukan

@Nerivec I am attaching the full log.

log.txt.gz

Ricc68 avatar Jul 31 '24 20:07 Ricc68

Something's not quite right here.

[2024-07-28 22:59:56]
[2024-07-28 23:53:00]

Zero logging in-between?

Nerivec avatar Jul 31 '24 20:07 Nerivec

@Nerivec that was grepped -i -v mqtt to remove useless mqtt info. Now this new attach is complete, not grepped, and yes, the only activity in between is the backup of the state, it seems the z2m communication with the zbdongle-e is stalled. There's definitely something not quite right there. Consider that a soft reboot of the VM restores the communication, no power down of the dongle, so the dongle did not lose the ability to communicate.

log.log.gz

Ricc68 avatar Jul 31 '24 21:07 Ricc68

Rebooting Z2M reboots the dongle (they sync-up during startup). But it seems everything just cuts off, not just the communication to the adapter, Z2M sending, and receiving alike, since there is no logging. Only a couple of time-based triggers remain.

Did you check all available logs (host/vm) around 2024-07-28 23:00:00 for some event that could possibly precipitate this behavior?

Nerivec avatar Aug 01 '24 15:08 Nerivec

Rebooting Z2M reboots the dongle (they sync-up during startup). But it seems everything just cuts off, not just the communication to the adapter, Z2M sending, and receiving alike, since there is no logging. Only a couple of time-based triggers remain.

Did you check all available logs (host/vm) around 2024-07-28 23:00:00 for some event that could possibly precipitate this behavior?

Checked the logs but found nothing that can possibly be related. The fact that it is happening much less frequently in debug log than in info log may be a hint?

Ricc68 avatar Aug 01 '24 16:08 Ricc68

@Nerivec happened also with 1.39.1. log.zip Log attached.

Ricc68 avatar Aug 06 '24 14:08 Ricc68

@Ricc68

  • when the crash happens, do you see anything happening in sudo dmesg?
  • Could you check if the crash also happens when running on different hardware? (maybe you have some Pi lying around?)

Koenkk avatar Aug 06 '24 19:08 Koenkk

@Ricc68

* when the crash happens, do you see anything happening in `sudo dmesg`?

* Could you check if the crash also happens when running on different hardware? (maybe you have some Pi lying around?)

@Koenkk I have checked the dmesg and there's nothing at the time of the crash.

Unfortunately I don't have other hardware on which run haos.

Ricc68 avatar Aug 07 '24 16:08 Ricc68

@Ricc68 there is no need to run haos, just z2m on bare metal would be enough already (docs)

Koenkk avatar Aug 08 '24 18:08 Koenkk

@Ricc68

* when the crash happens, do you see anything happening in `sudo dmesg`?

* Could you check if the crash also happens when running on different hardware? (maybe you have some Pi lying around?)

@Koenkk I have checked the dmesg and there's nothing at the time of the crash.

Unfortunately I don't have other hardware on which run haos.

@Ricc68 there is no need to run haos, just z2m on bare metal would be enough already (docs)

@Koenkk the problem is that I don't have any available hardware to run it. Just some spare esp8266 which is useless for the task.

Ricc68 avatar Aug 08 '24 19:08 Ricc68

You can also run it on a laptop (even on Windows)

Koenkk avatar Aug 09 '24 14:08 Koenkk

@Koenkk I have to report that I still see this issue on 1.40.1 with crashes every 24h or so.

Ricc68 avatar Sep 20 '24 03:09 Ricc68

@Koenkk I will try to run z2m on wsl-linux to have more control in case we want to add debug code and see if I can reproduce it. Hurra! I have it running on Debian-wsl connected to the same devices and same mqtt server so it is exactly the same configuration as the NAS (copied over all config files). I will leave it running for a few hours to see if the issue happens with this setup as well and, if not, I will also be able to edit the code to insert debugs in case it is needed. BTW, this is a 2nd generation core i7 with 16 GB of ram and ssd so it's not the most modern laptop but it has enough juice to run z2m without hitting significantly the system performances.

Ricc68 avatar Sep 21 '24 08:09 Ricc68

@Nerivec I have run the test suggested by @Koenkk and after 24 hours on a debian-wsl z2m was still happily running. On my low resources NAS it takes only a few hours to crash. I have moved the dongle back to the NAS and in the while I have updated the dongle firmware to 7.4.4. If it happens also with the new fw I wonder if there's some debug code that we can run on my z2m HAOS installation to try to nail down the problem.

Ricc68 avatar Sep 22 '24 15:09 Ricc68