
Predbat self heal - didn't heal from Websocket error

Open thewookiewon opened this issue 11 months ago • 15 comments

Describe the bug Looks like at 18:09 GMT the WebSocket was closed and Predbat never recovered until I noticed and restarted Predbat at 22:55 GMT. The Raspberry Pi 5 was fully accessible during that time period, as was the Home Assistant app connection.

Error: Web Socket exception in startup: Cannot connect to host

Expected behaviour I believe the predbat self heal mechanism should handle this?

Predbat version

Addon 1.2.4

Environment details

  • Inverter and battery setup GivEnergy hybrid 3.6 with 9.5 & 5.2 batteries
  • Standard HAOS installer or Docker Standard HAOS on pi5
  • Anything else?

Screenshots If applicable, add screenshots to help explain your problem. The most useful ones can be your battery chart, the Predbat HTML plan and your current settings in HA.

Log file Attached

logs-2.txt

Predbat debug yaml file Not obtained but could use the files from my other ticket raised yesterday.

https://github.com/springfall2008/batpred/issues/2044

thewookiewon avatar Feb 25 '25 23:02 thewookiewon

A few people have reported this; it's due to the access token expiring. The fix is to create your own HA access token for Predbat to use. I've made this clearer in this documentation update https://github.com/gcoan/batpred/blob/main/docs/apps-yaml.md#home-assistant-connection

gcoan avatar Feb 26 '25 00:02 gcoan

@gcoan I have an access token as per the documentation 😀. In the above log I've just REDACTED the token

Image

thewookiewon avatar Feb 26 '25 09:02 thewookiewon

`[09:07:22] INFO: Predbat init script running
Running Predbat inside Add-on
Your API key is: e1550e778REDACTED
Bootstrap Predbat
Startup
Add-on: Predbat
Home Battery Prediction and Control
Add-on version: 1.2.4
You are running the latest version of this add-on.`

I do get this after the above line though

2025-02-26 09:07:23.309651: Warn: Failed to decode response <Response [404]> from http://192.168.251.77:8123/addons/self/info

thewookiewon avatar Feb 26 '25 09:02 thewookiewon

Have been getting the same

sparks1372 avatar Mar 07 '25 18:03 sparks1372

2025-02-26 09:07:23.309651: Warn: Failed to decode response <Response [404]> from http://192.168.251.77:8123/addons/self/info

I am just doing final testing of a fix for this warning and will be pushing it through soon.

It's just a warning that the slug id can't be found, which only occurs if you have set ha_url. The only impact is on some of the log messages, so it's not a major problem, but I'm fixing it.

As for why you get the websocket error when you have a ha_key set, I'm sorry, no idea.

gcoan avatar Mar 07 '25 22:03 gcoan

Is there nothing else that predbat can do to restart itself or am I looking into HA automation to try to fix Predbat?

thewookiewon avatar Mar 16 '25 17:03 thewookiewon

I did just push (yesterday) a new release with an auto-restart for some other conditions, maybe worth a try

springfall2008 avatar Mar 16 '25 18:03 springfall2008

@springfall2008 Many thanks, I'll have a look. Is there anything else I can do regarding the HA_Key and why I get these web socket errors when the HA_Key is present?

thewookiewon avatar Mar 16 '25 18:03 thewookiewon

Battery didn't charge last night

2025-03-16 18:02:41.310848: Warn: Web Socket closed, will try to reconnect in 5 seconds - error count 3
2025-03-16 18:02:46.312296: Info: Start socket for url HTTP:// REDACTED:8123/api/websocket
2025-03-16 18:02:46.314145: Error: Web Socket exception in startup: Cannot connect to host REDACTED:8123 ssl:default [Connect call failed (REDACTED', 8123)]

thewookiewon avatar Mar 17 '25 09:03 thewookiewon

I suspect this is my root cause problem

https://github.com/home-assistant/core/issues/135428

thewookiewon avatar Mar 17 '25 10:03 thewookiewon

OK might have noticed something when I trigger HA to restart from Settings > System > power button > Restart Home Assistant (yellow icon).

Predbat never seems to recover after this method of HA restart; it gives me the Web Socket errors and the addon needs to be restarted.

2025-03-17 15:05:02.082611: Warn: Historical day 7 has 55 minutes of gap in the data, filled from 24.98 kWh to make new average 25.98 kWh (percent 96%)
2025-03-17 15:05:04.068475: Info: record_status Demand
2025-03-17 15:09:18.572912: Warn: Web Socket closed, will try to reconnect in 5 seconds - error count 0
2025-03-17 15:09:23.574704: Info: Start socket for url http://192.168.251.77:8123/api/websocket
2025-03-17 15:09:23.585074: Error: Web Socket exception in startup: Cannot connect to host 192.168.251.77:8123 ssl:default [Connect call failed ('192.168.251.77', 8123)]

thewookiewon avatar Mar 17 '25 15:03 thewookiewon

OK might have noticed something when I trigger HA to restart from Settings > System > power button > Restart Home Assistant (yellow icon).

Predbat never seems to recover after this method of HA restart; it gives me the Web Socket errors and the addon needs to be restarted.

I'm not surprised this can cause problems because you're restarting HA so the socket connection to HA gets broken. If you restart via advanced options / restart HA and all add-ons then it will be OK.

Whether Predbat can trap these errors is a different question.
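As a rough illustration of what trapping these errors could look like, here is a minimal Python sketch of a connect loop that keeps retrying with a growing delay instead of giving up. It is not Predbat's actual code: the function name `connect_with_retry`, its parameters and the delay values are all assumptions made for the example, and `connect` stands in for whatever call opens the socket.

```python
import time
from typing import Callable, Optional


def connect_with_retry(
    connect: Callable[[], object],
    base_delay: float = 5.0,
    max_delay: float = 60.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call connect() until it succeeds, waiting longer after each failure.

    Hypothetical helper for illustration only; the names and defaults
    are assumptions, not part of Predbat or Home Assistant.
    """
    attempt = 0
    delay = base_delay
    while True:
        try:
            return connect()
        except OSError as exc:  # ConnectionRefusedError is a subclass of OSError
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                raise  # give up so a supervisor can restart the process
            print(f"Connect failed ({exc}), retrying in {delay:.0f} seconds - error count {attempt}")
            sleep(delay)
            # Double the wait each time, capped at max_delay
            delay = min(delay * 2, max_delay)
```

With `max_attempts=None` the loop would keep trying for as long as HA is restarting; setting a limit lets the caller fall back to a full restart instead.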

gcoan avatar Mar 17 '25 21:03 gcoan

I'll leave that with the Predbat experts to see if it's doable 😋

For now I've created a HA automation using these triggers

`triggers:
  - trigger: state
    entity_id:
      - switch.predbat_active
    for:
      hours: 0
      minutes: 20
      seconds: 0
    to: null
  - trigger: state
    entity_id:
      - predbat.status
    to: "Error: Exception raised Auto-restart triggered"
    for:
      hours: 0
      minutes: 20
      seconds: 0
  - trigger: state
    entity_id:
      - predbat.status
    to: >-
      "Inverter 0 read bad REST data from HTTP:// REDACTED HA IP:6345/runAll - REST will be disabled"
    for:
      hours: 0
      minutes: 20
      seconds: 0`

thewookiewon avatar Mar 19 '25 20:03 thewookiewon

There is a predbat error monitor I wrote in the documentation that you could use https://springfall2008.github.io/batpred/output-data/#predbat-error-monitor, it checks for a number of other indicators of predbat being stuck

switch.predbat_active being stuck on for a long period of time makes sense, I'll add that

the error: exception, I have that, but as a broader check for the text 'error'

bad REST data, I don't check for that, possibly could, but I hardly ever see that one

anyway, have a look at the automation

gcoan avatar Mar 19 '25 20:03 gcoan

@gcoan now that's sweet, I hadn't spotted that part of the page before. Now adopted into my automation

thewookiewon avatar Mar 19 '25 21:03 thewookiewon