`apcsmart` lost communication with UPS results in intense syslog flood

Open tomiisp opened this issue 6 years ago • 31 comments

Hi,

This is the second time I have hit this issue: NUT lost communication with the UPS (connected via a USB/serial cable), and the NUT tools and syslog started eating all 4 cores (the CPU quickly reached 78 °C). It produces a huge log file (my poor SD card...), at a rate of about 4500 lines per second! The log entries look like this:

Jun 17 22:00:08 iotgwpc2 apcsmart[1285]: Warning: excessive comm failures, limiting error reporting

Jun 17 22:00:08 iotgwpc2 apcsmart[1285]: Communications with UPS lost: serial port write error: 1379(smartmode): Input/output error

Jun 17 22:00:08 iotgwpc2 apcsmart[1285]: message repeated 9 times: [ Communications with UPS lost: serial port write error: 1379(smartmode): Input/output error]

The USB/serial adapter is only a temporary solution; later the UPS will be connected directly to the onboard UART. Still, this is an insane number of error messages, at an insane rate. Is this a bug, or is there an option to limit these error messages?

Orange PI PC2 - Armbian 4.19.38-sunxi64 #5.86 SMP
Network UPS Tools - UPS driver controller 2.7.4

/Tomi

tomiisp avatar Jun 17 '19 20:06 tomiisp

Hitting the same behaviour now, any progress on this?

fatbasstard avatar Aug 30 '22 06:08 fatbasstard

I am not aware of anyone addressing this specifically, so probably fair to say it is a bug, and probably it is still present. Tested PRs for throttling the message emission (maybe slower backoff to retry connecting?) are welcome.
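
The "slower backoff to retry connecting" idea can be illustrated with a minimal sketch. This is not NUT code (NUT drivers are written in C) and no such patch exists in this thread; `reconnect_with_backoff`, `connect`, `base_delay`, `max_delay` and `max_attempts` are hypothetical names chosen for the example. It only shows the general technique: double the wait after each failed attempt, up to a cap, so a dead serial port is retried every few minutes rather than thousands of times per second.

```python
import time


def reconnect_with_backoff(connect, base_delay=1.0, max_delay=300.0,
                           max_attempts=None, sleep=time.sleep):
    """Call `connect` until it succeeds, doubling the wait after each failure.

    Returns the number of attempts made, or None if `max_attempts` ran out.
    All parameter names are illustrative assumptions, not NUT settings.
    """
    delay = base_delay
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            connect()              # hypothetical: reopen the serial port
            return attempt
        except OSError:            # e.g. "Input/output error" on write
            sleep(delay)           # wait before the next attempt
            delay = min(delay * 2, max_delay)  # exponential growth, capped
    return None
```

With the defaults above, the waits would grow as 1 s, 2 s, 4 s and so on, settling at one attempt every 5 minutes.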

jimklimov avatar Aug 30 '22 19:08 jimklimov

Just had the same happen to me on a Raspberry Pi. Filled my 250GB SSD which subsequently made the Home Assistant database get corrupted. No way of catching it that quickly since it happened while I was sleeping. I'm not happy about this at all.

Any solution or workaround to this? I've just disabled nut for now.

olicooper avatar May 02 '23 11:05 olicooper

Was that also with the apcsmart driver? A likely solution in NUT would be to throttle the error message, or to add a config toggle to that effect, e.g. report a disconnect only once, or once every N minutes.
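
As a rough illustration of the "once every N minutes" throttling, here is a small sketch. It is not taken from NUT (whose drivers are written in C), and `ThrottledReporter`, `emit` and `interval` are hypothetical names, not existing NUT options. It assumes one reporter per message type, emits the first occurrence immediately, counts the ones it drops, and mentions that count on the next emitted line.

```python
import time


class ThrottledReporter:
    """Emit a repeated message at most once per `interval` seconds."""

    def __init__(self, emit, interval=300.0, clock=time.monotonic):
        self.emit = emit          # callable that actually writes the log line
        self.interval = interval  # hypothetical "once every N seconds" setting
        self.clock = clock        # injectable clock, useful for testing
        self.last_emit = None     # time of the last line that was written
        self.suppressed = 0       # messages dropped since then

    def report(self, message):
        """Return True if the message was written, False if suppressed."""
        now = self.clock()
        if self.last_emit is not None and now - self.last_emit < self.interval:
            self.suppressed += 1
            return False
        if self.suppressed:
            message = f"{message} ({self.suppressed} similar messages suppressed)"
        self.emit(message)
        self.last_emit = now
        self.suppressed = 0
        return True
```

Something like `ThrottledReporter(print, interval=600).report("Communications with UPS lost")` would then write at most one line every ten minutes, however often the driver's failure path calls it.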

With HA involved, the practical solution would also depend on getting modern NUT running there instead of the older package (see wiki for contributed article about custom-building a container).

Another approach would be to configure log rotation in your syslog daemon and/or throttling of repeated messages (this would at least help with storage, if not with CPU load).

Finally, try to figure out the nature of disconnects and how to cause a reconnect or driver restart - PRs welcome. This would be an actual fix :)

Jim

jimklimov avatar May 02 '23 17:05 jimklimov