Lost connection to the RPC socket in Darwin.
Issue and Steps to Reproduce
After around 5 minutes working properly, the lightningd gets shut down with the following error:
Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.libc++abi: terminating with uncaught exception of type std::runtime_error: Rpc: read: unexpected end-of-file in RPC socket.
This only happens under Darwin, using the nix package manager, which includes v25.05.2. I also tried under v25.05. I am unable to reproduce in Linux. I suspect the socket buffer gets full, but OSX doesn't allow to increase it, which is kern.ipc.maxsockbuf=8388608.I also tried kern.ipc.somaxconn=2048 (it was 128 by default) but it doesn't help.
I happen to have the plugin clboss, if that means something.
Note that my node is a heavy duty one, it has 200+ channels. I suspect this problem is not reproducible with a small amount of channels, as it gets a while to happen, when tons of connections are already stablished.
No it looks like that it is a rust bug, what rust code are you running on your machine?
No it looks like that it is a rust bug, what rust code are you running on your machine?
I don't know, as I said, I use nix. You may find out useful information in the package declaration here: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/applications/blockchains/clightning/default.nix#L60
The package itself is in here: https://search.nixos.org/packages?channel=unstable&show=clightning&from=0&size=50&sort=relevance&type=packages&query=clightning
I had 0 knowledge of nix I do not know how to read it, but the problem is not cln but the rust plugin that you are running you need to know. It probably is grpc plugin, but I can not confirm it without knowing what plugin (or version of the plugin) are you running.
I suggest redirecting the issue to the nix maintainer, and if any bugs on cln code are found, I will be happy to patch is before the next release.
I had 0 knowledge of nix I do not know how to read it, but the problem is not cln but the rust plugin that you are running you need to know. It probably is grpc plugin, but I can not confirm it without knowing what plugin (or version of the plugin) are you running.
I suggest redirecting the issue to the nix maintainer, and if any bugs on cln code are found, I will be happy to patch is before the next release.
I'd like to help, but please tell me what to do, how do I know what is the culprit plugin and what version of rust it is running?
I think we need just to understand what plugins are you running
I'm only running clboss, nothing more. The other plugins I may be using are the ones enabled by default. That's it.
Anyway, new error now:
lightningd: gossipd failed (exit status 2), exiting.
Lost connection to the RPC socket.libc++abi: terminating with uncaught exception of type std::runtime_error: Rpc: read: unexpected end-of-file in RPC socket.
ah wait ok it is a clnboss issue :) sorry I was mistaken with rust but this is a C++ exception td::runtime_error
Ok this need to be reported to clboss, it is a bug in there
Do you have any more logs that can give more hints to safe close this issue? because lightningd: gossipd failed (exit status 2), exiting. can be a cln crash but without logs I can not help
Let me try tomorrow, too many restarts for my node today. But I'm not sure it could be clboss, the error is unexpected end-of-file in RPC socket which suggest cln actually closed the socked or stopped sending more info. Also, gossipd failed, which may indicate that the problem is in the cln core side.
Wait for tomorrow, please, I'll put the node in DEBUG mode with clboss disabled and send you the edited logs.
yeah put some more logs to help me understand more, otherwise, I am blinded
Here are the new logs with all sensitive information under .... This time I disabled clboss.
2023-07-11T11:40:26.475Z INFO connectd: Static Tor service onion address: "...onion:9735,0.0.0.0:9735" bound from extern port 9735
2023-07-11T11:40:26.884Z INFO plugin-bcli: bitcoin-cli initialized and connected to bitcoind.
Jul 11 12:40:29.000 [notice] Have tried resolving or connecting to address '[scrubbed]' at 3 different places. Giving up.
2023-07-11T11:41:21.112Z INFO lightningd: Restarting onchaind for channel 14738
2023-07-11T11:41:21.113Z UNUSUAL ...-chan#14738: Peer permanent failure in ONCHAIN: Funding transaction spent
2023-07-11T11:41:21.115Z INFO ...-chan#14738: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2023-07-11T11:41:21.133Z INFO lightningd: Restarting onchaind for channel 15010
2023-07-11T11:41:21.134Z UNUSUAL ...-chan#15010: Peer permanent failure in ONCHAIN: Funding transaction spent
2023-07-11T11:41:21.134Z INFO ...-chan#15010: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2023-07-11T11:41:21.138Z INFO lightningd: Restarting onchaind for channel 15036
2023-07-11T11:41:21.138Z UNUSUAL ...-chan#15036: Peer permanent failure in ONCHAIN: Funding transaction spent
2023-07-11T11:41:21.139Z INFO ...-chan#15036: State changed from ONCHAIN to FUNDING_SPEND_SEEN
2023-07-11T11:41:21.153Z INFO lightningd: --------------------------------------------------
2023-07-11T11:41:21.153Z INFO lightningd: Server started with public key ..., alias ... (color #...) and lightningd v23.05
2023-07-11T11:41:21.155Z INFO ...-chan#14738: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2023-07-11T11:41:21.906Z INFO ...-chan#15036: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2023-07-11T11:41:21.909Z INFO ...-chan#15010: State changed from FUNDING_SPEND_SEEN to ONCHAIN
2023-07-11T11:41:25.391Z INFO ...-chan#14738: RBF onchain txid ... (fee 122sat) with txid ... (fee 122sat)
2023-07-11T11:41:26.119Z INFO ...-chan#14966: Attempt to send HTLC but unowned (CHANNELD_NORMAL)
2023-07-11T11:41:31.317Z INFO ...-chan#14969: Attempt to send HTLC but unowned (CHANNELD_NORMAL)
lightningd: connectd failed (exit status 2), exiting.
Jul 11 12:41:36.000 [warn] Unknown service identity key [scrubbed] on the rendezvous circuit ... with cookie .... Can't find onion service.
Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.connect_to localhost port 9735: failed.
And here is my config file, with sensitive information removed:
wallet=sqlite3:///Users/user/opt/nodo/lightning/lightning/bitcoin/lightningd.sqlite3:/Users/user/opt/nodo/lightning/dbbackup/lightningd.sqlite3
alias=...
rgb=...
bind-addr=0.0.0.0:9735
announce-addr=...:9735
rpc-file=/Users/user/opt/nodo/lightning/lightning/rpc-lightning
network=bitcoin
bitcoin-rpcconnect=127.0.0.1
bitcoin-rpcuser=...
bitcoin-rpcpassword=...
bitcoin-datadir=/Users/user/opt/nodo/bitcoin/
log-level=info
addr=statictor:localhost:9051
announce-addr=....onion:9735
proxy=localhost:9050
always-use-proxy=false
experimental-dual-fund
funder-policy=match
funder-policy-mod=100
lease-fee-base-msat=500sat
lease-fee-basis=50
channel-fee-max-base-msat=100sat
channel-fee-max-proportional-thousandths=2
experimental-offers
I also tried in DEBUG mode but no relevant information is printed to this matter and the log is very big (probing too many channels) to be posted here.
I guess post the debug information can be more helpful to me, you can skip the channels stuff
I guess post the debug information can be more helpful to me, you can skip the channels stuff
It's just too much text, even for regexp substitution of sensitive data. Probably it is better if I can send you the file directly in private. My telegram is "javieroracle", there I can send you the file unedited. Thanks.
you can send me an email, you find it in my account or in all my git commit
I encountered a similar error recently including Lost connection to the RPC socket., but it seems the root cause is an error further up in the logs where the connection to bitcoind was lost due to EOF reached and CLN was not able to recover.
This error presented in CLN running on StartOS. The same sequence persists across restarts of CLN. bitcoind shows no signs of unexpected behavior.
Any ideas/input on the cause and potential resolution would be much appreciated.
2023-07-11T10:34:13-06:00 bitcoin-cli -rpcconnect=bitcoind.embassy -rpcport=8332 -rpcuser=... -stdinrpcpass getblockhash 798283 exited 1 (after 57 other errors) 'error: timeout on transient error: Could not connect to the server bitcoind.embassy:8332 (error code 1 - "EOF reached")
2023-07-11T10:34:13-06:00
2023-07-11T10:34:13-06:00 Make sure the bitcoind server is running and that you are connecting to the correct RPC port.
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.160Z **BROKEN** plugin-bcli: bitcoin-cli -rpcconnect=bitcoind.embassy -rpcport=8332 -rpcuser=... -stdinrpcpass getblockhash 798283 exited 1 (after 57 other errors) 'error: timeout on transient error: Could not connect to the server bitcoind.embassy:8332 (error code 1 - \"EOF reached\")\n\nMake sure the bitcoind server is running and that you are connecting to the correct RPC port.\n'; we have been retrying command for --bitcoin-retry-timeout=60 seconds; bitcoind setup or our --bitcoin-* configs broken?
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.173Z INFO plugin-bcli: Killing plugin: exited during normal operation
2023-07-11T10:34:13-06:00 '; we have been retrying command for --bitcoin-retry-timeout=60 seconds; bitcoind setup or our --bitcoin-* configs broken?The Bitcoin backend died.
2023-07-11T10:34:13-06:00 lightningd: FATAL SIGNAL 6 (version v23.02.2)
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.356Z **BROKEN** lightningd: The Bitcoin backend died.
2023-07-11T10:34:13-06:00 0x55641db72b send_backtrace
2023-07-11T10:34:13-06:00 common/daemon.c:33
2023-07-11T10:34:13-06:00 0x55641db7df crashdump
2023-07-11T10:34:13-06:00 common/daemon.c:46
2023-07-11T10:34:13-06:00 0x7f8a0807a7 ???
2023-07-11T10:34:13-06:00 ???:0
2023-07-11T10:34:13-06:00 0x7f89dafeac ???
2023-07-11T10:34:13-06:00 ???:0
2023-07-11T10:34:13-06:00 0x7f89d9ca9f ???
2023-07-11T10:34:13-06:00 ???:0
2023-07-11T10:34:13-06:00 0x5564178873 fatal_vfmt
2023-07-11T10:34:13-06:00 lightningd/log.c:849
2023-07-11T10:34:13-06:00 0x556417891f fatal
2023-07-11T10:34:13-06:00 lightningd/log.c:859
2023-07-11T10:34:13-06:00 0x5564145adf bitcoin_destructor
2023-07-11T10:34:13-06:00 lightningd/bitcoind.c:34
2023-07-11T10:34:13-06:00 0x5564367c9b notify
2023-07-11T10:34:13-06:00 ccan/ccan/tal/tal.c:240
2023-07-11T10:34:13-06:00 0x556436823f del_tree
2023-07-11T10:34:13-06:00 ccan/ccan/tal/tal.c:402
2023-07-11T10:34:13-06:00 0x5564368613 tal_free
2023-07-11T10:34:13-06:00 ccan/ccan/tal/tal.c:486
2023-07-11T10:34:13-06:00 0x55641a446b plugin_kill
2023-07-11T10:34:13-06:00 lightningd/plugin.c:384
2023-07-11T10:34:13-06:00 0x55641a5387 plugin_conn_finish
2023-07-11T10:34:13-06:00 lightningd/plugin.c:798
2023-07-11T10:34:13-06:00 0x556435a183 destroy_conn
2023-07-11T10:34:13-06:00 ccan/ccan/io/poll.c:246
2023-07-11T10:34:13-06:00 0x556435a1a7 destroy_conn_close_fd
2023-07-11T10:34:13-06:00 ccan/ccan/io/poll.c:252
2023-07-11T10:34:13-06:00 0x5564367c9b notify
2023-07-11T10:34:13-06:00 ccan/ccan/tal/tal.c:240
2023-07-11T10:34:13-06:00 0x556436823f del_tree
2023-07-11T10:34:13-06:00 ccan/ccan/tal/tal.c:402
2023-07-11T10:34:13-06:00 0x5564368613 tal_free
2023-07-11T10:34:13-06:00 ccan/ccan/tal/tal.c:486
2023-07-11T10:34:13-06:00 0x5564358313 io_close
2023-07-11T10:34:13-06:00 ccan/ccan/io/io.c:450
2023-07-11T10:34:13-06:00 0x556435aaaf io_loop
2023-07-11T10:34:13-06:00 ccan/ccan/io/poll.c:457
2023-07-11T10:34:13-06:00 0x556416ec6f io_loop_with_timers
2023-07-11T10:34:13-06:00 lightningd/io_loop_with_timers.c:22
2023-07-11T10:34:13-06:00 0x556417550f main
2023-07-11T10:34:13-06:00 lightningd/lightningd.c:1226
2023-07-11T10:34:13-06:00 0x7f89d9ce17 ???
2023-07-11T10:34:13-06:00 ???:0
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.912Z **BROKEN** lightningd: FATAL SIGNAL 6 (version v23.02.2)
2023-07-11T10:34:13-06:00 0x5564145707 ???
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.912Z **BROKEN** lightningd: backtrace: common/daemon.c:38 (send_backtrace) 0x55641db78b
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.912Z **BROKEN** lightningd: backtrace: common/daemon.c:46 (crashdump) 0x55641db7df
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.912Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7f8a0807a7
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.912Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7f89dafeac
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.912Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7f89d9ca9f
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.912Z **BROKEN** lightningd: backtrace: lightningd/log.c:849 (fatal_vfmt) 0x5564178873
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.912Z **BROKEN** lightningd: backtrace: lightningd/log.c:859 (fatal) 0x556417891f
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.912Z **BROKEN** lightningd: backtrace: lightningd/bitcoind.c:34 (bitcoin_destructor) 0x5564145adf
2023-07-11T10:34:13-06:00 ???:0
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.912Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:240 (notify) 0x5564367c9b
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.918Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:402 (del_tree) 0x556436823f
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.918Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:486 (tal_free) 0x5564368613
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.918Z **BROKEN** lightningd: backtrace: lightningd/plugin.c:384 (plugin_kill) 0x55641a446b
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.918Z **BROKEN** lightningd: backtrace: lightningd/plugin.c:798 (plugin_conn_finish) 0x55641a5387
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.918Z **BROKEN** lightningd: backtrace: ccan/ccan/io/poll.c:246 (destroy_conn) 0x556435a183
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.918Z **BROKEN** lightningd: backtrace: ccan/ccan/io/poll.c:252 (destroy_conn_close_fd) 0x556435a1a7
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.918Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:240 (notify) 0x5564367c9b
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.918Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:402 (del_tree) 0x556436823f
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.918Z **BROKEN** lightningd: backtrace: ccan/ccan/tal/tal.c:486 (tal_free) 0x5564368613
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.918Z **BROKEN** lightningd: backtrace: ccan/ccan/io/io.c:450 (io_close) 0x5564358313
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.919Z **BROKEN** lightningd: backtrace: ccan/ccan/io/poll.c:457 (io_loop) 0x556435aaaf
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.919Z **BROKEN** lightningd: backtrace: lightningd/io_loop_with_timers.c:22 (io_loop_with_timers) 0x556416ec6f
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.919Z **BROKEN** lightningd: backtrace: lightningd/lightningd.c:1226 (main) 0x556417550f
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.919Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x7f89d9ce17
2023-07-11T10:34:13-06:00 2023-07-11T16:34:13.919Z **BROKEN** lightningd: backtrace: (null):0 ((null)) 0x5564145707
2023-07-11T10:34:18-06:00 Log dumped in crash.log.20230711163413
2023-07-11T10:34:21-06:00 Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.[ERROR - 2023-07-11T16:34:21.461Z]: WebSocket is closed at 2023-07-11T16:34:21.461Z
2023-07-11T10:34:22-06:00 [ERROR - 2023-07-11T16:34:22.536Z]: WebSocket error: {}
2023-07-11T10:34:22-06:00 [ERROR - 2023-07-11T16:34:22.540Z]: WebSocket is closed at 2023-07-11T16:34:22.540Z
2023-07-11T10:34:22-06:00 Aborted (core dumped)
I do not think we can recover from this error, maybe with some plugin like btcli4j you can but in your case bitcoind crash for some strange reason
I encountered a similar error recently including
Lost connection to the RPC socket., but it seems the root cause is an error further up in the logs where the connection to bitcoind was lost due toEOF reachedand CLN was not able to recover.
The Lost connection to the RPC socket. error always happens when the daemon lightningd ends unexpectly. Then, a cascade of plugins cannot connect to it and throw the repeating error in the log. Look for example at my case:
Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.Lost connection to the RPC socket.libc++abi: terminating with uncaught exception of type std::runtime_error: Rpc: read: unexpected end-of-file in RPC socket.
In my case it is 9 times, which I believe is the number of plugins running in the background. So each one of them emits the same error. This can only happens because the CLN daemon has died.
I can confirm this happens outside of nix, just using the brew package (version 23.05.2) the same thing happens. Just in case, I'm using OSX 13.4.1 (22F82), with an M1 processor (it's a Mac Mini).
I have more information about this: after closing half of the channels, now I have less than 100, and now it works ok. This must be an issue about some limits reach which is particular about OSX. Any idea what could be the cause?
Sounds like a file descriptor limit being hit. We recently addressed us running into the FD limit, and this is likely covered by those changes.