crash when channeld does not have short_channel_id

Open dni opened this issue 1 year ago • 2 comments

Issue and Steps to Reproduce

After an update from 23 to 24, CLN crashed with the following dump. We had an issue where a lot of pending channels were stuck in state "CHANNELD_AWAITING_LOCKIN" and the funding_tx did not exist in the mempool. These channels also show no short_channel_id in the output of the listfunds command.

+19.469731335 chan#30479DEBUG: Peer has reconnected, state CHANNELD_AWAITING_LOCKIN: connecting subd
+19.520480306 channeld-chan#30479DEBUG: pid 136421, msgfd 69
+19.542003959 lightningdBROKEN: FATAL SIGNAL 11 (version v24.02.2)
+19.542043857 lightningdBROKEN: backtrace: common/daemon.c:38 (send_backtrace) 0x55e094df18d0
+19.542068257 lightningdBROKEN: backtrace: common/daemon.c:75 (crashdump) 0x55e094df1922
+19.542077146 lightningdBROKEN: backtrace: (null):0 ((null)) 0x7fa2b380708f
+19.542086299 lightningdBROKEN: backtrace: bitcoin/short_channel_id.c:96 (towire_short_channel_id) 0x55e094e0d80c
+19.542099779 lightningdBROKEN: backtrace: channeld/channeld_wiregen.c:290 (towire_channeld_init) 0x55e094e2cd7c
+19.542114686 lightningdBROKEN: backtrace: lightningd/channel_control.c:1684 (peer_start_channeld) 0x55e094d7d235
+19.542125595 lightningdBROKEN: backtrace: lightningd/peer_control.c:1310 (connect_activate_subd) 0x55e094db2759
+19.542135957 lightningdBROKEN: backtrace: lightningd/peer_control.c:1409 (peer_connected_hook_final) 0x55e094db5828
+19.542146136 lightningdBROKEN: backtrace: lightningd/plugin_hook.c:194 (plugin_hook_call_next) 0x55e094dc5c0c
+19.542155689 lightningdBROKEN: backtrace: lightningd/plugin_hook.c:169 (plugin_hook_callback) 0x55e094dc5dd0
+19.542164044 lightningdBROKEN: backtrace: lightningd/plugin.c:661 (plugin_response_handle) 0x55e094dc02d9
+19.542172875 lightningdBROKEN: backtrace: lightningd/plugin.c:773 (plugin_read_json_one) 0x55e094dc3bdd
+19.542181153 lightningdBROKEN: backtrace: lightningd/plugin.c:824 (plugin_read_json) 0x55e094dc3e88
+19.542190009 lightningdBROKEN: backtrace: ccan/ccan/io/io.c:59 (next_plan) 0x55e094eae93c
+19.542197172 lightningdBROKEN: backtrace: ccan/ccan/io/io.c:407 (do_plan) 0x55e094eaee09
+19.542204247 lightningdBROKEN: backtrace: ccan/ccan/io/io.c:417 (io_ready) 0x55e094eaeea6
+19.542213051 lightningdBROKEN: backtrace: ccan/ccan/io/poll.c:453 (io_loop) 0x55e094eb083b
+19.542218601 lightningdBROKEN: backtrace: lightningd/io_loop_with_timers.c:22 (io_loop_with_timers) 0x55e094d9703b
+19.542231137 lightningdBROKEN: backtrace: lightningd/lightningd.c:1425 (main) 0x55e094d9c7d3
+19.542238026 lightningdBROKEN: backtrace: (null):0 ((null)) 0x7fa2b37e8082
+19.542245868 lightningdBROKEN: backtrace: (null):0 ((null)) 0x55e094d7117d
+19.542250727 lightningdBROKEN: backtrace: (null):0 ((null)) 0xffffffffffffffff
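
The backtrace shows the fault inside towire_short_channel_id, reached from towire_channeld_init, which is consistent with serializing a short_channel_id that is not set. A minimal sketch of that failure mode and a guarded alternative follows. The type `scid_t` and the helpers `put_u64`, `put_scid_unchecked` and `put_optional_scid` are hypothetical stand-ins for illustration, not the actual CLN wire code, and the one-byte presence flag is an assumed encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a short channel id; not the CLN type. */
typedef struct { uint64_t u64; } scid_t;

/* Write a 64-bit value big-endian into buf; returns bytes written. */
static size_t put_u64(uint8_t *buf, uint64_t v)
{
	assert(buf != NULL);
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(v >> (56 - 8 * i));
	return 8;
}

/* Unchecked: dereferences scid unconditionally. Passing NULL here is
 * undefined behaviour and typically ends in SIGSEGV (signal 11). */
static size_t put_scid_unchecked(uint8_t *buf, const scid_t *scid)
{
	return put_u64(buf, scid->u64);
}

/* Guarded: writes a presence byte, then the value only if it is set.
 * (Assumed encoding, chosen for this sketch.) */
static size_t put_optional_scid(uint8_t *buf, const scid_t *scid)
{
	if (scid == NULL) {
		buf[0] = 0;
		return 1;
	}
	buf[0] = 1;
	return 1 + put_u64(buf + 1, scid->u64);
}
```

With a 9-byte buffer, `put_optional_scid(buf, NULL)` writes a single zero byte instead of dereferencing the missing id, while a set id is written as the flag followed by 8 bytes.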

Apr 15 '24 14:04 dni

We temporarily used this patch so that the node would not crash immediately, and ran lightning-cli dev-forget-channel to remove the broken channels.

diff --git a/lightningd/peer_control.c b/lightningd/peer_control.c
index ead4f76a2..d39058dba 100644
--- a/lightningd/peer_control.c
+++ b/lightningd/peer_control.c
@@ -1406,7 +1406,10 @@ static void peer_connected_hook_final(struct peer_connected_hook_payload *payloa
                        log_debug(channel->log, "Peer has reconnected, state %s: connecting subd",
                                  channel_state_name(channel));

-                       connect_activate_subd(ld, channel);
+                       if (strcmp(channel_state_name(channel), "CHANNELD_AWAITING_LOCKIN") != 0
+                          && strcmp(channel_state_name(channel), "CHANNELD_SHUTTING_DOWN") != 0) {
+                               connect_activate_subd(ld, channel);
+                       }
                }
        }

@@ -1799,6 +1802,9 @@ void peer_spoke(struct lightningd *ld, const u8 *msg)
                        log_debug(channel->log,
                                  "Reestablish on %s channel: using channeld to reply",
                                  channel_state_name(channel));
+                       if (strcmp(channel_state_name(channel), "AWAITING_UNILATERAL") != 0) {
+                               return;
+                       }
                        if (socketpair(AF_LOCAL, SOCK_STREAM, 0, fds) != 0) {
                                log_broken(channel->log,
                                           "Failed to create socketpair: %s",

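The first hunk of the patch decides whether to start the subdaemon by comparing state names as strings. The same decision can be sketched as a predicate over an enum. `enum chan_state`, its `ST_`-prefixed values and `should_activate_subd` are hypothetical names for this illustration (including the `ST_CHANNELD_NORMAL` placeholder), not the CLN definitions.

```c
#include <stdbool.h>

/* Hypothetical subset of channel states, named after those in the patch.
 * ST_CHANNELD_NORMAL is only a placeholder for "any other state". */
enum chan_state {
	ST_CHANNELD_AWAITING_LOCKIN,
	ST_CHANNELD_NORMAL,
	ST_CHANNELD_SHUTTING_DOWN,
	ST_AWAITING_UNILATERAL
};

/* Mirrors the workaround's first hunk: skip subdaemon activation for
 * the two states the patch excludes, allow it for everything else. */
static bool should_activate_subd(enum chan_state state)
{
	switch (state) {
	case ST_CHANNELD_AWAITING_LOCKIN:
	case ST_CHANNELD_SHUTTING_DOWN:
		return false;
	default:
		return true;
	}
}
```

This only restates the temporary workaround; it hides the affected channels from the crashing code path and does not address the missing short_channel_id itself.
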
Apr 15 '24 15:04 dni

Open CLN bounty on this: https://community.corelightning.org/c/cln-bounties/7222-crash-when-channeld-does-not-have-short_channel_id

Jun 26 '24 06:06 BitcoinJiuJitsu

@dni @madelinevibes I have been working on an initial fix for this crash. I've put up what I have so far and am looking for feedback: https://github.com/ElementsProject/lightning/pull/8435

Thank you!

Aug 02 '25 00:08 kwsantiago

Awesome! I'll defer to @dni to review. We'll soon be moving to Discord to manage bounties; if you're not already in the core-lightning server, feel free to tag yourself.

Aug 03 '25 23:08 madelinevibes

Wow, cool! :) Thanks! I skimmed through the PR and I think it's good, but I cannot review it properly since I am not familiar with the CLN codebase.

Aug 04 '25 07:08 dni

We're a little too close to the 25.09 cutoff date so I've marked this for 25.12, thanks for your patience!

Aug 07 '25 02:08 madelinevibes

@madelinevibes given the reply by @rustyrussell here, should this issue be closed out?

Aug 25 '25 02:08 kwsantiago

Thanks for your work on this, @kwsantiago! Rusty rewrote the fix from another perspective to solve the problem. I'll continue the chat about the bounty in the Discord thread.

Aug 25 '25 04:08 madelinevibes