bgpd: Fix read after free in labeled_unicast on shutdown
Fix:
==2937693== Invalid read of size 2 ==2937693== at 0x2EFD44: bgp_delete_listnode (bgp_table.c:170) ==2937693== by 0x2EFB22: bgp_dest_unlock_node (bgp_table.c:81) ==2937693== by 0x26BAC1: check_bgp_lu_cb_unlock (bgp_labelpool.c:208) ==2937693== by 0x26BC58: bgp_lp_finish (bgp_labelpool.c:243) ==2937693== by 0x1FE006: bgp_exit (bgp_main.c:256) ==2937693== by 0x1FDD99: sigint (bgp_main.c:163) ==2937693== by 0x49BA8BB: frr_sigevent_process (sigevent.c:117) ==2937693== by 0x49D5912: event_fetch (event.c:1782) ==2937693== by 0x4955BBF: frr_run (libfrr.c:1216) ==2937693== by 0x1FEA22: main (bgp_main.c:543) ==2937693== Address 0x91af6cc is 60 bytes inside a block of size 96 free'd ==2937693== at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==2937693== by 0x4968C57: qfree (memory.c:130) ==2937693== by 0x2EFC55: bgp_node_destroy (bgp_table.c:116) ==2937693== by 0x49CB7DB: route_node_free (table.c:76) ==2937693== by 0x49CB8A2: route_table_free (table.c:111) ==2937693== by 0x49CB6E5: route_table_finish (table.c:46) ==2937693== by 0x2EFA15: bgp_table_unlock (bgp_table.c:35) ==2937693== by 0x2EFA6F: bgp_table_finish (bgp_table.c:44) ==2937693== by 0x3625FA: bgp_free (bgpd.c:4072) ==2937693== by 0x353858: bgp_unlock (bgpd.h:2525) ==2937693== by 0x362411: bgp_delete (bgpd.c:4034) ==2937693== by 0x1FDF13: bgp_exit (bgp_main.c:205) ==2937693== by 0x1FDD99: sigint (bgp_main.c:163) ==2937693== by 0x49BA8BB: frr_sigevent_process (sigevent.c:117) ==2937693== by 0x49D5912: event_fetch (event.c:1782) ==2937693== by 0x4955BBF: frr_run (libfrr.c:1216) ==2937693== by 0x1FEA22: main (bgp_main.c:543) ==2937693== Block was alloc'd at ==2937693== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==2937693== by 0x4968B05: qcalloc (memory.c:105) ==2937693== by 0x2A5667: bgp_node_get (bgp_table.h:239) ==2937693== by 0x2A9C36: bgp_afi_node_get (bgp_route.c:154) ==2937693== by 0x2B4BA3: bgp_update (bgp_route.c:4225) ==2937693== by 0x3D5255: bgp_nlri_parse_label (bgp_label.c:452) ==2937693== by 0x290CBF: bgp_nlri_parse (bgp_packet.c:345) ==2937693== by 0x29616F: bgp_update_receive (bgp_packet.c:2453) ==2937693== by 0x29ACEE: bgp_process_packet (bgp_packet.c:4022) ==2937693== by 0x49D5F18: event_call (event.c:2011) ==2937693== by 0x4955BA6: frr_run (libfrr.c:1217) ==2937693== by 0x1FEA22: main (bgp_main.c:543)
This is occurring because the shutdown code is hard deleting the route node but labeled unicast has a pointer to it. Free up the labeled unicast data before hard free'ing the route nodes.
@Mergifyio backport dev/10.0
backport dev/10.0
🟠 Waiting for conditions to match
- [ ]
merged[📌 backport requirement]
Seems new ASAN issue:
ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55ac304015a1 bp 0x7ffeee56b310 sp 0x7ffeee56b270 T0)
==13463==The signal is caused by a READ memory access.
==13463==Hint: address points to the zero page.
#0 0x55ac304015a0 in bgp_lp_release bgpd/bgp_labelpool.c:472
#1 0x55ac304028e5 in bgp_label_per_nexthop_free bgpd/bgp_labelpool.c:1738
#2 0x55ac30414e09 in bgp_mplsvpn_path_nh_label_unlink bgpd/bgp_mplsvpn.c:1298
#3 0x55ac3041d980 in vpn_leak_from_vrf_withdraw_all bgpd/bgp_mplsvpn.c:2009
#4 0x55ac3058aa55 in vpn_leak_prechange bgpd/bgp_mplsvpn.h:264
#5 0x55ac3058aa55 in bgp_delete bgpd/bgpd.c:3862
#6 0x55ac3033ef9d in bgp_exit bgpd/bgp_main.c:202
#7 0x55ac3033ef9d in sigint bgpd/bgp_main.c:163
#8 0x7f317a486289 in frr_sigevent_process lib/sigevent.c:117
#9 0x7f317a4b1e74 in event_fetch lib/event.c:1782
#10 0x7f317a3ea15f in frr_run lib/libfrr.c:1216
#11 0x55ac30340b81 in main bgpd/bgp_main.c:543
#12 0x7f3179412c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
#13 0x55ac3033e9d9 in _start (/usr/lib/frr/bgpd+0x2d99d9)
looks like a compile error or some such, running just the one test again to see
looks like a compile error or some such ... needs to be looked at :-(
error seems to be:
<body>
<!--StartFragment-->
16-Apr-2024 11:55:17 | Failing as no matching files has been found and empty artifacts are not allowed.
-- | --
16-Apr-2024 11:55:17 | Unable to publish artifact [ErrorLog]:
16-Apr-2024 11:55:17 | The artifact hasn't been successfully published after 21.96 ms
16-Apr-2024 11:55:17 | Publishing an artifact: AddressSanitizerError
16-Apr-2024 11:55:17 | Failing as no matching files has been found and empty artifacts are not allowed.
16-Apr-2024 11:55:17 | Unable to publish artifact [AddressSanitizerError]:
16-Apr-2024 11:55:17 | The artifact hasn't been successfully published after 8.044 ms
16-Apr-2024 11:55:17 | Publishing an artifact: Topotest Details
16-Apr-2024 11:55:17 | Unable to publish artifact [Topotest Details]: the source directory /tmp/topotests does not exist.
16-Apr-2024 11:55:17 | The artifact hasn't been successfully published after 503.3 μs
16-Apr-2024 11:55:17 | Publishing an artifact: TopotestLogs
16-Apr-2024 11:55:17 | Failing as no matching files has been found and empty artifacts are not allowed.
16-Apr-2024 11:55:17 | Unable to publish artifact [TopotestLogs]:
<!--EndFragment-->
</body>
</html>```
looks like this is after all the tests pass (?)
Failure is here --
24-Jun-2024 21:25:02 | ospfd/ospf_vty.c: In function ‘show_ip_ospf_interface_sub’:
-- | --
24-Jun-2024 21:25:02 | ospfd/ospf_vty.c:3743:3: note: #pragma message: Use all fields following ospfEnabled from interfaceIp hierarchy
24-Jun-2024 21:25:02 | CPP_NOTICE(
24-Jun-2024 21:25:02 | ^~~~~~~~~~
24-Jun-2024 21:25:02 | ospfd/ospf_vty.c: In function ‘ospf_show_gr_helper_details’:
24-Jun-2024 21:25:02 | ospfd/ospf_vty.c:10525:3: note: #pragma message: Remove deprecated json key: restartSupoort
24-Jun-2024 21:25:02 | CPP_NOTICE("Remove deprecated json key: restartSupoort")
I suspect this needs to be rebased ...
@Mergifyio rebase
rebase
✅ Branch has been successfully rebased
sudo: unable to resolve host asan-debian12-amd64-ci30: Temporary failure in name resolution
sudo: unable to resolve host asan-debian12-amd64-ci30: Temporary failure in name resolution
sudo: unable to resolve host asan-debian12-amd64-ci30: Temporary failure in name resolution
and then it's downhill from there ... trying to rerun just this failing test to see if we can get past this
ci:rerun
This needs to be rebased and we need to get these CI failures cleared ...