frr icon indicating copy to clipboard operation
frr copied to clipboard

bgpd: Fix read after free in labeled_unicast on shutdown

Open donaldsharp opened this issue 2 years ago • 22 comments

Fix:

==2937693== Invalid read of size 2 ==2937693== at 0x2EFD44: bgp_delete_listnode (bgp_table.c:170) ==2937693== by 0x2EFB22: bgp_dest_unlock_node (bgp_table.c:81) ==2937693== by 0x26BAC1: check_bgp_lu_cb_unlock (bgp_labelpool.c:208) ==2937693== by 0x26BC58: bgp_lp_finish (bgp_labelpool.c:243) ==2937693== by 0x1FE006: bgp_exit (bgp_main.c:256) ==2937693== by 0x1FDD99: sigint (bgp_main.c:163) ==2937693== by 0x49BA8BB: frr_sigevent_process (sigevent.c:117) ==2937693== by 0x49D5912: event_fetch (event.c:1782) ==2937693== by 0x4955BBF: frr_run (libfrr.c:1216) ==2937693== by 0x1FEA22: main (bgp_main.c:543) ==2937693== Address 0x91af6cc is 60 bytes inside a block of size 96 free'd ==2937693== at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==2937693== by 0x4968C57: qfree (memory.c:130) ==2937693== by 0x2EFC55: bgp_node_destroy (bgp_table.c:116) ==2937693== by 0x49CB7DB: route_node_free (table.c:76) ==2937693== by 0x49CB8A2: route_table_free (table.c:111) ==2937693== by 0x49CB6E5: route_table_finish (table.c:46) ==2937693== by 0x2EFA15: bgp_table_unlock (bgp_table.c:35) ==2937693== by 0x2EFA6F: bgp_table_finish (bgp_table.c:44) ==2937693== by 0x3625FA: bgp_free (bgpd.c:4072) ==2937693== by 0x353858: bgp_unlock (bgpd.h:2525) ==2937693== by 0x362411: bgp_delete (bgpd.c:4034) ==2937693== by 0x1FDF13: bgp_exit (bgp_main.c:205) ==2937693== by 0x1FDD99: sigint (bgp_main.c:163) ==2937693== by 0x49BA8BB: frr_sigevent_process (sigevent.c:117) ==2937693== by 0x49D5912: event_fetch (event.c:1782) ==2937693== by 0x4955BBF: frr_run (libfrr.c:1216) ==2937693== by 0x1FEA22: main (bgp_main.c:543) ==2937693== Block was alloc'd at ==2937693== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so) ==2937693== by 0x4968B05: qcalloc (memory.c:105) ==2937693== by 0x2A5667: bgp_node_get (bgp_table.h:239) ==2937693== by 0x2A9C36: bgp_afi_node_get (bgp_route.c:154) ==2937693== by 0x2B4BA3: bgp_update (bgp_route.c:4225) ==2937693== by 0x3D5255: bgp_nlri_parse_label (bgp_label.c:452) ==2937693== by 0x290CBF: bgp_nlri_parse (bgp_packet.c:345) ==2937693== by 0x29616F: bgp_update_receive (bgp_packet.c:2453) ==2937693== by 0x29ACEE: bgp_process_packet (bgp_packet.c:4022) ==2937693== by 0x49D5F18: event_call (event.c:2011) ==2937693== by 0x4955BA6: frr_run (libfrr.c:1217) ==2937693== by 0x1FEA22: main (bgp_main.c:543)

This is occurring because the shutdown code is hard deleting the route node but labeled unicast has a pointer to it. Free up the labeled unicast data before hard free'ing the route nodes.

donaldsharp avatar Mar 20 '24 13:03 donaldsharp

@Mergifyio backport dev/10.0

ton31337 avatar Mar 20 '24 21:03 ton31337

backport dev/10.0

🟠 Waiting for conditions to match

  • [ ] merged [📌 backport requirement]

mergify[bot] avatar Mar 20 '24 21:03 mergify[bot]

Seems new ASAN issue:

    ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55ac304015a1 bp 0x7ffeee56b310 sp 0x7ffeee56b270 T0)
    ==13463==The signal is caused by a READ memory access.
    ==13463==Hint: address points to the zero page.
        #0 0x55ac304015a0 in bgp_lp_release bgpd/bgp_labelpool.c:472
        #1 0x55ac304028e5 in bgp_label_per_nexthop_free bgpd/bgp_labelpool.c:1738
        #2 0x55ac30414e09 in bgp_mplsvpn_path_nh_label_unlink bgpd/bgp_mplsvpn.c:1298
        #3 0x55ac3041d980 in vpn_leak_from_vrf_withdraw_all bgpd/bgp_mplsvpn.c:2009
        #4 0x55ac3058aa55 in vpn_leak_prechange bgpd/bgp_mplsvpn.h:264
        #5 0x55ac3058aa55 in bgp_delete bgpd/bgpd.c:3862
        #6 0x55ac3033ef9d in bgp_exit bgpd/bgp_main.c:202
        #7 0x55ac3033ef9d in sigint bgpd/bgp_main.c:163
        #8 0x7f317a486289 in frr_sigevent_process lib/sigevent.c:117
        #9 0x7f317a4b1e74 in event_fetch lib/event.c:1782
        #10 0x7f317a3ea15f in frr_run lib/libfrr.c:1216
        #11 0x55ac30340b81 in main bgpd/bgp_main.c:543
        #12 0x7f3179412c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
        #13 0x55ac3033e9d9 in _start (/usr/lib/frr/bgpd+0x2d99d9)

ton31337 avatar Mar 22 '24 07:03 ton31337

looks like a compile error or some such, running just the one test again to see

riw777 avatar Mar 26 '24 14:03 riw777

looks like a compile error or some such ... needs to be looked at :-(

riw777 avatar Apr 16 '24 11:04 riw777

error seems to be:

<body>
<!--StartFragment-->
16-Apr-2024 11:55:17 | Failing as no matching files has been found and empty artifacts are not allowed.
-- | --
16-Apr-2024 11:55:17 | Unable to publish artifact [ErrorLog]:
16-Apr-2024 11:55:17 | The artifact hasn't been successfully published after 21.96 ms
16-Apr-2024 11:55:17 | Publishing an artifact: AddressSanitizerError
16-Apr-2024 11:55:17 | Failing as no matching files has been found and empty artifacts are not allowed.
16-Apr-2024 11:55:17 | Unable to publish artifact [AddressSanitizerError]:
16-Apr-2024 11:55:17 | The artifact hasn't been successfully published after 8.044 ms
16-Apr-2024 11:55:17 | Publishing an artifact: Topotest Details
16-Apr-2024 11:55:17 | Unable to publish artifact [Topotest Details]: the source directory /tmp/topotests does not exist.
16-Apr-2024 11:55:17 | The artifact hasn't been successfully published after 503.3 μs
16-Apr-2024 11:55:17 | Publishing an artifact: TopotestLogs
16-Apr-2024 11:55:17 | Failing as no matching files has been found and empty artifacts are not allowed.
16-Apr-2024 11:55:17 | Unable to publish artifact [TopotestLogs]:

<!--EndFragment-->
</body>
</html>```

looks like this is after all the tests pass (?)

riw777 avatar May 28 '24 14:05 riw777

Failure is here --

24-Jun-2024 21:25:02 | ospfd/ospf_vty.c: In function ‘show_ip_ospf_interface_sub’:
-- | --
24-Jun-2024 21:25:02 | ospfd/ospf_vty.c:3743:3: note: #pragma message: Use all fields following ospfEnabled from interfaceIp hierarchy
24-Jun-2024 21:25:02 | CPP_NOTICE(
24-Jun-2024 21:25:02 | ^~~~~~~~~~
24-Jun-2024 21:25:02 | ospfd/ospf_vty.c: In function ‘ospf_show_gr_helper_details’:
24-Jun-2024 21:25:02 | ospfd/ospf_vty.c:10525:3: note: #pragma message: Remove deprecated json key: restartSupoort
24-Jun-2024 21:25:02 | CPP_NOTICE("Remove deprecated json key: restartSupoort")

I suspect this needs to be rebased ...

riw777 avatar Jul 16 '24 15:07 riw777

@Mergifyio rebase

riw777 avatar Jul 16 '24 15:07 riw777

rebase

✅ Branch has been successfully rebased

mergify[bot] avatar Jul 16 '24 15:07 mergify[bot]

sudo: unable to resolve host asan-debian12-amd64-ci30: Temporary failure in name resolution
sudo: unable to resolve host asan-debian12-amd64-ci30: Temporary failure in name resolution
sudo: unable to resolve host asan-debian12-amd64-ci30: Temporary failure in name resolution

and then it's downhill from there ... trying to rerun just this failing test to see if we can get past this

riw777 avatar Jul 24 '24 00:07 riw777

ci:rerun

riw777 avatar Aug 27 '24 12:08 riw777

This needs to be rebased and we need to get these CI failures cleared ...

riw777 avatar Sep 24 '24 13:09 riw777