interfaces: let rc.linkup coalesce interface events from devd
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
- [x] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
- [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue
Is your feature request related to a problem? Please describe.
This refs #6852. The problem with rc.linkup being driven by devd is that reloads are done one interface at a time, which might be too much overhead when a physical port with many VLANs attached goes down and comes back up. Some reload hooks are executed each time even though that is not really needed.
Describe the solution you like
Push devices into a FIFO queue and reload all interfaces, but run the reload hooks of interface_configure($reload = true) only once, after all interfaces have been processed.
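To illustrate the idea, here is a minimal Python sketch of coalescing queued link events. It is not OPNsense code: `coalesce`, `drain`, `reload_interface` and `run_reload_hooks` are hypothetical names, and keeping only the last action seen per device is a simplifying assumption.

```python
from collections import deque


def coalesce(events):
    """Collapse (device, action) events so each device appears once.

    The last action seen for a device wins, and devices keep the order in
    which they were first queued (assumption made for this sketch).
    """
    latest = {}
    for device, action in events:
        latest[device] = action  # updating a key keeps its original position
    return list(latest.items())


def drain(queue, reload_interface, run_reload_hooks):
    """Reload every queued device once, then run the reload hooks once."""
    pending = []
    while queue:
        pending.append(queue.popleft())  # FIFO: oldest event first
    batch = coalesce(pending)
    for device, action in batch:
        reload_interface(device, action)
    if batch:
        # Hooks run a single time for the whole batch, not per interface.
        run_reload_hooks([device for device, _ in batch])
    return batch
```

With a port that flaps while carrying two VLANs, four queued events would collapse into two interface reloads followed by a single hook run.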
Describe alternatives you considered
There are no real alternatives. The boot is pretty well serialized but later on that is hard to ensure without restructuring all of the configuration code to check for changes in the system state and render them. Having many interfaces to check slows this down considerably again.
Additional context
#6852 and possibly another rc.linkup bug that @swhite2 mentioned today.
The issue lies with link-up events being triggered rather late in the boot sequence (which might be caused by anything, in my case slow-to-come-up SFP modules). If we consider a DHCP interface such as the factory default WAN, the following happens before a link:
/usr/local/etc/rc.bootup: The command '/sbin/dhclient -c '/var/etc/dhclient_wan.conf' -p '/var/run/dhclient.ax0.pid' 'ax0''
returned exit code '1', the output was 'ax0: no link .............. giving up'
The devd event is ignored during boot (by exit_on_bootup()), as it likely should be to avoid other problems. The result is that the WAN interface never obtains an IP. Ideally, dhclient would simply wait for a link to come up (which is the case on Linux AFAIK) and not bail at all; I believe dhclient on FreeBSD does not support this.
This may just as well be a driver issue, as it's quite uncommon that a link should come up this late, though wholly assuming there is a link is in my opinion a bit risky.
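As a rough illustration of "wait for the link instead of giving up", here is a small Python sketch. It does not reflect how dhclient or OPNsense behave: `wait_for_link` and the caller-supplied `has_link` probe are hypothetical, and the timeout and interval values are arbitrary.

```python
import time


def wait_for_link(has_link, timeout, interval=1.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll has_link() until it returns True or `timeout` seconds elapse.

    Returns True as soon as a link is seen, False once the deadline passes.
    `sleep` and `clock` are injectable so the loop can be tested without
    real delays.
    """
    deadline = clock() + timeout
    while True:
        if has_link():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would start the DHCP client only once this returns True, rather than launching it against an interface that reports no link.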
I just battled what I think is a relevant problem for this issue. It started late last year after an upgrade to one of the later 23.x versions (still present in 24.1). I spent a lot of time learning how OPNsense boots and debugging the boot process on our slow Atom D510 4MB RAM hardware.
It's long and drawn out but I thought I would write it up in case anyone finds it helpful or I need to refer back to it myself:)
What @fichtner is trying to fix and @swhite2 is saying about late link-up events is exactly what I found the problem to be.
Problem Description
For many years we have successfully run a redundant pair of OPNsense firewalls with 4 NICs, ~8 VLANs, some with CARP, some with FRR/ospfd. After an upgrade to a later 23.x version, bootup would "complete" and FRR/ospfd would converge with all OSPF routes added to the kernel and working as expected. Within 2-3 minutes, some or all OSPF-learned routes for an OSPF interface (or sometimes more than one interface) would be removed from the kernel and never re-added. I upgraded to the latest 24.x version and the issue remained.
Problem Details
Turns out that interface_configure() ends up doing an address flush which executes ifconfig <intf> <ip> -alias. The ip gets re-added at some point, but the kernel removed the OSPF route on the IP removal. FRR/ospfd never drops the adjacency and re-neighbors so the kernel is never re-informed to add the route.
I use rc.syshook.d/early/ to enable zebra debugging during boot and I can see FRR/zebra does receive notifications from the kernel about the address/route removal, but FRR does nothing about it. As far as FRR is concerned the kernel has the route zebra gave it and ospfd continues to maintain adjacency. Toggling the interface at either end of the link will cause OSPF to re-neighbor and the route(s) get re-added to the kernel.
Ultimately I found (at least on our ancient hardware/config) that rc.bootup finishes way before all the devd LINKUP/DOWN events are complete, so the flock on /var/run/booting is released well before all the devd linkup/down events fire off. During "booting", the call to interface_configure() is skipped in rc.linkup by exit_on_bootup(). After booting is considered complete, this interface_configure() call is not skipped, so the IP address flush in interface_configure() is executed which kills the kernel route(s) zebra added for that interface.
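The gating described above can be sketched in Python as follows. This is only an assumption about the mechanism based on the description (an exclusive flock held for the duration of boot); `is_booting` and `on_link_event` are hypothetical names and the real exit_on_bootup() may work differently.

```python
import fcntl
import os


def is_booting(lock_path):
    """Return True while another open file holds an exclusive flock on lock_path."""
    try:
        fd = os.open(lock_path, os.O_RDONLY)
    except FileNotFoundError:
        return False  # no lock file at all: not booting
    try:
        # Non-blocking attempt: fails immediately if the lock is held elsewhere.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


def on_link_event(device, lock_path, configure):
    """Skip reconfiguration during boot; otherwise configure the device."""
    if is_booting(lock_path):
        return False
    configure(device)
    return True
```

The sketch makes the timing problem visible: an event that arrives one moment after the lock is released takes the full `configure` path, including the address flush, even though it logically belongs to the boot sequence.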
"My Workaround/Fix"
I added a long delay to the end of rc.bootup, before the final exit(0). This kept the "booting" state (flock on /var/run/booting) active for the duration of the artificial delay. With this delay in place I can see in the system log file that devd events continue to be processed after bootup would normally have been "done". The interface_configure() calls get skipped, so there are no ifconfig <intf> <ip> -alias executions. The calls to plugin hooks for openvpn, ipsec, dhcp, dns, crl etc. run very slowly. It takes 2-3 minutes or more to complete, like @fichtner describes.
After my artificial delay is complete, rc.bootup exits, the flock on /var/run/booting is released, normal boot continues (starting FRR etc.) and everything works fine, because there are no more linkup/down devd events.
For now I've left my rc.bootup extended delay in. It's a redundant firewall setup so we can live with a 3 minute longer boot.
Questions/Thoughts
So... I'm not an expert on the OPNsense/FreeBSD boot process or future goals for it but:
- If FRR/zebra acted on the kernel messages about address/route removal, OSPF adjacency would/could drop and be re-acquired, sending the route(s) to the kernel again.
- If I upgraded to faster hardware, it may chew through the devd linkup/downs before rc.bootup completed.
- @fichtner's coalesce concept for this issue may speed things up so devd linkup/downs are processed before rc.bootup is completed.
- From some code and comments I ran into about older versions: there "may" have been special processing for interfaces with static IP (or just static ARP?) that did not remove the IP, which would leave the kernel route intact. That might explain why we did not see this issue until recently (later 23.x versions).
- Maybe adding yet another interface plugin for FRR that restarts the OSPF process (via vtysh "clear ip ospf process" or a full FRR/ospfd restart) could work.
Working and Not Working Log
I attached some logs with my added DEBUG logging that have been sanitized with IPs/names/etc replaced via sed.
Note that the ospfd already running? error in the logs is OK and is not involved in this issue. I noticed it at some point and found a @fichtner comment saying it is by design, to make a second start attempt later in the startup process.
boot_with_rc.bootup_delay_added.txt:
- rc.bootup has added delay, allowing a working bootup.
- There is only 1 call to ifconfig <intf> <ip> -alias... for loopback/127.0.0.1
boot_with_rc.bootup_delay_added.txt
boot_without_rc.bootup_delay_added.txt:
- Normal bootup, no artificial delay added to rc.bootup
- devd linkup/downs occur after bootup is "complete"
- ifconfig <intf> <ip> -alias commands are invoked (which is what kills the kernel routes added by zebra)
boot_without_rc.bootup_delay_added.txt
@framer99 thanks for the detailed report. Just to be sure: are you already on 24.x or still on 23.x?
Yes that's correct, 24.1.1.
I've discovered that during normal operation we can still lose kernel routes that zebra installed when a link toggles, for example when a neighbor switch reboots.
Clean, 5-second-long cable unplugs/re-plugs generally work. However, during card resets/switch reboots for equipment plugged into the OPNsense machine, routes can still end up in zebra but not in the kernel.
I turned bfd back on for one OSPF link and bfd itself seems to bounce a lot (3, 4, maybe more times) when recovering instead of just going bfd down then bfd up.
I commented out the actual ifconfig <intf> <ip> -alias command in legacy_interface_deladdress() and things seem better.
I will need to create a simple single-link test setup to be able to get to the bottom of it all. Maybe it's all down to our ancient hardware or some other part of the config.