lib: take into account the Linux IFF_LOWER_UP flag
In Linux, a network driver can set the interface flags IFF_UP and IFF_RUNNING although the IFF_LOWER_UP flag is down, which means the interface is ready but the carrier is down:
These values contain interface state:
ifinfomsg::if_flags & IFF_UP: Interface is admin up ifinfomsg::if_flags & IFF_RUNNING: Interface is in RFC2863 operational state UP or UNKNOWN. This is for backward compatibility, routing daemons, dhcp clients can use this flag to determine whether they should use the interface. ifinfomsg::if_flags & IFF_LOWER_UP: Driver has signaled netif_carrier_on()
However, FRR considers an interface is operational as soon it is up (IFF_UP) and running (IFF_RUNNING), disregarding the IFF_LOWER_UP flag. This can lead to a scenario where FRR starts adding routes through an interface that is technically down at the carrier level, resulting in kernel errors.
Jan 02 18:07:18 dut-vm zebra[283731]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Network is down, type=RTM_NEWNEXTHOP(104), seq=243, pid=3112881162 Jan 02 18:07:18 dut-vm zebra[283731]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (318[if 164]) into the kernel Jan 02 18:07:18 dut-vm zebra[283731]: [HSYZM-HV7HF] Extended Error: Carrier for nexthop device is down Jan 02 18:07:18 dut-vm zebra[283731]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Network is down, type=RTM_NEWNEXTHOP(104), seq=245, pid=3112881162 Jan 02 18:07:18 dut-vm zebra[283731]: [HSYZM-HV7HF] Extended Error: Nexthop id does not exist Jan 02 18:07:18 dut-vm zebra[283731]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=246, pid=3112881162 Jan 02 18:07:18 dut-vm zebra[283731]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (320[10.125.0.2 if 164]) into the kernel Jan 02 18:07:18 dut-vm zebra[283731]: [VYKYC-709DP] default(0:254):0.0.0.0/0: Route install failed
Consider an interface is operational when it has the IFF_UP, IFF_RUNNING and IFF_LOWER_UP flags.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/operstates.rst?h=v6.7-rc8#n29 Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/nexthop.c?h=v6.7-rc8#n2886 Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/netdevice.h?h=v6.7-rc8#n4198
Why do we rely on our own copy of the linux headers instead of compiling with the headers of the local machine ?
we rely on our own headers because the local machine may not have a new enough of a header. I can write a feature against the linux netlink stack that is available on 6.X but if I only have 5.
This way FRR can compile against newer versions of the linux abi and when that abi is not available due to it being an older kernel the call will just fail.
This also allows FRR to compile into a package and be deployed against a variety of kernels
@donaldsharp could you detail what you expect for the zebra/if_netlink.c rework
I cannot get rid of net/if.h there
Removal of if.h include
$ git diff
diff --git a/zebra/if_netlink.c b/zebra/if_netlink.c
index dac2c0a33a..0b5eb6a779 100644
--- a/zebra/if_netlink.c
+++ b/zebra/if_netlink.c
@@ -17,7 +17,6 @@
#define _LINUX_IF_H
#define _LINUX_IP_H
-#include "if.h"
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <netinet/if_ether.h>
$ make -j12
true
make all-am
make[1]: Entering directory '/home/lscalber/git/frr'
CC zebra/if_netlink.o
In file included from zebra/if_netlink.c:25:
./include/linux/if_tunnel.h:30:14: error: ‘IFNAMSIZ’ undeclared here (not in a function)
30 | char name[IFNAMSIZ];
| ^~~~~~~~
cc1: note: unrecognized command-line option ‘-Wno-microsoft-anon-tag’ may have been intended to silence earlier diagnostics
make[1]: *** [Makefile:10616: zebra/if_netlink.o] Error 1
make[1]: Leaving directory '/home/lscalber/git/frr'
make: *** [Makefile:6478: all] Error 2
Re-add linux if.h
$ git diff
diff --git a/zebra/if_netlink.c b/zebra/if_netlink.c
index dac2c0a33a..29b85056cd 100644
--- a/zebra/if_netlink.c
+++ b/zebra/if_netlink.c
@@ -14,10 +14,8 @@
* Reference - https://sourceware.org/ml/libc-alpha/2013-01/msg00599.html
*/
#define _LINUX_IN6_H
-#define _LINUX_IF_H
#define _LINUX_IP_H
-#include "if.h"
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <netinet/if_ether.h>
$ make -j12
true
make all-am
make[1]: Entering directory '/home/lscalber/git/frr'
CC zebra/if_netlink.o
In file included from ./include/linux/if_tunnel.h:5,
from zebra/if_netlink.c:24:
./include/linux/if.h:112:19: error: redeclaration of enumerator ‘IFF_UP’
112 | #define IFF_UP IFF_UP
| ^~~~~~
./include/linux/if.h:85:2: note: previous definition of ‘IFF_UP’ was here
85 | IFF_UP = 1<<0, /* sysfs */
| ^~~~~~
./include/linux/if.h:113:25: error: redeclaration of enumerator ‘IFF_BROADCAST’
113 | #define IFF_BROADCAST IFF_BROADCAST
| ^~~~~~~~~~~~~
./include/linux/if.h:86:2: note: previous definition of ‘IFF_BROADCAST’ was here
86 | IFF_BROADCAST = 1<<1, /* volatile */
| ^~~~~~~~~~~~~
./include/linux/if.h:114:21: error: redeclaration of enumerator ‘IFF_DEBUG’
114 | #define IFF_DEBUG IFF_DEBUG
| ^~~~~~~~~
./include/linux/if.h:87:2: note: previous definition of ‘IFF_DEBUG’ was here
87 | IFF_DEBUG = 1<<2, /* sysfs */
| ^~~~~~~~~
./include/linux/if.h:115:24: error: redeclaration of enumerator ‘IFF_LOOPBACK’
115 | #define IFF_LOOPBACK IFF_LOOPBACK
| ^~~~~~~~~~~~
./include/linux/if.h:88:2: note: previous definition of ‘IFF_LOOPBACK’ was here
88 | IFF_LOOPBACK = 1<<3, /* volatile */
| ^~~~~~~~~~~~
./include/linux/if.h:116:27: error: redeclaration of enumerator ‘IFF_POINTOPOINT’
116 | #define IFF_POINTOPOINT IFF_POINTOPOINT
| ^~~~~~~~~~~~~~~
./include/linux/if.h:89:2: note: previous definition of ‘IFF_POINTOPOINT’ was here
89 | IFF_POINTOPOINT = 1<<4, /* volatile */
| ^~~~~~~~~~~~~~~
./include/linux/if.h:117:26: error: redeclaration of enumerator ‘IFF_NOTRAILERS’
117 | #define IFF_NOTRAILERS IFF_NOTRAILERS
| ^~~~~~~~~~~~~~
./include/linux/if.h:90:2: note: previous definition of ‘IFF_NOTRAILERS’ was here
90 | IFF_NOTRAILERS = 1<<5, /* sysfs */
| ^~~~~~~~~~~~~~
./include/linux/if.h:118:23: error: redeclaration of enumerator ‘IFF_RUNNING’
118 | #define IFF_RUNNING IFF_RUNNING
| ^~~~~~~~~~~
./include/linux/if.h:91:2: note: previous definition of ‘IFF_RUNNING’ was here
91 | IFF_RUNNING = 1<<6, /* volatile */
| ^~~~~~~~~~~
./include/linux/if.h:119:21: error: redeclaration of enumerator ‘IFF_NOARP’
119 | #define IFF_NOARP IFF_NOARP
| ^~~~~~~~~
./include/linux/if.h:92:2: note: previous definition of ‘IFF_NOARP’ was here
92 | IFF_NOARP = 1<<7, /* sysfs */
| ^~~~~~~~~
./include/linux/if.h:120:23: error: redeclaration of enumerator ‘IFF_PROMISC’
120 | #define IFF_PROMISC IFF_PROMISC
| ^~~~~~~~~~~
./include/linux/if.h:93:2: note: previous definition of ‘IFF_PROMISC’ was here
93 | IFF_PROMISC = 1<<8, /* sysfs */
| ^~~~~~~~~~~
./include/linux/if.h:121:24: error: redeclaration of enumerator ‘IFF_ALLMULTI’
121 | #define IFF_ALLMULTI IFF_ALLMULTI
| ^~~~~~~~~~~~
./include/linux/if.h:94:2: note: previous definition of ‘IFF_ALLMULTI’ was here
94 | IFF_ALLMULTI = 1<<9, /* sysfs */
| ^~~~~~~~~~~~
./include/linux/if.h:122:22: error: redeclaration of enumerator ‘IFF_MASTER’
122 | #define IFF_MASTER IFF_MASTER
| ^~~~~~~~~~
./include/linux/if.h:95:2: note: previous definition of ‘IFF_MASTER’ was here
95 | IFF_MASTER = 1<<10, /* volatile */
| ^~~~~~~~~~
./include/linux/if.h:123:21: error: redeclaration of enumerator ‘IFF_SLAVE’
123 | #define IFF_SLAVE IFF_SLAVE
| ^~~~~~~~~
./include/linux/if.h:96:2: note: previous definition of ‘IFF_SLAVE’ was here
96 | IFF_SLAVE = 1<<11, /* volatile */
| ^~~~~~~~~
./include/linux/if.h:124:25: error: redeclaration of enumerator ‘IFF_MULTICAST’
124 | #define IFF_MULTICAST IFF_MULTICAST
| ^~~~~~~~~~~~~
./include/linux/if.h:97:2: note: previous definition of ‘IFF_MULTICAST’ was here
97 | IFF_MULTICAST = 1<<12, /* sysfs */
| ^~~~~~~~~~~~~
./include/linux/if.h:125:23: error: redeclaration of enumerator ‘IFF_PORTSEL’
125 | #define IFF_PORTSEL IFF_PORTSEL
| ^~~~~~~~~~~
./include/linux/if.h:98:2: note: previous definition of ‘IFF_PORTSEL’ was here
98 | IFF_PORTSEL = 1<<13, /* sysfs */
| ^~~~~~~~~~~
./include/linux/if.h:126:25: error: redeclaration of enumerator ‘IFF_AUTOMEDIA’
126 | #define IFF_AUTOMEDIA IFF_AUTOMEDIA
| ^~~~~~~~~~~~~
./include/linux/if.h:99:2: note: previous definition of ‘IFF_AUTOMEDIA’ was here
99 | IFF_AUTOMEDIA = 1<<14, /* sysfs */
| ^~~~~~~~~~~~~
./include/linux/if.h:127:23: error: redeclaration of enumerator ‘IFF_DYNAMIC’
127 | #define IFF_DYNAMIC IFF_DYNAMIC
| ^~~~~~~~~~~
./include/linux/if.h:100:2: note: previous definition of ‘IFF_DYNAMIC’ was here
100 | IFF_DYNAMIC = 1<<15, /* sysfs */
| ^~~~~~~~~~~
In file included from ./lib/if.h:9,
from ./lib/sockunion.h:13,
from ./lib/prefix.h:10,
from zebra/if_netlink.c:31:
/usr/include/net/if.h:111:8: error: redefinition of ‘struct ifmap’
111 | struct ifmap
| ^~~~~
In file included from ./include/linux/if_tunnel.h:5,
from zebra/if_netlink.c:24:
./include/linux/if.h:195:8: note: originally defined here
195 | struct ifmap {
| ^~~~~
In file included from ./lib/if.h:9,
from ./lib/sockunion.h:13,
from ./lib/prefix.h:10,
from zebra/if_netlink.c:31:
/usr/include/net/if.h:126:8: error: redefinition of ‘struct ifreq’
126 | struct ifreq
| ^~~~~
In file included from ./include/linux/if_tunnel.h:5,
from zebra/if_netlink.c:24:
./include/linux/if.h:232:8: note: originally defined here
232 | struct ifreq {
| ^~~~~
In file included from ./lib/if.h:9,
from ./lib/sockunion.h:13,
from ./lib/prefix.h:10,
from zebra/if_netlink.c:31:
/usr/include/net/if.h:176:8: error: redefinition of ‘struct ifconf’
176 | struct ifconf
| ^~~~~~
In file included from ./include/linux/if_tunnel.h:5,
from zebra/if_netlink.c:24:
./include/linux/if.h:284:8: note: originally defined here
284 | struct ifconf {
| ^~~~~~
cc1: note: unrecognized command-line option ‘-Wno-microsoft-anon-tag’ may have been intended to silence earlier diagnostics
make[1]: *** [Makefile:10616: zebra/if_netlink.o] Error 1
make[1]: Leaving directory '/home/lscalber/git/frr'
make: *** [Makefile:6478: all] Error 2
Force not including net/if.h
$ git diff
diff --git a/zebra/if_netlink.c b/zebra/if_netlink.c
index dac2c0a33a..8e06c0c02a 100644
--- a/zebra/if_netlink.c
+++ b/zebra/if_netlink.c
@@ -14,10 +14,9 @@
* Reference - https://sourceware.org/ml/libc-alpha/2013-01/msg00599.html
*/
#define _LINUX_IN6_H
-#define _LINUX_IF_H
#define _LINUX_IP_H
+#define _NET_IF_H
-#include "if.h"
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <netinet/if_ether.h>
$ make -j12
true
make all-am
make[1]: Entering directory '/home/lscalber/git/frr'
CC zebra/if_netlink.o
In file included from ./include/linux/if.h:37,
from ./include/linux/if_tunnel.h:5,
from zebra/if_netlink.c:25:
/usr/include/linux/hdlc/ioctl.h:74:14: error: ‘IFNAMSIZ’ undeclared here (not in a function); did you mean ‘ALTIFNAMSIZ’?
74 | char master[IFNAMSIZ]; /* Name of master FRAD device */
| ^~~~~~~~
| ALTIFNAMSIZ
zebra/if_netlink.c: In function ‘get_iflink_speed’:
zebra/if_netlink.c:258:15: error: storage size of ‘ifdata’ isn’t known
258 | struct ifreq ifdata;
| ^~~~~~
zebra/if_netlink.c:258:15: error: unused variable ‘ifdata’ [-Werror=unused-variable]
zebra/if_netlink.c: At top level:
cc1: note: unrecognized command-line option ‘-Wno-microsoft-anon-tag’ may have been intended to silence earlier diagnostics
cc1: all warnings being treated as errors
make[1]: *** [Makefile:10616: zebra/if_netlink.o] Error 1
make[1]: Leaving directory '/home/lscalber/git/frr'
make: *** [Makefile:6478: all] Error 2
@louis-6wind -> I spent another couple of hours or so over the last couple of days trying to get the results I wanted with this PR and ran into a wall on NetBSD. I thought it would be far easier than it was and frankly this has turned into far far more work than I thought it would be. Feel free to remove all the code around trying to push the header inclusion into lib/if.h. I would still make this 2 commits though, the new headers and then the fix to IFF_LOWER_UP
@idryzhov could you merge this please ?
CI:rerun Rerun after fixing git access on CI infra
CI:rerun Rerun after fixing git access on CI infra
@louis-6wind it looks like there are actual build errors right now. Please, take a look.
The tentative answer during the call on tuesday is that this was done to work around kernel bugs with specific interface drivers. If this is the reason, I'm opposed to this change. My reasoning is quite simple:
The change is apparently being made because some buggy drivers set IFF_RUNNING incorrectly when the link is down, and IFF_LOWER_UP would work correctly. If we change the code to check IFF_LOWER_UP, these buggy drivers would work. Except now we would have the reverse problem if there are buggy drivers that set IFF_RUNNING correctly but don't set IFF_LOWER_UP. We would be choosing "which bug to support". What happens when someone comes in a few weeks and changes it back because another driver has this reverse bug?
We should code to the API as designed, not to some specific driver bugs. Checking for IFF_RUNNING is the correct thing, no?
Can we have a list (interface types) of bug'd kernel drivers ? Could it be limited to those interfaces until the kernel's ones will be fixed ?
Can we have a list (interface types) of bug'd kernel drivers ? Could it be limited to those interfaces until the kernel's ones will be fixed ?
I only have the info in this PR… @louis-6wind can you provide the context/info here?
The tentative answer during the call on tuesday is that this was done to work around kernel bugs with specific interface drivers.
On Tuesday, I mentioned a problem with a specific driver, not a kernel bug. The issue was identified with the tun/tap driver - I couldn't remember which one was having problems.
At initialization, the module reports IFF_RUNNING & !IFF_LOWER_UP then IFF_RUNNING & IFF_LOWER_UP.
If this is the reason, I'm opposed to this change. My reasoning is quite simple:
The change is apparently being made because some buggy drivers set
IFF_RUNNINGincorrectly when the link is down, andIFF_LOWER_UPwould work correctly. If we change the code to checkIFF_LOWER_UP, these buggy drivers would work. Except now we would have the reverse problem if there are buggy drivers that setIFF_RUNNINGcorrectly but don't setIFF_LOWER_UP. We would be choosing "which bug to support". What happens when someone comes in a few weeks and changes it back because another driver has this reverse bug?
According to the pull-request description, Zebra is unable to install the nexthop, as indicated by the logs
Jan 02 18:07:18 dut-vm zebra[283731]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (318[if 164]) into the kernel
Jan 02 18:07:18 dut-vm zebra[283731]: [HSYZM-HV7HF] Extended Error: Carrier for nexthop device is down
This "extended error" originates from the kernel, because !netif_carrier_ok(cfg->dev) returns true, indicating that the network interface carrier is down. See: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/nexthop.c#n3127
Furthermore, the kernel itself sets IFF_LOWER_UP if netif_carrier_ok(cfg->dev) is true. See:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/dev.c?h=v6.8#n8585
It's important to note that this flag setting is handled by the kernel, not by the driver directly.
We should code to the API as designed, not to some specific driver bugs. Checking for
IFF_RUNNINGis the correct thing, no?
Seems to be related https://github.com/FRRouting/frr/issues/10199
ci:rerun
The issue was identified with the tun/tap driver […] At initialization, the module reports
IFF_RUNNING & !IFF_LOWER_UPthenIFF_RUNNING & IFF_LOWER_UP.
I don't see this behavior, what exactly do you need to do to get a tun/tap device in IFF_RUNNING & !IFF_LOWER_UP?
Furthermore, the kernel itself sets
IFF_LOWER_UPifnetif_carrier_ok(cfg->dev)is true. See: git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/dev.c?h=v6.8#n8585
Yeah, but IFF_RUNNING ⇒ netif_oper_up ⇒ operstate == IF_OPER_{UP,UNKNOWN} ⇒ link_watch code
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/link_watch.c#n36
so IF_OPER will not be set if !netif_carrier_ok …
I'd really like to see where in the kernel it is possible to get IFF_RUNNING & !IFF_LOWER_UP
The issue was identified with the tun/tap driver […] At initialization, the module reports
IFF_RUNNING & !IFF_LOWER_UPthenIFF_RUNNING & IFF_LOWER_UP.I don't see this behavior, what exactly do you need to do to get a tun/tap device in
IFF_RUNNING & !IFF_LOWER_UP?
I am not able to extract the exact conditions, but here is what I observe with 'ip -d monitor link'
8: ntfp2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
link/ether de:ed:02:44:4c:55 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65521
tun type tap pi off vnet_hdr on multi_queue numqueues 2 numdisabled 0 persist off user root group 1000001 numtxqueues 256 numrxqueues 256 gso_max_size 65536 gso_max_segs 65535
8: ntfp2: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state UNKNOWN group default
link/ether de:ed:02:44:4c:55 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65521
tun type tap pi off vnet_hdr on multi_queue numqueues 1 numdisabled 1 persist on user root group 1000001 numtxqueues 256 numrxqueues 256 gso_max_size 65536 gso_max_segs 65535
8: ntfp2: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state UNKNOWN group default
link/ether de:ed:02:44:4c:55 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65521
tun type tap pi off vnet_hdr on multi_queue numqueues 1 numdisabled 1 persist on user root group 1000001 numtxqueues 256 numrxqueues 256 gso_max_size 65536 gso_max_segs 65535
8: ntfp2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default
link/ether de:ed:02:44:4c:55 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65521
tun type tap pi off vnet_hdr on multi_queue numqueues 1 numdisabled 1 persist on user root group 1000001 numtxqueues 256 numrxqueues 256 gso_max_size 65536 gso_max_segs 65535
Furthermore, the kernel itself sets
IFF_LOWER_UPifnetif_carrier_ok(cfg->dev)is true. See: git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/dev.c?h=v6.8#n8585Yeah, but
IFF_RUNNING⇒netif_oper_up⇒operstate == IF_OPER_{UP,UNKNOWN}⇒ link_watch code https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/link_watch.c#n36 soIF_OPERwill not be set if!netif_carrier_ok…
You are saying that IFF_RUNNING means IF_OPER_UP or IF_OPER_UNKNOWN. That is correct. If you have IF_OPER_UP, you will also have netif_carrier_ok. That's correct.
That doesn't contradict what I'm saying. If the state is IF_OPER_UNKNOWN, the driver is not usable.
I'd really like to see where in the kernel it is possible to get
IFF_RUNNING & !IFF_LOWER_UP
Logs are saying it is possible and it conforms with the specifications.
To summarize, nexthop objects can only be installed on interfaces marked as IFF_LOWER_UP. The IFF_LOWER_UP flag indicates that the operational state of the interface is "up." On the other hand, IFF_RUNNING indicates that the operational state is either "up" or "unknown."
Zebra currently treats an interface as operational and allows the creation of nexthop objects as soon as the IFF_RUNNING flag is set, regardless of the IFF_LOWER_UP status. However, this approach can lead to issues if the interface's operational state is "unknown," causing nexthop creation to fail. It would be more appropriate for Zebra to initiate nexthop creation only when IFF_LOWER_UP is set.
It is worth noting that the IFF_LOWER_UP flag is activated by the kernel itself when the driver invokes netif_carrier_on(), reliably indicating that the interface is operational.
Ok, I think it should work. I hope we won't have the opposite problem with another driver though. If that happens we should just revert this.
@eqvinox could you approve and merge this pull request?
"verify source" is triggering errors on imported linux headers. They should not be modified. frrbot error is unrelevant
@eqvinox please approve and merge!