packet errors on outgoing traffic
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
- [x] I have read the contributing guidelines at https://github.com/opnsense/src/blob/master/CONTRIBUTING.md
- [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/src/issues?q=is%3Aissue
Describe the bug
Under high load, outbound errors ("errors out") accumulate on VLAN interfaces. There are no errors on the parent interface. In my specific case Suricata is enabled. At Suricata startup a few errors are counted, fewer than 10 per interface. When I disable Suricata, there are no errors under high load.
I found 2 similar issues, which were closed due to a new version of OPNsense: #87 #74 (Suricata is not in use for this case)
To Reproduce
Steps to reproduce the behavior:
- Configure VLAN on interface igb1
- Make some high load traffic on the interface (speed test, torrent download)
- Observe the outbound error count on the interface counter
Expected behavior
Error count should be at 0
Describe alternatives you considered
I started with OPNsense 22.1.2 in a VM with virtio and e1000 drivers. I thought it was an issue with that specific setup. Then I received a DEC850 appliance and the result is the same: there are outbound errors on VLAN interfaces.
Screenshots
If applicable, add screenshots to help explain your problem.

Environment
Current setup :
- Appliance DEC850
- OPNsense Business edition 21.10.3
- tested igb and ax interfaces
Test setup :
- VM on proxmox 7
- OPNSense community 22.1.2
- tested virtio and e1000 interfaces
Earlier investigation revealed that timing during initialization of a VLAN interface was a bit off, so packets were sent out over a VLAN interface that wasn't up yet, producing some packet errors. These errors had no impact and were limited to about 3-10 errors at most during interface bootup.
Seeing so many errors suggests problems elsewhere in the stack - but somehow related to the vlan transmit routine. I'll take a closer look.
Let me emphasize though that these errors are negligible and probably have no real impact.
@fabricemrchl I'm unable to reproduce the issue on my setup using suricata + high load traffic. Would you be able to share some debug output? The following would be helpful:
- netstat -s
- hardware-specific counters using sysctl -a | grep <device name, e.g. igb>
@fabricemrchl Thanks for the output. Do you have an estimate of the number of outbound errors at the time this output was recorded? I'm wondering how the ratio scales, since the number of packets is much higher than in the original screenshot.
Around 2100 GB in and 550 GB out when I recorded the previous output.
The ratio is low; during the last 10 days I did not have much traffic that generates errors.

I restarted OPNsense and started a torrent download through a WireGuard VPN (the VPN and torrent client run on a computer, not on the router). You can find stats and debug output below. The ratio is much higher. Errors increase quickly with this kind of traffic, so I use it to test and reproduce, but the errors can occur with other kinds of traffic.
sysctl2.txt
netstat2.txt

While I can't draw any conclusions from the data here, I noticed the VLAN virtual interface is very sensitive to output errors in its transmit routine. Most notably it reports errors when:
- The parent interface is not up and running, I have been able to reproduce this by restarting a parent interface while packets are going through.
- The kernel isn't able to prepend a valid 802.1Q header.
- The parent interface's transmit routine fails for any reason (e.g. no buffers available).
Especially the last point is something that seems unique to VLAN virtual interfaces. Normally interfaces do not report this as an outbound interface error as far as I can see.
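The three conditions above can be sketched as a toy model. This is not the kernel code: the class names, the slot counter standing in for driver buffers, and the returned reason strings are all hypothetical, and the 802.1Q tagging is simplified (a real frame carries the tag after the MAC addresses). It only illustrates how a failure in the parent's transmit routine can end up counted on the VLAN interface while the parent's own counter stays at 0.

```python
from dataclasses import dataclass
from typing import Optional

ENOBUFS = 55  # FreeBSD value, as reported later in this thread


@dataclass
class ParentIf:
    """Hypothetical stand-in for the physical interface (e.g. igb1)."""
    up: bool = True
    free_tx_slots: int = 2  # assumption: models available transmit buffers
    oerrors: int = 0

    def transmit(self, frame: bytes) -> int:
        # Fails with ENOBUFS when no slot is free; does not count an error itself.
        if self.free_tx_slots == 0:
            return ENOBUFS
        self.free_tx_slots -= 1
        return 0


def tag_frame(payload: bytes, vlan_id: int) -> Optional[bytes]:
    """Simplified 802.1Q tagging: TPID 0x8100 plus a 16-bit TCI with priority 0."""
    if not 1 <= vlan_id <= 4094:
        return None
    return b"\x81\x00" + vlan_id.to_bytes(2, "big") + payload


@dataclass
class VlanIf:
    """Hypothetical stand-in for the VLAN virtual interface."""
    parent: ParentIf
    vlan_id: int
    oerrors: int = 0

    def transmit(self, payload: bytes) -> str:
        if not self.parent.up:                   # condition 1
            self.oerrors += 1
            return "parent down"
        frame = tag_frame(payload, self.vlan_id)
        if frame is None:                        # condition 2
            self.oerrors += 1
            return "no 802.1Q header"
        if self.parent.transmit(frame) != 0:     # condition 3
            self.oerrors += 1
            return "parent transmit failed"
        return "ok"


parent = ParentIf(free_tx_slots=2)
vlan = VlanIf(parent, vlan_id=6)
results = [vlan.transmit(b"data") for _ in range(3)]
print(results, "vlan oerrors:", vlan.oerrors, "parent oerrors:", parent.oerrors)
```

In this sketch the third packet fails because the parent has no free slot, and only the VLAN interface's error counter increases, which matches the pattern described in the report.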
Other things you can try:
- Add an entry in System -> Settings -> Tunables: set net.link.vlan.soft_pad to 1.
- Toggle VLAN hardware filtering (either globally or per interface).
If you're able, the output of dmesg would also be very helpful.
I set net.link.vlan.soft_pad to 1; no change, errors are still counting (the router was restarted after adding this setting).
Currently VLAN hardware filtering is disabled on my setup. I read almost everywhere that Suricata in IPS mode requires this feature to be disabled. Is it safe to enable it with Suricata IPS mode and promiscuous mode enabled?
Please find the dmesg and other associated debug output: debug_systclt3.txt debug_netstat3.txt debug_dmesg.txt
In/out packets: 3730493 / 11398747 (1009.10 MB / 14.95 GB)
In/out packets (pass): 3730095 / 11398747 (1009.07 MB / 14.95 GB)
In/out packets (block): 34711 / 0 (398 bytes / 0 bytes)
In/out errors: 0 / 21654
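To put the counters above in proportion, here is a minimal sketch that computes the outbound error ratio from the two figures quoted in this comment. The function name is hypothetical; the numbers are copied from the interface statistics above.

```python
def error_ratio(errors: int, packets: int) -> float:
    """Fraction of packets that were counted as errors."""
    if packets <= 0:
        raise ValueError("packets must be positive")
    return errors / packets


out_packets = 11_398_747  # outbound packets from the statistics above
out_errors = 21_654       # outbound errors from the statistics above

ratio = error_ratio(out_errors, out_packets)
print(f"outbound error ratio: {ratio:.4%}")
print(f"errors per million packets: {ratio * 1_000_000:.0f}")
```

That works out to roughly 0.19% of outbound packets, or a little under 2 errors per 1000 packets.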
Which interface is running Suricata (IPS)? As is indicated in the GUI and the docs, IPS shouldn't be run directly on VLAN interfaces, only on its parent interface. And yes, in IPS mode VLAN hardware filtering should be disabled. To rule out IPS being the culprit, you could try switching to IDS mode and toggling VLAN hardware filtering to see if this changes anything.
"At suricata startup some errors are counting , less than 10 per interfaces"
These errors are related to the interface startup and can be safely ignored.
I'm noticing that the netmap setup is ignoring the RX and TX descriptor overrides and is reverting back to 4 CPUs in your dmesg output (though unclear which interface this relates to), could you try setting net.inet.rss.enabled to 0 in the tunables section and reboot the system? You can also force traffic flow to the driver over 1 CPU by setting dev.ax.0.rss_enabled to 0 as a tunable.
Since you're running on ax0, output from sysctl dev.ax.0 can also be helpful to rule out specific TX errors in the hardware.
The parent interface, named MANAGEMENT or SERVER in the last debug output, is running Suricata (IPS). I tried Suricata in IDS mode, also on the parent interface, and there are no errors in this mode. I believe that IDS mode does not use netmap, so the problem comes either from Suricata or from netmap. I didn't enable VLAN hardware filtering. Switching from IPS to IDS with VLAN hardware filtering disabled is enough to stop the errors.
I tried Suricata IPS mode with net.inet.rss.enabled and dev.ax.0.rss_enabled set to 0; errors are still counting, neither better nor worse.
sysctl dev.ax.0 result :
debug_systclt4.txt
I will have a look at the Suricata bug reports to check whether there is a known issue. Maybe I need to tune some Suricata settings to fix those errors. If you have any ideas on how to tune it, I can test them. Suricata running config dump: suricata_config.txt
I've been able to reproduce the errors on my end with IPS on the parent interface. It seems netmap is the culprit here somewhere: I built a custom kernel with https://github.com/opnsense/src/blob/stable/22.1/sys/net/if_vlan.c#L1260 removed and observed that the outbound errors remain at 0.
Why the transmit function fails is still unclear to me, but Suricata isn't the issue here.
Hello, do you have any news about this issue? Can I help you in any way?
Hi,
This issue is still very much on my to-do list and I hope I can get back to you by the end of the week.
Hi @fabricemrchl,
Apologies for the later-than-expected reply, but it took some time to configure a working tracing setup due to regressions in the FreeBSD13-STABLE kernel. In any case, here is a preliminary result:
(running an iperf3 network test for ~5 seconds, OPNsense as a client to generate a lot of outbound traffic)
dtrace -n 'fbt::vlan_transmit:return { @ = lquantize((int32_t)arg1, 0, 100, 1); }' - tracing the return codes of the vlan_transmit function.
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 295498
1 | 0
2 | 0
3 | 0
.
.
.
55 | 5
56 | 0
In an error situation, the return code is 55.
According to https://www.freebsd.org/cgi/man.cgi?query=errno&sektion=2&manpath=freebsd-release-ports:
55 ENOBUFS No buffer space available. An operation on a socket or pipe
was not performed because the system lacked sufficient buffer
space or because a queue was full.
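As a small aid for reading such return codes, here is a sketch of a lookup table. The three values are hard-coded FreeBSD errno numbers (only 55 appears in this thread; 12 and 50 are added for illustration), and the function name is hypothetical. Note that Python's own errno module reflects the platform it runs on, so errno.ENOBUFS is 55 on FreeBSD but, for example, 105 on Linux.

```python
import errno

# FreeBSD errno numbers, hard-coded because the local platform may differ.
FREEBSD_ERRNO = {
    12: "ENOMEM",
    50: "ENETDOWN",
    55: "ENOBUFS",
}


def freebsd_errno_name(code: int) -> str:
    """Map a FreeBSD errno number to its symbolic name, if known to this table."""
    return FREEBSD_ERRNO.get(code, "unknown")


print(55, "->", freebsd_errno_name(55))
print("ENOBUFS on the machine running this script:", errno.ENOBUFS)
```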
~~Again, normally when this happens, interfaces don't report these as outbound errors.~~ It seems most virtual interfaces actually do report these as outbound errors.
@fabricemrchl Update:
Since Suricata in its current state only uses one thread to pass packets up to the host stack, it's easy to imagine buffers being exhausted, as Suricata is probably processing packets faster than the host stack can receive/transmit them.
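The buffer-exhaustion idea can be illustrated with a toy simulation. This is only a sketch under stated assumptions: a fixed-size queue, a producer that offers a fixed number of packets per tick, and a consumer that drains a fixed number per tick. None of the names or numbers come from Suricata, netmap, or the kernel.

```python
from collections import deque
from typing import Tuple


def simulate(ticks: int, produced_per_tick: int,
             drained_per_tick: int, queue_slots: int) -> Tuple[int, int]:
    """Return (sent, dropped) for a bounded queue over a number of ticks."""
    queue: deque = deque()
    sent = 0
    dropped = 0
    for _ in range(ticks):
        # Producer side: enqueue while there is room, otherwise drop.
        for _ in range(produced_per_tick):
            if len(queue) < queue_slots:
                queue.append(1)
            else:
                dropped += 1  # in this model, the analogue of an ENOBUFS error
        # Consumer side: drain up to the per-tick capacity.
        for _ in range(min(drained_per_tick, len(queue))):
            queue.popleft()
            sent += 1
    return sent, dropped


# Consumer slightly slower than the producer: the queue fills, then drops begin.
print(simulate(ticks=100, produced_per_tick=12, drained_per_tick=10, queue_slots=64))
# Consumer keeps up: no drops.
print(simulate(ticks=100, produced_per_tick=12, drained_per_tick=12, queue_slots=64))
```

In this model a larger queue only delays the first drop. Drops stop only once the draining side keeps up with the producer, which is the effect the multi-threaded change is described as having.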
When running the Suricata 6.0.5-devel package (part of the OPNsense development firmware), the opposite is true: multiple threads are used to tackle this issue, increasing throughput by about 1.5 Gbit/s while simultaneously eliminating the outbound errors.
I have found no workaround for the outbound errors in the current state; at most they cause some congestion when operating at line speed. It is very unlikely that a system is fully saturated all of the time, so these errors are spurious. Until the Suricata package is in a working stable state, there isn't much I can do (I have experimented with netmap and system tunables). You are of course free to switch to the development version should your setup allow for such a thing :)
Hi @swhite2, thank you for your debugging and the information. No problem about the workaround; if the issue is fixed in the next Suricata release, that will be perfect. Currently I'm running OPNsense version 22.4. If there is an easy path to test Suricata 6.0.5-devel and revert to 6.0.4_1, I can test it. I have only one router, so I can't switch to the devel branch. If I can upgrade only the Suricata package, I'm fine with testing on my side. If not, I will wait for a non-devel release.
Unfortunately no, the easiest way to switch is to switch to the OPNsense-devel package entirely. This replaces the core package as well. The only other way to isolate it is to build suricata-devel from source, deinstalling the current version (without using pkg) and installing the newly built one. If you'd like to try this I can provide instructions for it, but be aware it's not ready for a production environment.
It is unclear when the Suricata package with the netmap changes will be ready for release. There are still known bugs causing potential lockups.
No problem, I will wait for a production-ready package.
I have been trying to solve this exact problem for days, ever since we got a 1 Gbit/s internet connection. We are seeing around 60,000 errors per 100 GB, and as a result sometimes failed downloads, etc.
I solved this by shaping the WAN speed to 500 Mbit/s, which stops the interface errors. Not a great solution, but it works.
"On high load, there are errors out on VLAN interfaces. No error on parent interface."
I have this same issue on VLAN interface on my WAN. My ISP uses PPPoE over VLAN. I sometimes randomly get a lot of errors out on VLAN 6 on WAN interface, and that basically drops my internet for a minute, and it needs to be re-negotiated then. Of course, this is very annoying. I am not even using IDS/IDP, just a basic setup. Any way I can troubleshoot this?
Thanks
Is it the errors dropping your connection, or is there some other form of flapping going on (which in turn would cause outbound errors to accumulate)? Maybe check the dmesg output for recurring linkup/linkdown messages, as well as the general system log.
I can only correlate it to errors. I see the PPPoE connection drop because there's no ECHO reply. And this coincides with the sudden accumulation of errors on the WAN VLAN 6.
Suricata 6.0.9 has implemented the new netmap API (https://redmine.openinfosecfoundation.org/versions/184), and this version is included in OPNsense 22.7.9. Does that mean it will improve this issue, or do you need to implement something else on the OPNsense side?
I'm currently on business edition so I can't test it now.
Nothing will change for our Suricata 6. You have been able to test the newer netmap changes for at least half a year, but I doubt they will do any magic here.
I misunderstood something. I thought the Suricata devel package mentioned by swhite2 here was about the new netmap API. So is there any ETA for the Suricata devel package with multi-thread support?
No ETA, it’s been there for a long time as I said.
The new netmap api is already being used in 23.7.