Wireguard point-to-site handshake broken due to route not added in routing table
- [x] I have read the contributing guidelines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
- [x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue
Describe the bug
When routing traffic through OPNsense, connecting from the LAN side and sending all requests upstream to a VPN provider, there is no connectivity, neither via IP ping requests nor DNS resolution. After checking the Wireguard diagnostics at /ui/wireguard/diagnostics, no handshake data is being received.
I have had the same point-to-site setup since the 23 series and it worked fine through 24.1. Around 24.1.2 I started to notice issues with the Wireguard handshake not being completed. I tried updating to 24.1.3 and 24.1.4 and ended up reverting to 24.1.2, which I believe is where the issue started. I can't recall for certain, but I don't believe I was able to revert to 24.1. After fiddling with DNS on 24.1.2 I was able to get some connectivity and stayed on that release for a while, attempting to update to subsequent versions. Yesterday/today I updated to 24.1.5 and the hotfix releases, and am currently on 24.1.5_3.
It seems that a route 128.0.0.0/1 on the Wireguard interface (e.g. wg0) is added to the routing table on reboot or on restart of the Wireguard service. Deleting this route via /ui/diagnostics/interface/routes or via the shell (route del -net 128.0.0.0/1) allows the handshake to complete and traffic to flow to the VPN provider.
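For reference, a minimal shell sketch of the same workaround (the interface name wg0 and the 128.0.0.0/1 half are just the values from my setup; adjust as needed):
# list the IPv4 routes bound to the Wireguard interface
netstat -rn -f inet | grep wg0
# remove the half-route that covers the VPN endpoint so the handshake can leave via WAN
route -n delete -net 128.0.0.0/1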
To Reproduce
- Set up the OPNsense router as the client.
- Configure the peer as the site to route all traffic to via 0.0.0.0/0.
- Enable Wireguard.
- Visit /ui/wireguard/diagnostics in the UI and notice the handshake tries to send, but no response is received (a CLI check is sketched after this list).
- (Optional) Try to access the internet. SSH to the router and try to ping the VPN endpoint or another public website. No IP connectivity and no DNS resolution.
- Visit /ui/diagnostics/interface/routes in the UI and delete route 128.0.0.0/1.
- Visit /ui/wireguard/diagnostics and notice that the handshake has now completed with internet connectivity restored.
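As a CLI alternative to the diagnostics page, the handshake state can also be checked with the wg utility from wireguard-tools (a sketch; wg0 is an example interface name):
# a missing "latest handshake" line and a transfer line showing 0 B received
# indicate the handshake never completed
wg show wg0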
Expected behavior
Starting or restarting the Wireguard service via the UI (or enabling it at boot) with a properly configured point-to-site setup should complete the handshake and allow routing through the VPN. From my past use and understanding of Wireguard, the default route for all traffic should be 0.0.0.0/0, or, as seen in the OPNsense route table, 0.0.0.0/1 if one is trying to route all traffic out of the interface. This is true when starting the Wireguard service on OPNsense. However, I don't believe the route 128.0.0.0/1 should be set; I haven't seen it before and it seems to be the cause of the connection issues.
Relevant log files
Nothing visible in the log files other than the Wireguard diagnostics screen not showing a completed handshake. The firewall log shows the request received and the outbound NAT rule applied on the first connection.
Additional context
I am not an expert on routing, but the upstream URLs are all above the 128.0.0.0/1 route, e.g. 193.65.*.* and the like. I am not able to test with IP addresses below 128.0.0.0/1, so I am not sure whether that affects the matching behavior, as 128.0.0.0/1 is more specific than the default route and both routes are bound to the Wireguard interface (e.g. wg0).
Environment
OPNsense 24.1.5_3 (amd64)
Following up with some more information after reading up about the route I thought was suspect, 128.0.0.0/1. I understand now that this is used to split the IPv4 address space in half so that the routes out of the Wireguard interface are more specific without touching the default route.
I've tested the behavior with two endpoints, one in the lower half of the IPv4 address space (covered by 0.0.0.0/1) and one in the upper half (covered by 128.0.0.0/1). Which route causes the handshake to fail depends on which half the upstream endpoint is in: the handshake will not complete if the endpoint falls in the same half of the address space as the route.
E.g. if the endpoint is 45.0.0.2 the suspect route is 0.0.0.0/1; if the endpoint is 185.0.0.43 the suspect route is 128.0.0.0/1.
To get a handshake and restore full VPN functionality for all routes, the following steps need to be taken (a shell sketch follows this list):
- Start the VPN service.
- If the endpoint is in the lower half, delete route 0.0.0.0/1. If the endpoint is in the upper half, delete route 128.0.0.0/1.
- The handshake will complete.
- At this point only half of your traffic is going out the Wireguard interface. The other half is going out the default route of the WAN.
- Add back the route that you deleted: route add -net 0.0.0.0/1 -iface wg0 or route add -net 128.0.0.0/1 -iface wg0.
- Now all routes are going out of the Wireguard interface again.
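A minimal shell sketch of those steps for the upper-half case (using the example endpoint 185.0.0.43 from above and wg0 as the interface name):
# after starting the Wireguard service, drop the half-route that covers the endpoint
route -n delete -net 128.0.0.0/1
# wait for the handshake to complete (VPN->WireGuard->Status or `wg show wg0`),
# then restore the half-route so all traffic goes back through the tunnel
route -n add -net 128.0.0.0/1 -iface wg0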
Nothing changed in my configuration until this issue started popping up in release 24.1.2 to give a time range. I am hoping this can help diagnose the issue. Thank you for looking into it.
The problem seems to be that OPNsense is not configuring the endpoint route as documented in the Wireguard docs (emphasis mine):
The most straightforward technique is to just replace the default route, but add an explicit rule for the WireGuard endpoint
Looking in wg-service-control.php, it seems that the aforementioned split routes of 0.0.0.0/1 and 128.0.0.0/1 are set according to the Wireguard docs linked above (section Overriding The Default Route), but the explicit client endpoint route for the initial handshake is not being set.
My workaround from my last comment is not necessary if the explicit endpoint route is set via the WAN interface, e.g. route add -net <client-endpoint>/32 -iface <wan>. I've tested this by restarting the Wireguard service after the route was added and getting a handshake immediately.
I am not sure how the endpoint was being configured before, but I do believe this is a bug and not a support issue.
I believe I'm hitting a similar issue as I described it here and here.
Since the 23 series, I've been connecting to my OPNsense via Wireguard to access my network from outside and also tunnel my whole Internet browsing (routing 0.0.0.0/0) to protect and filter (blocklist in Unbound).
However, since 24.3 (or 24.4, I can't recall), I started getting only a handshake (or at least a partial one), with nothing more than a couple of KBs getting through (no ping, no nothing).
As explained in the last link above, if I associate a new Wireguard instance with a newly created interface, my whole network goes down; no traffic goes through anymore, internal or external. If I delete the 0.0.0.0/1 route, it is restored (given that my internal network is 10.0.0.0/8). From the connected Wireguard clients (routing 0.0.0.0/0 through Wireguard), I can ping my internal network and Internet addresses (such as 1.1.1.1 or 8.8.8.8), but I cannot make any DNS requests (in the end it may not be DNS itself, but rather that the IP is above 128.0.0.0/1; I cannot check at the moment).
So I also believe that this is a bug and not a support question; otherwise it should be mentioned somewhere in the docs (I based my setup on the "Road warrior guide") that something needs to be done to avoid losing the whole network when creating/associating a Wireguard instance with a newly created interface.
Just for information, I updated to 24.1.6, created a new instance + interface, and it's working fine; nothing was required beyond what is explained in the "WireGuard Road Warrior Setup". This update made my day!
I am glad that it worked for you @gsacre.
I believe we are using Wireguard in two different ways. Yours is a connection into your LAN, so your client (e.g. phone/computer) configures the client side correctly and completes the handshake. My use case treats the router as the client (and therefore all machines on the LAN as clients) at the network edge. I believe the issue with OPNsense is that the client configuration in this setup is not working correctly: the initial handshake is not completed unless I add an explicit route outside the tunnel to complete the handshake.
I believe the Wireguard docs linked above allude to my solution. This (possible) bug is still present in 24.1.6. Would love to have this confirmed by the OPNsense team.
I have isolated the commit that introduces the bug to dbe52eeaa.
I reverse patched src/opnsense/scripts/Wireguard/wg-service-control.php in order from e0cee10ad to 77fba066b:
e0cee10ad
dbe52eeaa
30862f871
c1d2d18a7
0d7d48eb1
77fba066b
I then re-applied each patch in order from 77fba066b one at a time, starting and stopping the Wireguard VPN interface through the UI in VPN->WireGuard->Instances.
Each time I would get a completed handshake, as evidenced by the Received column in VPN->WireGuard->Status.
However, when I applied patch dbe52eeaa, the Received column was empty and the handshake did not complete, resulting in no VPN/internet connectivity as reported for this issue. ~The route not added to the table does not seem to affect this as I speculated above. Inspecting the routing table as I did previously does not show a new route to the remote VPN server.~ With dbe52eeaa reverted, full VPN/internet connectivity is back with all routes going through the VPN, as confirmed by watching the live firewall traffic log.
Who Is Affected
This issue is specific to users of Wireguard who intend to send all traffic through a VPN tunnel from their LAN to an upstream provider, and who have configured Allowed IPs to 0.0.0.0/0 or ::/0 on the Edit Peer configuration screen. Below, IPv4 is used as the example for simplicity.
The Issue
The Wireguard handshake is not completed due to a routing issue. The client cannot connect to the upstream VPN provider due to the order in which the interface is configured. Specifically, this commit, shipped as part of the 24.1.3 release, moved the interface configuration (e.g. ifconfig wg0 up) below the block that adds routes: https://github.com/opnsense/core/commit/dbe52eeaa9c17ec56a22ff6cefcf6b94615bd8b4. This had the unintended effect of trying to configure a tunnel over a tunnel that does not exist yet.
Configuration
Description of the setup:
- OPNsense router acts as the Wireguard client
- OPNsense outbound NATs all requests through the Wireguard interface to a VPN provider
- The router sits on the network edge with WAN connected to internet and LAN as your home network
- All devices in your LAN use the OPNsense router as a gateway
- The OPNsense router sees that traffic is not specific to the LAN or for the router, so it routes the traffic outside of the LAN
Wireguard Instance Setup:
- Tunnel Address configured with <tunnel-ip>/32
- Disable Routes is unchecked
Wireguard Peer Setup:
- Allowed IPs is 0.0.0.0/0
- Endpoint Address is a non-CIDR single IP, e.g. 1.2.3.4
- Endpoint Port is 51820
- Keepalive Interval is 25
Firewall Outbound NAT rules:
- Hybrid Outbound NAT rule generation is checked
- Firewall outbound NAT rule interface: Wireguard (Group)
- TCP/IP Version: IPv4
- Protocol: any
- Source address: LAN net
- Source port: any
- Destination address/port: any/any
- Translation/target: Interface address
Description of the Bug
In the wg_start() function of opnsense/scripts/Wireguard/wg-service-control.php, it first checks whether the Wireguard interface already exists. Assuming a cold start (or restart), it does not, so the first task is to create the interface and add it to the wireguard group:
if (!does_interface_exist($server->interface)) {
mwexecf('/sbin/ifconfig wg create name %s', [$server->interface]);
mwexecf('/sbin/ifconfig %s group wireguard', [$server->interface]);
$reload = true;
}
Second, it adds the Wireguard configuration from the Web UI to the interface by syncing the conf file:
mwexecf('/usr/bin/wg syncconf %s %s', [$server->interface, $server->cnfFilename]);
Third, it adds the tunnel IP address (tunnel address) and MTU (if set) to the Wireguard interface:
foreach (array_filter(explode(',', (string)$server->tunneladdress)) as $alias) {
$proto = strpos($alias, ':') === false ? "inet" : "inet6";
mwexecf('/sbin/ifconfig %s %s %s alias', [$server->interface, $proto, $alias]);
}
if (!empty((string)$server->mtu)) {
mwexecf('/sbin/ifconfig %s mtu %s', [$server->interface, $server->mtu]);
}
[Bug Introduced] Before commit https://github.com/opnsense/core/commit/dbe52eeaa9c17ec56a22ff6cefcf6b94615bd8b4#diff-42cee99e3d444702e61cfd3b37d6accd73586da661add38c2ad5a04d16c6952cL85 it would bring up the Wireguard interface with:
mwexecf('/sbin/ifconfig %s %s', [$server->interface, $ifcfgflag]);
This immediately made a handshake to the VPN server. This is no longer the case.
Fourth, wg_start() checks if Disable Routes is set for the Wireguard client (our router). In our case it is not set, so it sets up routes in the routing table according to the Allowed IPs of the Wireguard peer. Within this block of code it checks whether we intend to send all requests over the VPN:
if (str_ends_with(trim($address), '/0')) {
if ($ipproto == 'inet') {
array_push($routes_to_add[$ipproto], '0.0.0.0/1', '128.0.0.0/1');
} else {
array_push($routes_to_add[$ipproto], '::/1', '8000::/1');
}
} elseif (...) {
If so, it splits the full IP routing space into halves so as to not mess with the default route of the routing table. This essentially makes sure all requests go to the VPN. It then adds those routes to the interface:
mwexecf('/sbin/route -q -n add -%s %s -interface %s', [$ipproto, $route, $server->interface]);
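Concretely, for an IPv4 peer with Allowed IPs 0.0.0.0/0 this ends up running something like the following (an illustrative substitution of the values above; wg0 is the example interface name):
/sbin/route -q -n add -inet 0.0.0.0/1 -interface wg0
/sbin/route -q -n add -inet 128.0.0.0/1 -interface wg0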
At this point we haven't brought up the Wireguard interface so no handshake has been attempted.
Finally, after the routing table has been configured for our Wireguard interface, the wg interface is brought up:
if ($reload) {
interfaces_restart_by_device(false, [(string)$server->interface]);
}
mwexecf('/sbin/ifconfig %s %s', [$server->interface, $ifcfgflag]);
At this point, the Wireguard interface attempts to make a handshake to the VPN server, the configured endpoint address of the peer. The endpoint address falls into the previously configured routes of 0.0.0.0/1 and 128.0.0.0/1 to be routed over the Wireguard interface. However, how can we make a handshake over a secure tunnel that we haven't set up yet to secure that tunnel? We can't. This is why the handshake is not completing and one will see Send traffic on the VPN status page of the web UI, but no Received traffic. It is attempting to send, but it cannot do so.
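One way to see which interface the handshake packets would leave from is to query the routing table directly (a diagnostic sketch; 1.2.3.4 stands in for the peer's Endpoint Address from the configuration above):
# with the /1 half-routes in place this resolves to wg0,
# which is why the handshake packets never reach the WAN
route -n get 1.2.3.4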
Mitigations
Mitigation 1: Revert to the old behavior
Apply this patch when ssh'ed into the OPNsense box:
diff --git a/src/opnsense/scripts/Wireguard/wg-service-control.php b/src/opnsense/scripts/Wireguard/wg-service-control.php
index 7f801b9b4..5ac8c4c75 100755
--- a/src/opnsense/scripts/Wireguard/wg-service-control.php
+++ b/src/opnsense/scripts/Wireguard/wg-service-control.php
@@ -81,6 +81,8 @@ function wg_start($server, $fhandle, $ifcfgflag = 'up', $reload = false)
mwexecf('/sbin/ifconfig %s mtu %s', [$server->interface, $server->mtu]);
}
+ mwexecf('/sbin/ifconfig %s %s', [$server->interface, $ifcfgflag]);
+
if (empty((string)$server->disableroutes)) {
/**
* Add routes for all configured peers, wg-quick seems to parse 'wg show wgX allowed-ips' for this,
@@ -138,8 +140,6 @@ function wg_start($server, $fhandle, $ifcfgflag = 'up', $reload = false)
interfaces_restart_by_device(false, [(string)$server->interface]);
}
- mwexecf('/sbin/ifconfig %s %s', [$server->interface, $ifcfgflag]);
-
// flush checksum to ease change detection
fseek($fhandle, 0);
ftruncate($fhandle, 0);
To do so, first copy the patch to your router, then:
# As root
cd /usr/local
patch -p2 < /path/to/patch
This will revert to the previous behavior before this issue was introduced.
[1] However, do note that with this patch there may be a brief window after the Wireguard interface comes up during which the routes are not yet configured, so there is a small chance that traffic can leak outside the tunnel. This is unconfirmed speculation, but it may matter for your threat model, so it warranted a mention.
Mitigation 2: Add an explicit route to your VPN server endpoint
Add an explicit route for your endpoint VPN server to be routed outside of the tunnel and through the WAN interface. This is recommended by the Wireguard Docs. You can set a route explicitly over ssh as noted in my previous comment on this issue. Or you can navigate to System->Routes->Configuration in the Web UI and add a static route there:
Network Address: <endpoint ip>/32
Gateway: WAN
Description: VPN Handshake Route
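The shell equivalent is roughly the following (a sketch; 1.2.3.4 stands in for your peer's endpoint and 203.0.113.1 for your WAN gateway address, both placeholders):
# route the VPN endpoint itself outside the tunnel, via the WAN gateway
route -n add -net 1.2.3.4/32 203.0.113.1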
After completing one of the mitigations, you can run the latest OPNsense release and will be able to get a full handshake upon starting the Wireguard service.
Mitigation 2: Add an explicit route to your VPN server endpoint
Add an explicit route for your endpoint VPN server to be routed outside of the tunnel and through the WAN interface. This is recommended by the Wireguard Docs. You can set a route explicitly over ssh as noted in my previous comment on this issue. Or you can navigate to System->Routes->Configuration in the Web UI and add a static route there:
Network Address: <endpoint ip>/32
Gateway: WAN
Description: VPN Handshake Route
After completing one of the mitigations, you can run the latest OPNsense release and will be able to get a full handshake upon starting the Wireguard service.
At least on 24.1.8 I was still seeing issues with Mitigation #2 in place. The handshake completes successfully, but the router still becomes unusable and loses its remaining WAN access.
@masterhuh I believe that is a separate issue. This bug is only in regard to the initial handshake.
Wireguard has two peers, but for discussion one can be called the client and the other the server. The client is the one that makes the initial handshake. From outside the LAN, your client device makes the initial handshake while your Wireguard server sits inside waiting for a connection. This bug concerns the opposite direction, from the LAN to an outside server, where the client making the initial handshake is the OPNsense router itself.
To summarize, the OPNsense implementation creates the VPN overlay network (for all routes) before the initial handshake is established, rendering the handshake impossible. (The initial handshake has to happen on the underlay network)
So in your case, if the initial handshake is happening in your setup, then I believe you are not affected by this bug and the Wireguard stability is a separate issue.
To summarize, the OPNsense implementation creates the VPN overlay network (for all routes) before the initial handshake is established, rendering the handshake impossible. (The initial handshake has to happen on the underlay network)
What does that even mean?
@fichtner It means that OPNsense tries to do the handshake through the Wireguard interface instead of the WAN interface. Because there is no existing handshake, the Wireguard connection is not valid and you get stuck in a catch-22 where you need a handshake to perform the handshake.
With Wireguard active, if you look at the routing table, you'd probably find that the default route (0.0.0.0/0) is set to the Wireguard interface. That means that if the router receives a packet whose destination IP is not defined in the routing table, it would route the packet to the Wireguard interface. Now, when OPNsense tries to perform a handshake to a remote server, it would use the default route unless the destination IP address is in the routing table (which is what Mitigation 2 does).
Feel free to correct me if I've got some things wrong. I'm experiencing the same issue and this is what my intuition tells me is happening.
Isn’t this simply a routing or policy based firewall rule configuration issue?
It seems to be a routing issue, but it shows up in a fairly common configuration (i.e. with "Disable routes" unchecked on the Wireguard instance, and with Allowed IPs set to "0.0.0.0/0,::0/0" plus a remote endpoint on the peer), so I'm not sure it is expected behavior. Besides, the author also mentioned that this isn't the usual behavior, citing the commit before this became a thing.
Before commit https://github.com/opnsense/core/commit/dbe52eeaa9c17ec56a22ff6cefcf6b94615bd8b4#diff-42cee99e3d444702e61cfd3b37d6accd73586da661add38c2ad5a04d16c6952cL85 it would bring up the Wireguard interface with:
mwexecf('/sbin/ifconfig %s %s', [$server->interface, $ifcfgflag]);
This immediately made a handshake to the VPN server. This is no longer the case.
If it is expected behavior, then it is quite unintuitive imho. I mean... why would it send the handshake packet through the Wireguard interface knowing that there's no handshake yet? To be fair, it can easily be worked around by adding a manual entry on the routing table (Mitigation 2), but I'm not sure if that's a good enough answer.
I don't mind changing it appropriately, but moving the "up" down a bit and having wireguard break is a bit weird. We have to consider it works for thousands of users without issue.
Cheers, Franco
Just to be sure, our reference point is 24.1.9?
Just checked. I'm currently using OPNsense 24.1.7_4-amd64. I'm not sure what @kkcarlc is using now, but he did mention on the original post that he's on 24.1.5_3. I haven't tested it on newer versions yet since I already went with the workaround (Mitigation 2) that the author posted and it worked just fine.
I'm just butting in on this conversation so I can finally remove the manual entry from my routing table if this behavior gets changed. haha
well specifically I'm wondering if this helps: https://github.com/opnsense/changelog/blob/727f3153899c978ea84682ea5db08e3769f58809/community/24.1/24.1.9#L12
Thanks @cedoromal for your input on this. @fichtner I am currently on 24.1.8, not sure the patch level, but it is the most recent on that version. I have been checking if the behavior still exists between each upgrade and it has. I will be able to update to 24.1.9 later this evening and can report back as necessary.
@cedoromal described the issue succinctly with the catch-22 explanation.
I don't mind changing it appropriately, but moving the "up" down a bit and having Wireguard break is a bit weird. We have to consider it works for thousands of users without issue.
This concerns me as well, but my intuition is that some users have this issue and haven't been able to pinpoint it, or perhaps there are simply not that many people with a configuration sending all routes over the Wireguard interface. Anecdotally, I see more people trying to VPN into their networks than out. Not sure. However, I think the logic is sound if you follow my comment above: mainly, that the initial handshake cannot complete after the routing table directs 0.0.0.0/0,::0/0 out the Wireguard interface. It has to happen before.
well specifically I'm wondering if this helps: https://github.com/opnsense/changelog/blob/727f3153899c978ea84682ea5db08e3769f58809/community/24.1/24.1.9#L12
As mentioned just prior, I can test this out a bit later and report back.
Mainly, that the initial handshake cannot complete after the routing table configures 0.0.0.0/0,::0/0 to send out the Wireguard interface. It has to happen before.
Yes, it means the interface won't set its route correctly or it has been replaced with a faulty one (one reason why the mentioned patch exists). I can't rule out a bug in the wireguard plumbing, yet I also know that interfaces set their own routes as soon as addresses are added to complement reachability over the link, so this shouldn't easily break through reordering unless we really are in bigger trouble with the FreeBSD base or kernel (a specific route should always win over a /0, no matter where and when it was set).
I remember when I used wireguard on Linux and used ::/0, it would not complete the handshake since it would send it through the tunnel.
That's why tools like this exist to calculate an allowed-IPs list that specifically excludes a certain network:
https://www.procustodibus.com/blog/2021/03/wireguard-allowedips-calculator/
With that I excluded the endpoint's IPv6 prefix from the allowed-IPs list, and then the handshake always works, while all other IPv6 traffic still goes through the tunnel.
I didn't follow the whole thread, but with wg-quick the 0.0.0.0/0 was automatically translated to 0.0.0.0/1 and 128.0.0.0/1 to avoid routing issues. When setting both in 24.1.8 it works great. I didn't test 0.0.0.0/0 as I was remote.
@fichtner I just upgraded to 24.1.9_3 and it looks like the issue still exists. Methodology: I disabled the explicit route for the Wireguard endpoint, started the VPN, and did not receive a handshake. I left the VPN running, enabled the explicit route, and the VPN handshake completed immediately.
I can't rule out a bug in the wireguard plumbing, yet I also know that interfaces set their own routes as soon as addresses are added to complement reachability over the link, so this shouldn't easily break through reordering unless we really are in bigger trouble with the FreeBSD base or kernel (a specific route should always win over a /0, no matter where and when it was set).
If I understand correctly, I think the FreeBSD base/kernel is fine, since we are seeing route specificity handled correctly. Mitigation #2 mentioned above adds a route more specific than the full Wireguard tunnel and directs it out via WAN; this gets picked up correctly and works. The wg interface gets a private RFC1918 address as expected to communicate over the private network, and the routing table with 0.0.0.0/0 instructs every route to travel through it. Adding a specific /32 route for the VPN endpoint server directed out the WAN link (aka the handshake route) is picked up and completes the handshake.
I remember when I used wireguard on Linux and used ::/0, it would not complete the handshake since it would send it through the tunnel. That's why tools like this exist to calculate an allowed-IPs list that specifically excludes a certain network:
https://www.procustodibus.com/blog/2021/03/wireguard-allowedips-calculator/
With that I excluded the endpoint's IPv6 prefix from the allowed-IPs list, and then the handshake always works, while all other IPv6 traffic still goes through the tunnel.
@Monviech yes, that is essentially what wg-quick does on Linux and what I discussed in this comment, but they do it 'in reverse', and that is the basis of Mitigation #2 above. Instead of excluding the IPv6 prefix or IPv4 route from the tunnel (so that route goes out the default gateway), they send the full routing space out the tunnel and then explicitly add a specific route that goes outside the tunnel via the WAN default gateway.
I didn't follow the whole thread, but with wg-quick the 0.0.0.0/0 was automatically translated to 0.0.0.0/1 and 128.0.0.0/1 to avoid routing issues. When setting both in 24.1.8 it works great. I didn't test 0.0.0.0/0 as I was remote.
@mimugmail This is interesting, since it sounds like you have checked the 'disable routes' box in your configuration, given that you are setting those routes explicitly? Do you set the routes after the VPN is up? If so, the initial handshake has most likely already completed and the tunnel has been established, which would allow this to work.
If the 'disable routes' box is not checked, the wg-service-control.php script does as you say and sets the 0.0.0.0/1 and 128.0.0.0/1 routes. This routes the entire IP traffic space out of the Wireguard interface without touching the default 0.0.0.0/0 route set by other services (such as DHCP via the ISP). It relies on route specificity to work, hence the /1 routes 'bypassing' the default route.
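For anyone wanting to verify this on their own box, a quick way to see the default route and the two /1 halves side by side (a sketch using standard FreeBSD tools):
# the default route stays on the WAN gateway; the longer /1 prefixes on wg0 win the lookup
netstat -rn -f inet | grep -E '^(default|0\.0\.0\.0/1|128\.0\.0\.0/1)'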
Yes, I don't tick disable routes. Don't need policy based routes here.
Still the same issue here; I need to uncheck and check it again.
I have isolated the commit that introduces the bug to dbe52ee.
I reverse patched src/opnsense/scripts/Wireguard/wg-service-control.php in order from e0cee10ad to 77fba066b: e0cee10ad, dbe52eeaa, 30862f871, c1d2d18a7, 0d7d48eb1, 77fba066b. I then re-applied each patch in order from 77fba066b one at a time, starting and stopping the Wireguard VPN interface through the UI in VPN->WireGuard->Instances.
Each time I would get a completed handshake, as evidenced by the Received column in VPN->WireGuard->Status.
However, when I applied patch dbe52ee, the Received column was empty and the handshake did not complete, resulting in no VPN/internet connectivity as reported for this issue. ~The route not added to the table does not seem to affect this as I speculated above. Inspecting the routing table as I did previously does not show a new route to the remote VPN server.~ With dbe52ee reverted, full VPN/internet connectivity is back with all routes going through the VPN, as confirmed by watching the live firewall traffic log.
Same issue here with OPNsense version 24.7.3; I need to uncheck and check the instance again, then the handshake works perfectly.
@XCNBX thanks for taking a close look. If there is an issue with dbe52ee it should be easy to confirm:
- Are you using CARP VHID tracking, which seemed relevant to the cause here?
- When you look at ifconfig wgX, is the DOWN flag set when UP is expected?
- When running ifconfig wgX up, does it start working immediately?
Ok the problem scope of the commit is the following:
When a WireGuard instance is assigned to an interface, the interfaces_restart_by_device function will set it to UP, regardless of what CARP wanted to do with the instance. This is a problem, because then the CARP backup will be UP instead of DOWN.
Follow up question:
- Do you have an interface for the WireGuard instance assigned?
@XCNBX thanks for taking a close look. If there is an issue with dbe52ee it should be easy to confirm:
1. Are you using CARP VHID tracking which seemed relevant to the cause here? 2. When you look at `ifconfig wgX` the DOWN flag is set when UP is expected? 3. When running `ifconfig wgX up` does it start working immediately?
- No CARP configuration
- Showing UP
- Did not run the command because the interface is already UP
When I create a new peer, the peer cannot finish the handshake, so I need to disable and re-enable it for it to work.