DP not up (sometimes) when reloading config through SIGHUP
- When reloading the config through SIGHUP, faucet sometimes logs "DP not up" and new flows are not sent down to the switch
- This behavior seems to be inconsistent
Here is a capture of the traffic between the switch and the controller when "DP not up" is logged
Faucet version: 1.10.11
I would need a bit more information to debug this. I notice your capture was started after the log message, so any change to the TCP state of the control channel will be missing.
But does the switch eventually recover and have the correct flows programmed? dp not up isn't necessarily a problem, faucet is just saying the switch reset its control channel state.
Steps to reproduce
- A process sends SIGHUP to the faucet controller every 5 seconds
- Faucet controller running and listening on port 6653
- One Open vSwitch switch connected to the faucet controller
ovs-vsctl set-controller br-f1 tcp:127.0.0.1:6653
- Faucet config file contains 5 VLANs each with 3 ACL rules
- New flows are sent down to OVS
- Populate faucet config file with 3000 VLANs each with 3 ACL rules
- Faucet log shows DP down and new flows aren't sent down to the OVS switch
PCAP files
The pcap files contain the captured packets, starting at the moment the config file has 5 VLANs and the flows for those are sent down to the switch (everything was fine up until this point), and ending after the faucet log shows DP down
faucet.zip
Versions
- Faucet 1.10.11
- Open vSwitch 3.3.0
Thanks for the additional information.
This will be caused by the default openflow hello timers for openvswitch being too low for the number of flow rules you want to push and openvswitch timing out the connection.
You need to tune the following ovsdb options:
- inactivity_probe
- controller_rate_limit
- controller_burst_limit
There is some documentation here on how to do that:
https://bugs.launchpad.net/neutron/+bug/1817022
Also note there was a bug in certain versions of OVS (introduced in v2.12.0 and fixed in v3.3.0) where these configuration values weren't always honored, so make sure you aren't running an affected version, see details on this mailing list thread:
https://mail.openvswitch.org/pipermail/ovs-dev/2023-September/408205.html
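As a rough sketch of what tuning those three options could look like, the helper below only builds the `ovs-vsctl set controller ...` command lines and does not run them. The function name `controller_tuning_commands` is hypothetical, and the numeric values are placeholders for illustration, not recommendations from this thread.

```python
def controller_tuning_commands(bridge, inactivity_probe_ms, rate_limit, burst_limit):
    """Build ovs-vsctl argv lists that set the three Controller columns.

    The values passed in are the caller's choice; nothing here is a
    recommended setting.
    """
    settings = {
        "inactivity_probe": inactivity_probe_ms,   # milliseconds
        "controller_rate_limit": rate_limit,       # packets per second
        "controller_burst_limit": burst_limit,     # packets
    }
    return [
        ["ovs-vsctl", "set", "controller", bridge, f"{column}={value}"]
        for column, value in settings.items()
    ]


if __name__ == "__main__":
    # Placeholder values, printed rather than executed.
    for argv in controller_tuning_commands("br-f1", 30000, 1000, 250):
        print(" ".join(argv))
```

Printing the commands first makes it easy to review them before running them by hand or passing each list to `subprocess.run`.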
Thank you for the reply, I will try it ASAP.
I do have an additional question, though. I have not dug too much into the source code yet, but I notice that whenever there are changes to a VLAN or port, faucet "cold" starts, while in other situations, such as changes to ACLs, faucet "warm" starts. Could you tell me why that is?
Also, could you clarify the behavior of "cold" starting vs "warm" starting?
Sidenote:
- I have tried setting the inactivity_probe to 3000000 and the error still persists
- I tried commenting out the part which I believe causes faucet to "cold" start, and the error seems to have disappeared and flows are sent down to the switch.
- It happens even with few VLANs
File: valve.py Function: _apply_config_changes(self, new_dp, changes, valves=None)
# # If pipeline or all ports changed, default to cold start.
# if self._pipeline_change(new_dp):
# self.dp_init(new_dp, valves)
# return restart_type, ofmsgs
#
# if all_ports_changed:
# self.logger.info("all ports changed")
# self.dp_init(new_dp, valves)
# return restart_type, ofmsgs
Another follow up to this
This is the osken-manager log when the incident happened
This is the osken-manager log when "cold" reload works normally
This is the Open vSwitch logs in both cases
From the logs, I see that the error happens because an event is missing
connected socket:<eventlet.greenio.base.GreenSocket....
Could this be the reason ?
Cold start = faucet deletes all the flows in the Open vSwitch flow table and re-adds them
Warm start = faucet just applies a minimal diff to the flow table in order to implement the new behaviour represented by the config change (e.g add a new ACL)
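The two definitions above can be pictured with a toy model in which a flow table is just a set of rule strings. This is only an illustration of the idea, not faucet's actual code; the function names and the rule strings are made up for the example.

```python
def cold_start(current_flows, desired_flows):
    """Return (deletes, adds) that wipe the whole table and reinstall it."""
    return set(current_flows), set(desired_flows)


def warm_start(current_flows, desired_flows):
    """Return (deletes, adds) containing only the rules that actually differ."""
    current, desired = set(current_flows), set(desired_flows)
    return current - desired, desired - current


if __name__ == "__main__":
    current = {"vlan 100 forwarding", "acl rule 1"}
    desired = {"vlan 100 forwarding", "acl rule 1", "acl rule 2"}
    print("cold:", cold_start(current, desired))
    print("warm:", warm_start(current, desired))
```

In this model, adding one ACL rule costs a single flow add on a warm start, while a cold start deletes and re-sends every rule, which is why cold starts get slow with thousands of VLANs.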
From looking at some of your earlier logs, it appears you are reloading faucet while it is in the middle of a cold start. Have you tried only reloading it after the cold start finishes? Does that help?
Regarding the Open vSwitch version, I'm using Open vSwitch 3.3.0
I also suspect that reloading faucet while it's cold starting might be the problem. However, I need to reload the config programmatically, and I don't know of a way to hook into faucet to check whether a cold start is in progress, so I just trigger the reload every 10 seconds.
For more context, I'm trying to use Faucet to control a single Open vSwitch switch br-f (this is what I name it) that will act as a multi-tenant firewall integrated with OpenStack br-ex bridge.
Also, could you clarify why changes to a VLAN need a cold start? Why are all flows deleted and then re-added?
I would also like to know whether there is any risk in changing Faucet to only "warm" start, even in cases where it is meant to "cold" start, such as changes to a VLAN.
From what I've tested so far, nothing dangerous has happened.
You can ask faucet which config file it currently has loaded via the prometheus interface, and only send it one HUP each time your configuration file changes, rather than reloading faucet every 10 seconds:
$ curl localhost:9302/metrics | grep faucet_config_hash_info
# HELP faucet_config_hash_info file hashes for last successful config
# TYPE faucet_config_hash_info gauge
faucet_config_hash_info{config_files="/etc/faucet/faucet.yaml",error="",hashes="ce1dfaa2df25e0643001fba754799238a576f328893c871c7592714cbec9fef6"} 1.0
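A minimal sketch of how a reload script might use that metric: parse the `faucet_config_hash_info` line and compare its `hashes` label with a hash of the file on disk. It assumes the label holds the SHA-256 hex digest of the file contents, which is consistent with the 64-character value above but is not confirmed in this thread. The function names are hypothetical.

```python
import hashlib
import re

# Matches the sample line, e.g. faucet_config_hash_info{...labels...} 1.0
_HASH_LINE = re.compile(r'^faucet_config_hash_info\{(?P<labels>.*)\}\s+\S+$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_config_hash_info(metrics_text):
    """Return the labels of the faucet_config_hash_info sample, or None."""
    for line in metrics_text.splitlines():
        match = _HASH_LINE.match(line.strip())
        if match:
            return dict(_LABEL.findall(match.group("labels")))
    return None


def config_is_loaded(metrics_text, config_bytes):
    """True if the running config hash matches the given file contents.

    Assumption: the hashes label is a SHA-256 hex digest of the file.
    """
    labels = parse_config_hash_info(metrics_text)
    if labels is None or labels.get("error"):
        return False
    expected = hashlib.sha256(config_bytes).hexdigest()
    return expected in labels.get("hashes", "").split(",")
```

The script would pass in the body returned by `curl localhost:9302/metrics` and the bytes of `/etc/faucet/faucet.yaml`, and send another SIGHUP only after the file changes again and the previous version is reported as loaded.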
For warm reloads we need to implement specific warm reload behaviour for every possible change you can make in the config file, as we need to compute a difference in the openflow rules and implement this difference as a series of flow adds/deletes/modifies. For things we haven't implemented warm reload for we revert back to a cold restart which will always work.
We would of course be open to contributions to implement warm restart for vlan change, you can find the code and relevant TODO here: https://github.com/faucetsdn/faucet/blob/main/faucet/valve.py#L1672-L1677
Thank you for your reply, I'll look into it.
Actually, here is another idea that might be a bit easier: faucet has an environment variable, FAUCET_CONFIG_STAT_RELOAD. If you set this to true, faucet will monitor your configuration file for changes and automatically reload when it changes:
https://docs.faucet.nz/en/latest/configuration.html#environment-variables
That seems nice, but I do have one question: what is the behavior of Faucet when, for example, the configuration file gets written to while it's automatically reloading?
I'm not sure if we have a test for that specific case, so not sure what would happen.
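Since that case is untested, one general way to sidestep it on the writer's side is to never modify the config file in place: write the new contents to a temporary file in the same directory and then rename it over the real path, so a reader sees either the old file or the new one. This is a sketch of that common pattern, not something faucet requires or documents; the function name is hypothetical.

```python
import os
import tempfile


def write_config_atomically(path, text):
    """Replace the file at path with text using a same-directory rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".faucet-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that `tempfile.mkstemp` creates the file readable only by its owner, so if faucet runs as a different user the permissions would need adjusting (for example with `os.chmod`) before the rename.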