Posting to -stable since -pf didn't show much interest :/

-------- Original Message --------
Subject: CARP interfaces and mastership issue
Date: Thu, 15 Sep 2011 11:07:37 +0200

Hello list,

TL;DR: a carp interface becomes MASTER for a split second after being
created, even if another MASTER exists on the network with faster
advertisements. This breaks connections. How can we prevent it?

We've been experiencing this double mastership problem with CARP interfaces.

Allow me to give some context:

2 firewalls, PF1 and PF2, each with 2 VLANs (for example; some have more)
on a lagg device (link aggregation).
These firewalls then share virtual IPs through CARP interfaces. Let us
assume the following:

PF1:
- vlan13
- vlan410
- carp13 (advskew 50)
- carp410 (advskew 50)

PF2:
- vlan13
- vlan410
- carp13 (advskew 100)
- carp410 (advskew 100)

CARP preemption is turned on, so that if vlan13 should fail on PF1, PF2
would assume mastership on both CARP interfaces.
Sysctls below:
net.inet.carp.allow: 1
net.inet.carp.preempt: 1
net.inet.carp.log: 1
net.inet.carp.arpbalance: 0
net.inet.carp.suppress_preempt: 0

The problem we have is this: say, for example, we reboot PF2.
When it comes back up, it will, if only for a split second, assume CARP
mastership for its interfaces at the same time as PF1.

This breaks existing sessions, OpenVPN tunnels and new client connections.

While I acknowledge that our home-made daemons should be built to tolerate
brief network outages, that doesn't solve our main problem.

We have the same issue when destroying/creating said CARP interfaces.

Recently we upgraded the IOS version on some switches at our backup
datacenter (which also has 2 PF boxes, sharing the CARP IPs with the 2 PFs
in our production DC).
To prevent anything nasty from happening, we disallowed the production VLANs
on the switches' uplink ports and permitted only management traffic, so that
we could perform the upgrade.

Things went smoothly, but when we brought the production VLANs up again
at layer 2 on the switches, once spanning-tree converged we again had a
double MASTER problem.

I understand I could have avoided it by destroying/recreating the CARP
interfaces, but even in this case there is a split second during which
both firewalls are CARP MASTER.

Is there any way to force CARP to stay in the INIT state for some time when
coming up, and only after X seconds become either MASTER or BACKUP?

Any other ideas on how to solve this, guys?
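For readers following the advskew values above, here is a minimal sketch of how they translate into advertisement timing. It assumes the usual CARP rule that a host advertises every advbase + advskew/256 seconds, and it assumes advbase is 1 second (the thread only gives advskew). The function name is made up for illustration.

```python
def advert_interval(advbase: int, advskew: int) -> float:
    """Seconds between CARP advertisements: advbase + advskew/256."""
    return advbase + advskew / 256


# advbase=1 is an assumption (the common default); the thread only states advskew.
hosts = {
    "PF1": advert_interval(1, 50),
    "PF2": advert_interval(1, 100),
}

# The host that advertises most frequently is the one expected to hold MASTER.
expected_master = min(hosts, key=hosts.get)

for name, interval in sorted(hosts.items()):
    print(f"{name}: advertises every {interval:.3f} s")
print("expected MASTER:", expected_master)
```

Under these assumptions PF1 advertises faster than PF2, so PF2 should never need to hold MASTER while PF1 is reachable, which is what makes the brief double mastership surprising.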
> Things went smoothly but when we brought the production VLANs up again
> at layer 2 on the switches, when spanning-tree converged we had again a
> double MASTER problem.

In older versions of FreeBSD, creating logical interfaces like vlan(4) and
carp(4) had a nasty, inadvertent side effect of toggling the state of the
underlying physical interface. This may be fixed in newer versions. The
toggle would then cause STP to reset on the switch port, which can take up
to 50 seconds to recover. In the meantime, the backup host hasn't heard from
the master and assumes the role of master.

You can try turning on spanning-tree portfast on the switch ports facing
your backup system, which should cut this time down significantly. If you
can ensure that no STP BPDUs will be sent from your CARP system, then it's
probably safe to run PortFast on a trunk.

The same is true after a reboot. Maybe hack the RC script to force the CARP
interfaces on your backup to stay down at boot time for an extra 10 to 15
seconds.

~BAS

> I understand I could have avoided it by destroying/recreating the CARP
> interfaces, but even in this case there is a split second during which
> both firewalls are CARP MASTER.
>
> Is there any way to force CARP to assume INIT state for some time when
> coming up, and only after X seconds either become MASTER or BACKUP ?
>
> Any other idea how to solve this, guys ?
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
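To put rough numbers on the spanning-tree explanation in this reply, here is a small sketch. It assumes that a BACKUP host declares the master dead after 3 * advbase + advskew/256 seconds without hearing an advertisement, which is my understanding of the carp(4) timers rather than something stated in the thread. It also assumes advbase is 1 second, and the 1-second PortFast blackout is purely illustrative; only the 50-second figure comes from the reply. Function names are hypothetical.

```python
def master_down_timeout(advbase: int, advskew: int) -> float:
    """Seconds a BACKUP waits without advertisements before taking over.

    Assumed formula: 3 * advbase + advskew / 256.
    """
    return 3 * advbase + advskew / 256


def backup_takes_over(blackout_s: float, advbase: int = 1, advskew: int = 100) -> bool:
    """True if a link blackout outlasts the backup's master-down timeout."""
    return blackout_s > master_down_timeout(advbase, advskew)


# PF2 from the original post: advskew 100, advbase assumed to be 1.
print("PF2 master-down timeout:", master_down_timeout(1, 100), "s")

# Up to 50 s of STP reconvergence, as mentioned in the reply.
print("50 s STP blackout -> PF2 takes over:", backup_takes_over(50.0))

# Hypothetical 1 s blackout with PortFast enabled (illustrative value only).
print("1 s PortFast blackout -> PF2 takes over:", backup_takes_over(1.0))
```

If these assumptions hold, any blackout longer than a few seconds is enough for the backup to promote itself, which is why shortening the STP delay or holding the CARP interfaces down at boot are both plausible mitigations.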
On 9/15/11 11:07 AM, Damien FLEURIOT wrote:
> Hello list,
>
> TLDR: carp interface becomes MASTER for a split second after being
> created, even if another MASTER exists on the network with faster
> advertisements. Breaks connections. HOWTO prevent ?
>
> [...]
>
> Is there any way to force CARP to assume INIT state for some time when
> coming up, and only after X seconds either become MASTER or BACKUP ?
>
> Any other idea how to solve this, guys ?

Hello List,

This is a follow-up to my original email quoted above.

It seems that there is a known bug in the CARP implementation of OpenBSD 3.8
and earlier which causes CARP interfaces to skip the INIT state altogether
and start as MASTER if preempt is enabled.

Source: https://calomel.org/pf_carp.html

Quote:
    INIT: All CARP interfaces start in this state. Also, when a CARP
    interface is admin down, i.e. "ifconfig em0 down", it is put into this
    state. When a CARP interface is admin up, it immediately transitions to
    BACKUP. Note that in OpenBSD 3.8 and earlier, a bug exists which will
    cause the host to transition to MASTER right away if preempt is enabled.

I have been able to verify and reproduce this behavior on boxes running
both FreeBSD 8.1 and 8.2.

Does anyone know which version of OpenBSD's CARP implementation we're running
on FreeBSD 8.x? It looks like the same bug to me.
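The behavior described in the quoted passage can be modeled in a few lines. This is only a sketch of the state transition as the quote describes it, not the kernel code; the enum, the function and the `has_init_bug` flag are hypothetical names introduced for illustration.

```python
from enum import Enum


class CarpState(Enum):
    INIT = "INIT"
    BACKUP = "BACKUP"
    MASTER = "MASTER"


def state_after_admin_up(preempt: bool, has_init_bug: bool) -> CarpState:
    """First state a CARP interface reaches once it leaves INIT.

    Per the quoted description: normally INIT -> BACKUP, but with the
    reported bug and preempt enabled the host goes straight to MASTER.
    """
    if has_init_bug and preempt:
        return CarpState.MASTER
    return CarpState.BACKUP


# Expected behavior: always BACKUP first, whatever the preempt setting.
print(state_after_admin_up(preempt=True, has_init_bug=False))

# Behavior matching the report: preempt on and bug present -> MASTER at once.
print(state_after_admin_up(preempt=True, has_init_bug=True))
```

In this model, a host with net.inet.carp.preempt=1 and the bug present claims MASTER before it has had a chance to hear the existing master's advertisements, which matches the split-second double mastership reported in this thread.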