Thomas Steen Rasmussen
2019-Jan-16 16:39 UTC
CARP stopped working after upgrade from 11 to 12
On 1/16/19 3:53 PM, Steven Hartland wrote:

I have confirmed that pfsync is the culprit. Read on for details.

> I can't see how any of those would impact carp unless pf is now
> incorrectly blocking carp packets, which seems unlikely from that commit.

Well I would agree, but nevertheless, here we are.

> Questions:
>
>  * Are you running a firewall?

Yes, pf, but it permits CARP packets, and MASTER/SLAVE works well up to and including r342050. Rebuild to r342051 with the exact same configuration and now both nodes are MASTER.

>  * What does sysctl net.inet.carp report?

net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.dscp: 56
net.inet.carp.allow: 1

>  * What exactly does ifconfig report about your carp on both hosts?

with 12-STABLE r342050:

[tykling@fwclu2a ~]$ uname -a
FreeBSD fwclu2a 12.0-STABLE FreeBSD 12.0-STABLE r342050 GENERIC amd64
[tykling@fwclu2a ~]$ ifconfig | grep carp
        carp: MASTER vhid 1 advbase 1 advskew 100
        carp: MASTER vhid 1 advbase 1 advskew 100
        carp: MASTER vhid 1 advbase 1 advskew 100
[tykling@fwclu2a ~]$

[tykling@fwclu2b ~]$ uname -a
FreeBSD fwclu2b 12.0-STABLE FreeBSD 12.0-STABLE r342050 GENERIC amd64
[tykling@fwclu2b ~]$ ifconfig | grep carp
        carp: BACKUP vhid 1 advbase 1 advskew 200
        carp: BACKUP vhid 1 advbase 1 advskew 200
        carp: BACKUP vhid 1 advbase 1 advskew 200
[tykling@fwclu2b ~]$

and with 12-STABLE r342051:

[tykling@fwclu2a ~]$ uname -a
FreeBSD fwclu2a 12.0-STABLE FreeBSD 12.0-STABLE r342051 GENERIC amd64
[tykling@fwclu2a ~]$ ifconfig | grep carp
        carp: MASTER vhid 1 advbase 1 advskew 100
        carp: MASTER vhid 1 advbase 1 advskew 100
        carp: MASTER vhid 1 advbase 1 advskew 100
[tykling@fwclu2a ~]$

[tykling@fwclu2b ~]$ uname -a
FreeBSD fwclu2b 12.0-STABLE FreeBSD 12.0-STABLE r342051 GENERIC amd64
[tykling@fwclu2b ~]$ ifconfig | grep carp
        carp: MASTER vhid 1 advbase 1 advskew 200
        carp: MASTER vhid 1 advbase 1 advskew 200
        carp: MASTER vhid 1 advbase 1 advskew 200
[tykling@fwclu2b ~]$

>  * Have you tried enabling more detailed carp logging using sysctl
>    net.inet.carp.log?

It is at 1 and increasing it to 2 doesn't appear to log anything new.

I tried disabling pfsync and rebooting both nodes, they came up as MASTER/SLAVE then.

Then I tried enabling pfsync and starting it, and on the SLAVE node I immediately got:

Jan 16 16:34:56 fwclu2b kernel: carp: demoted by -240 to -240 (pfsync bulk done)
Jan 16 16:34:56 fwclu2b kernel: carp: 1@lagg2.52: BACKUP -> MASTER (preempting a slower master)
Jan 16 16:34:56 fwclu2b kernel: carp: 1@lagg2.51: BACKUP -> MASTER (preempting a slower master)
Jan 16 16:34:56 fwclu2b kernel: carp: 1@lagg3: BACKUP -> MASTER (preempting a slower master)

Stopping pfsync again does not make it go back to SLAVE.

Best regards,

Thomas Steen Rasmussen
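
For anyone wanting to check their own pair for the same symptom, here is a rough sketch (not from the original posters) that compares saved `ifconfig | grep carp` output from two nodes and lists the vhids on which both claim MASTER. The function names `carp_states` and `both_master` are made up for illustration, and it assumes the line format shown in the output above. Since the grepped output drops interface names, it can only compare by vhid.

```python
import re

# Matches lines such as "carp: MASTER vhid 1 advbase 1 advskew 100".
CARP_LINE = re.compile(
    r"carp:\s+(?P<state>\w+)\s+vhid\s+(?P<vhid>\d+)"
    r"\s+advbase\s+\d+\s+advskew\s+\d+"
)


def carp_states(ifconfig_output):
    """Return a list of (vhid, state) tuples, one per carp line found."""
    return [
        (int(m.group("vhid")), m.group("state"))
        for m in CARP_LINE.finditer(ifconfig_output)
    ]


def both_master(node_a_output, node_b_output):
    """Return the sorted vhids for which both nodes report MASTER."""
    masters_a = {v for v, s in carp_states(node_a_output) if s == "MASTER"}
    masters_b = {v for v, s in carp_states(node_b_output) if s == "MASTER"}
    return sorted(masters_a & masters_b)


if __name__ == "__main__":
    # Sample text copied from the r342051 output in this message.
    node_a = "\tcarp: MASTER vhid 1 advbase 1 advskew 100\n" * 3
    node_b = "\tcarp: MASTER vhid 1 advbase 1 advskew 200\n" * 3
    print("vhids with two masters:", both_master(node_a, node_b))
```

An empty list means no vhid has two masters; with the r342050 output (node b in BACKUP) the list would be empty.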
> I have confirmed that pfsync is the culprit. Read on for details.

Excellent work. I'm home now, so won't get a chance to put this into practice until tomorrow unfortunately, but it's brilliant that you have confirmed it.

> I tried disabling pfsync and rebooting both nodes, they came up as
> MASTER/SLAVE then.

This is very useful to know - I will probably try tomorrow running my firewalls back up with pfsync disabled to see if it works for me too.

> Then I tried enabling pfsync and starting it, and on the SLAVE node I
> immediately got:

That kind of confirms it really, doesn't it? So, is it possible to get r342051 backed out of STABLE for now? This is a bit of a 'gotcha' for anyone running a firewall pair with CARP, after all.

-pete.

PS: are you going to file a PR?