I just upgraded my pair of firewalls from 11 to 12, and am now in the situation where CARP no longer works between them to fail over the virtual addresses. Both machines come up thinking that they are the master. If I manually set the advskew on the interfaces to a high number on what should be the passive machine, it briefly goes to backup mode, but then goes back to master with the message:

BACKUP -> MASTER (preempting a slower master)

This is kind of a big problem! It's also unexpected, as I tested CARP on 12 in my development environment and it works here, though here we only have one address instead of several. But this has worked fine for a very long time until now.

The setup looks like this:

ifconfig_em0="inet 10.32.10.1/16"
ifconfig_em0_ipv6="inet6 2a02:1658:1:2:e550::1/64"
ifconfig_em0_alias0="inet 10.32.10.6/16 vhid 10 advskew 10 pass redacted"
ifconfig_em0_alias1="inet6 2a02:1658:1:2:e550::6/64 vhid 30 advskew 10 pass redacted"
ifconfig_em1="inet 178.250.73.196/26"
ifconfig_em1_ipv6="inet6 2a02:1658:1:1::1:2/64"
ifconfig_em1_alias0="inet 178.250.73.198/26 vhid 20 advskew 10 pass redacted"
ifconfig_em1_alias1="inet6 2a02:1658:1:1::1:1/64 vhid 40 advskew 10 pass redacted"
ifconfig_em1_alias2="inet 178.250.73.199/26 vhid 20 advskew 10 pass redacted"
ifconfig_em1_alias3="inet 178.250.73.200/26 vhid 20 advskew 10 pass redacted"
ifconfig_em1_alias4="inet 178.250.73.221/26 vhid 20 advskew 10 pass redacted"

...and on the passive side almost identical, except for the real IPs and the advskew, which is set to 128.

I have PF enabled with pfsync as well, and I have set net.inet.carp.preempt=1 in sysctl.conf. PF is configured to allow protocol 'carp' on both Ethernet interfaces and 'pfsync' on the internal one.

I did wonder if having the same vhid for a number of the addresses might be the issue, so I changed the config to put them all on separate vhid numbers, but the problem persists.

This is now a bit of a major problem for me, as I am running on a single firewall with no failover (which I don't like) and don't really know what the path forward is. As ever, all advice is welcome!

-pete.
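For reference, the advskew values above translate into advertisement intervals roughly as follows. This is a minimal sketch, assuming the behaviour documented in carp(4): advskew is measured in 1/256 of a second and added to advbase, which is assumed here to be left at its default of 1 second since the configuration does not set it. The function name is hypothetical and only illustrates the arithmetic.

```python
def advertisement_interval(advbase: int = 1, advskew: int = 0) -> float:
    """Seconds between CARP advertisements: advbase plus advskew/256.

    Assumption: advbase defaults to 1 second, as the rc.conf lines
    in the message above do not override it.
    """
    return advbase + advskew / 256.0


# Intended master (advskew 10) and intended backup (advskew 128).
master_interval = advertisement_interval(advskew=10)
backup_interval = advertisement_interval(advskew=128)

# The host that advertises more often is the one expected to hold MASTER.
print(f"master: {master_interval}s, backup: {backup_interval}s")
```

With these values the intended master advertises slightly more often than the backup, so a backup that is actually receiving those advertisements would not normally be expected to preempt it.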
Thomas Steen Rasmussen
2019-Jan-16 14:31 UTC
CARP stopped working after upgrade from 11 to 12
On 1/16/19 3:14 PM, Pete French wrote:
> I just upgraded my pair of firewalls from 11 to 12, and am now in the
> situation where CARP no longer works between them to fail over the
> virtual addresses. Both machines come up thinking that they
> are the master. If I manually set the advskew on the interfaces to
> a high number on what should be passive then it briefly goes to backup
> mode, but then goes back to master with the message:
>
> BACKUP -> MASTER (preempting a slower master)
>
> This is kind of a big problem!

Indeed. I am seeing the same thing. Which revision of 12 are you running?

I am currently (yesterday and today) bisecting revisions to find the commit which broke this, because it worked in 12-BETA2 but doesn't work on the latest 12-STABLE.

I have narrowed it down to somewhere between 12-STABLE-342037, which works, and 12-STABLE-342055, which does not. Only 4 commits touch the 12-STABLE branch in that range:

------------------------------------------------------------------------
r342038 | eugen | 2018-12-13 10:52:40 +0000 (Thu, 13 Dec 2018) | 5 lines

MFC r340394:

ipfw.8: Fix part of the SYNOPSIS documenting LIST OF RULES AND PREPROCESSING
that is still referred as last section of the SYNOPSIS later
but was erroneously situated in the section IN-KERNEL NAT.

------------------------------------------------------------------------
r342047 | markj | 2018-12-13 15:51:07 +0000 (Thu, 13 Dec 2018) | 3 lines

MFC r341638:

Let kern.trap_enotcap be set as a tunable.

------------------------------------------------------------------------
r342048 | markj | 2018-12-13 16:07:35 +0000 (Thu, 13 Dec 2018) | 3 lines

MFC r340405:

Add accounting to per-domain UMA full bucket caches.

------------------------------------------------------------------------
r342051 | kp | 2018-12-13 20:00:11 +0000 (Thu, 13 Dec 2018) | 20 lines

pfsync: Performance improvement

pfsync code is called for every new state, state update and state
deletion in pf. While pf itself can operate on multiple states at the
same time (on different cores, assuming the states hash to a different
hashrow), pfsync only had a single lock.
This greatly reduced throughput on multicore systems.

Address this by splitting the pfsync queues into buckets, based on the
state id. This ensures that updates for a given connection always end up
in the same bucket, which allows pfsync to still collapse multiple
updates into one, while allowing multiple cores to proceed at the same
time.

The number of buckets is tunable, but defaults to 2 x number of cpus.
Benchmarking has shown improvement, depending on hardware and setup,
from ~30% to ~100%.

Sponsored by: Orange Business Services

------------------------------------------------------------------------

Of these I thought r342051 sounded most likely, so I am currently building r342050. I will write again in a few hours when I have isolated the commit.

Best regards,

Thomas Steen Rasmussen
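The narrowing-down described above amounts to a binary search over the four candidate revisions. Here is a minimal sketch of that search, assuming that every revision before the culprit works and every revision from the culprit onwards fails. The `is_good` callback is a hypothetical stand-in for "build and install that revision, then check whether CARP fails over correctly"; it is not a real tool.

```python
from typing import Callable, Sequence


def first_bad_revision(candidates: Sequence[int],
                       is_good: Callable[[int], bool]) -> int:
    """Return the earliest revision in `candidates` for which is_good() fails.

    Assumes `candidates` is sorted ascending and contains the culprit,
    i.e. the revision just before candidates[0] is known good and the
    range as a whole is known to end up bad.
    """
    lo, hi = 0, len(candidates) - 1  # the culprit index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(candidates[mid]):
            lo = mid + 1   # culprit is later than mid
        else:
            hi = mid       # mid is bad, culprit is mid or earlier
    return candidates[lo]


# The four revisions listed in the log excerpt above.
candidates = [342038, 342047, 342048, 342051]

# Purely illustrative stub: pretends the suspected pfsync commit is the
# culprit. A real run would build and test each revision instead.
suspected = 342051
print(first_bad_revision(candidates, lambda rev: rev < suspected))
```

With only four candidates, at most two builds are needed, and testing the tree just before the most suspicious commit first is a reasonable shortcut.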