I went a bit further...
lvs1# service keepalived stop
lvs2# service keepalived stop
lvs1# service network restart
lvs2# service network restart
Clean start
lvs1# service keepalived start
Feb 25 15:03:18 lvs1 Keepalived: Starting Keepalived v1.1.16 (02/17,2009)
Feb 25 15:03:18 lvs1 Keepalived: Starting Healthcheck child process, pid=9511
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Using MII-BMSR NIC polling thread...
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Netlink reflector reports IP 192.168.28.226 added
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Netlink reflector reports IP 10.0.0.1 added
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Registering Kernel netlink reflector
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Registering Kernel netlink command channel
Feb 25 15:03:18 lvs1 Keepalived: Starting VRRP child process, pid=9512
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Using MII-BMSR NIC polling thread...
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Netlink reflector reports IP 192.168.28.226 added
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Netlink reflector reports IP 10.0.0.1 added
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Registering Kernel netlink reflector
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Registering Kernel netlink command channel
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Registering gratutious ARP shared channel
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Configuration is using : 13235 Bytes
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Activating healtchecker for service [10.0.0.11:80]
Feb 25 15:03:18 lvs1 Keepalived_healthcheckers: Activating healtchecker for service [10.0.0.12:80]
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Feb 25 15:03:18 lvs1 Keepalived_vrrp: Configuration is using : 34062 Bytes
Feb 25 15:03:18 lvs1 Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(10,11)]
No VIP and no checks on the web servers...
lvs2# service keepalived start
Feb 25 15:05:23 lvs2 Keepalived: Starting Keepalived v1.1.16 (02/17,2009)
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Using MII-BMSR NIC polling thread...
Feb 25 15:05:23 lvs2 Keepalived: Starting Healthcheck child process, pid=8718
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Using MII-BMSR NIC polling thread...
Feb 25 15:05:23 lvs2 Keepalived: Starting VRRP child process, pid=8719
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Netlink reflector reports IP 192.168.28.227 added
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Netlink reflector reports IP 10.0.0.2 added
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Registering Kernel netlink reflector
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Registering Kernel netlink command channel
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Netlink reflector reports IP 192.168.28.227 added
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Netlink reflector reports IP 10.0.0.2 added
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Registering Kernel netlink reflector
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Registering Kernel netlink command channel
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Registering gratutious ARP shared channel
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Opening file '/etc/keepalived/keepalived.conf'.
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Configuration is using : 13233 Bytes
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Activating healtchecker for service [10.0.0.11:80]
Feb 25 15:05:23 lvs2 Keepalived_healthcheckers: Activating healtchecker for service [10.0.0.12:80]
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'.
Feb 25 15:05:23 lvs2 Keepalived_vrrp: Configuration is using : 34060 Bytes
Feb 25 15:05:23 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 25 15:05:23 lvs2 Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(10,11)]
No VIP and only one check on the web servers...
lvs1# service keepalived stop
Feb 25 15:07:30 lvs1 Keepalived: Terminating on signal
Feb 25 15:07:30 lvs1 Keepalived: Stopping Keepalived v1.1.16 (02/17,2009)
Feb 25 15:07:30 lvs1 Keepalived_vrrp: Terminating VRRP child process on signal
Feb 25 15:07:30 lvs1 Keepalived_healthcheckers: Terminating Healthchecker child
process on signal
And nothing else (lvs2 does not become MASTER)...
lvs1# service keepalived start
Nothing much...
lvs2# service keepalived stop
lvs2# service keepalived start
Nothing and no checks on the web servers...
lvs1# service keepalived stop
lvs1# service keepalived start
Nothing and no checks on the web servers...
lvs1# service keepalived stop
lvs1# service keepalived start
Nothing and only one check on the web servers...
Always stuck on "VRRP sockpool"
By the way, a restart (or a stop followed by a start) done too quickly often
leads to a failed start with "daemon is already running".
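A minimal sketch of one way to avoid that race: wait until no keepalived process is left before starting again. The helper name wait_until_gone and the 10-second timeout are assumptions made for illustration, not something keepalived or its init script provides, and the sketch assumes pgrep is installed.

```shell
#!/bin/sh
# Hypothetical helper: wait until no process with the given exact name is
# left, or give up after a timeout in seconds (default 10).
wait_until_gone() {
    name=$1
    timeout=${2:-10}
    elapsed=0
    # pgrep -x matches the process name exactly and exits non-zero
    # when nothing matches.
    while pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # still running after the timeout
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Usage, with the same init script as in the commands above:
#   service keepalived stop && wait_until_gone keepalived 10 && service keepalived start
```

If the helper returns non-zero, some keepalived child is still alive and starting again would presumably hit the same "daemon is already running" message.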
lvs1# service keepalived restart
Nothing and no checks on the web servers...
lvs1# service keepalived restart
Nothing and no checks on the web servers...
lvs1# service keepalived restart
Nothing and no checks on the web servers...
lvs1# service keepalived restart
Baam, suddenly many vrrp packets, and one web server check
Feb 25 15:15:11 lvs1 Keepalived_vrrp: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Feb 25 15:15:11 lvs1 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.16.123
Feb 25 15:15:11 lvs1 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.16.123
Feb 25 15:15:16 lvs1 Keepalived_vrrp: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Feb 25 15:15:16 lvs1 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth0 for 192.168.16.123
Feb 25 15:14:50 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 25 15:14:50 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Feb 25 15:14:50 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 25 15:14:55 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 25 15:14:55 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Feb 25 15:14:55 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
The web servers are correctly reached from outside in round-robin (rr), but
there are still no web checks from either keepalived...
lvs1# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.16.123:http rr
  -> 10.0.0.12:http               Route   1      0          28
  -> 10.0.0.11:http               Route   1      0          28
lvs2# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.16.123:http rr
  -> 10.0.0.12:http               Route   1      0          0
  -> 10.0.0.11:http               Route   1      0          0
lvs1# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:04:23:9e:f3:74 brd ff:ff:ff:ff:ff:ff
    inet 192.168.28.226/20 brd 192.168.31.255 scope global eth0
    inet 192.168.16.123/32 scope global eth0
    inet6 fe80::204:23ff:fe9e:f374/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:04:23:9e:f3:75 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/8 brd 10.255.255.255 scope global eth1
    inet6 fe80::204:23ff:fe9e:f375/64 scope link
       valid_lft forever preferred_lft forever
4: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
No VIP on lvs2 (BACKUP state)
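Since the question of which box holds the VIP keeps coming up, here is a small sketch for checking it on each node without reading the whole "ip addr" output. The function name has_vip is hypothetical; it only assumes the one-line-per-address format printed by "ip -o -4 addr show", where the third field is "inet" and the fourth is ADDRESS/PREFIX.

```shell
#!/bin/sh
# Hypothetical helper: read "ip -o -4 addr show" output on stdin and
# succeed (exit 0) only if the given address is configured.
has_vip() {
    vip=$1
    awk -v vip="$vip" '
        # Field 3 is "inet", field 4 is ADDRESS/PREFIX (e.g. 192.168.16.123/32).
        $3 == "inet" { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on either director (eth0 and the VIP are taken from the output above):
#   ip -o -4 addr show dev eth0 | has_vip 192.168.16.123 && echo "VIP present"
```

Reading from stdin keeps the check separate from the "ip" call, so the same function can be run against saved output from both directors.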
lvs1# service keepalived stop
Feb 25 15:29:06 lvs2 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
tcpdump => VRRP.MCAST.NET: VRRPv2, Advertisement, vrid 51, prio 0, authtype none, intvl 1s, length 20
No VIP on lvs1 and lvs2, ARP resolution for VIP incomplete...
lvs2# ip a add dev eth0 local 192.168.16.123/32 scope global
Baam, suddenly vrrp packets, and one round (only) of web server checks
15:33:18.639546 IP lvs2.iper > VRRP.MCAST.NET: VRRPv2, Advertisement, vrid 51, prio 99, authtype none, intvl 1s, length 20
15:33:19.641002 IP lvs2.iper > VRRP.MCAST.NET: VRRPv2, Advertisement, vrid 51, prio 99, authtype none, intvl 1s, length 20
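To see at a glance who is advertising and with which priority, the tcpdump lines can be reduced to "sender vrid prio" and counted. This is only a sketch: vrrp_summary is a hypothetical name, and the pattern assumes the one-line tcpdump format shown above. For context, a VRRPv2 advertisement with prio 0 (as in the earlier capture) is what a master sends when it stops participating.

```shell
#!/bin/sh
# Hypothetical helper: reduce tcpdump VRRP lines on stdin to
# "sender vrid prio" and count how often each combination appears.
vrrp_summary() {
    sed -n -E 's/^.* IP ([^ ]+) > [^ ]+ VRRPv2, Advertisement, vrid ([0-9]+), prio ([0-9]+),.*$/\1 \2 \3/p' |
        sort | uniq -c
}

# Usage (assumption: VRRP runs on eth0, as the gratuitous ARP lines suggest;
# -c 20 stops the capture after 20 packets so the counts get printed):
#   tcpdump -c 20 -i eth0 vrrp | vrrp_summary
```

If both directors show up with advertisements at the same time, or none show up at all, that would narrow down where the election gets stuck.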
lvs1# service keepalived start
Nothing...
lvs2# service keepalived stop
Baam, suddenly vrrp packets, and one round (only) of web server checks
The web servers are correctly accessed from outside in rr...
lvs2# service keepalived start
Nothing, other than Entering BACKUP STATE
Both lvs have the VIP up...
lvs1# service keepalived stop
Same as above, except the VIP is up on lvs2 and down on lvs1, and no
webchecks...
The web servers are correctly accessed from outside in rr...
lvs1# service keepalived start
Nothing...
lvs1 "stuck" on VRRP sockpool, while lvs2 is still MASTER
VIP down on lvs1 and up on lvs2
lvs2# service keepalived stop
Baam, suddenly vrrp packets, no web server checks at all
The web servers are correctly accessed from outside in rr...
Both lvs have the VIP up
lvs1# service keepalived stop
lvs1# service keepalived start
lvs2# service keepalived stop
Same as above except that there are webchecks from lvs1 now...
lvs2# service keepalived start
backup state, no webchecks from lvs2
lvs1# service keepalived stop
lvs2 => MASTER
VIP is up on lvs2, down on lvs1
Everything is stuck for about 30s... and then the web servers are accessible.
lvs1# service keepalived start
Nothing...
lvs1 "stuck" on VRRP sockpool, while lvs2 is still MASTER
VIP down on lvs1 and up on lvs2
lvs2# service network restart
baam, vrrp packets, lvs1 transitions to MASTER and sends ARPs
And I get regular webchecks from both lvs...
And if I bring down one web server, it is correctly removed from the services.
2 minutes later, no more web checks...
lvs1# service keepalived stop
lvs2 => MASTER
VIP is down on both lvs... ARP is incomplete.
Everything is stuck forever...
lvs2# ip a add dev eth0 local 192.168.16.123/32 scope global
baam, vrrp packets, lvs1 enters MASTER state and sends ARPs
I caught this: Netlink: error: File exists, type=(20), seq=1235574458, pid=0
Looking for errors in the logs, I found:
Feb 23 16:20:20 lvs1 Keepalived_vrrp: Netlink: filter function error
Feb 23 16:20:20 lvs1 Keepalived_healthcheckers: Netlink: filter function error
Feb 23 16:42:58 lvs1 Keepalived_vrrp: Netlink: filter function error
Feb 23 16:42:58 lvs1 Keepalived_healthcheckers: Netlink: filter function error
Feb 25 12:00:50 lvs1 kernel: IPVS: ip_vs_send_async error
Feb 25 12:12:04 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:04 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:05 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:05 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:05 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:05 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:06 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:06 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:06 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:06 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:07 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:07 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:07 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:07 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:08 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:08 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:08 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:08 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:09 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:09 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:09 lvs1 Keepalived_vrrp: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:09 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:10 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:10 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:11 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:11 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:12 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:12 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:13 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:13 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:14 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:14 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:15 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:16 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:12:16 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth0 failed: Input/output error
Feb 25 12:12:17 lvs1 Keepalived_healthcheckers: SIOCGMIIREG on eth1 failed: Input/output error
Feb 25 12:33:39 lvs1 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235561506, pid=0
Feb 25 12:39:11 lvs1 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235561507, pid=0
Feb 25 12:40:10 lvs1 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235561508, pid=0
Feb 25 12:40:52 lvs1 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235561509, pid=0
Feb 23 16:20:16 lvs2 Keepalived_vrrp: Netlink: filter function error
Feb 23 16:20:16 lvs2 Keepalived_healthcheckers: Netlink: filter function error
Feb 23 16:42:46 lvs2 Keepalived_vrrp: Netlink: filter function error
Feb 23 16:42:46 lvs2 Keepalived_healthcheckers: Netlink: filter function error
Feb 23 17:35:36 lvs2 Keepalived_healthcheckers: Netlink: filter function error
Feb 23 17:35:36 lvs2 Keepalived_vrrp: Netlink: filter function error
Feb 25 12:25:22 lvs2 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235560956, pid=0
Feb 25 12:30:50 lvs2 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235561435, pid=0
Feb 25 15:33:18 lvs2 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235570954, pid=0
Feb 25 16:12:02 lvs2 Keepalived_vrrp: Netlink: error: Cannot assign requested address, type=(21), seq=1235574457, pid=0
Feb 25 16:29:11 lvs2 Keepalived_vrrp: Netlink: error: File exists, type=(20), seq=1235574458, pid=0
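For anyone wanting to pull the same kind of list out of their own logs, a rough sketch follows. It keeps keepalived lines that mention "error" or "failed", drops the timestamp, masks the changing Netlink sequence number, and counts identical messages. The function name summarize_errors and the log path in the usage line are assumptions; the syslog file differs between distributions.

```shell
#!/bin/sh
# Hypothetical helper: tally keepalived error messages read from stdin.
summarize_errors() {
    # 1. keep keepalived lines mentioning an error or a failure
    # 2. strip the leading "Mon DD HH:MM:SS " syslog timestamp
    # 3. mask the Netlink sequence number so repeats collapse together
    # 4. count identical lines, most frequent first
    grep -E 'Keepalived.*(error|failed)' |
        sed -E 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} //' |
        sed -E 's/seq=[0-9]+/seq=N/' |
        sort | uniq -c | sort -rn
}

# Usage (assumption: syslog goes to /var/log/messages on these boxes):
#   summarize_errors < /var/log/messages
```

Kernel lines such as the IPVS one above are deliberately left out by the pattern; widening the grep expression would bring them in.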
Do you have any idea about what could be causing these problems?
Thx,
JD