I'm having an odd issue that I think is related to arp, and I'm hoping that
someone can help me figure out why it's happening...
I am running nagios, monitoring a number of xen hosts. Yesterday, I
rebooted several of the machines (the physical hosts, not virtual machines).
Since then, nagios is sometimes reporting that the hosts are down because
pings to them fail.
Testing manually confirms that pings to these hosts do fail at those times.
The problem occurs only on servers that are on the local subnet; servers on
another subnet never lose connectivity.
Checking the arp table on the nagios server, I discovered that the machines
that were reported as down had entries like the following:
xenhost9 ether FE:FF:FF:FF:FF:FF C eth0
When the machines become available again, the entry changes to look like
this:
xenhost9 ether 00:E0:81:40:2A:AE C eth0
So, it appears that the nagios server (and, on at least one occasion,
another server on my network) is picking up a MAC address that is not that
of the physical interface on the xenhost.
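For anyone who wants to watch for this condition, here is a minimal sketch that scans `arp` output for the FE:FF:FF:FF:FF:FF address seen in the entries above. The function names and the sample text are made up for illustration, and it assumes the five-column layout shown above (host, hardware type, MAC, flags, interface).

```python
# Sketch only: flag ARP entries that carry the FE:FF:FF:FF:FF:FF address
# instead of the MAC of a physical NIC. Names here are hypothetical.
PLACEHOLDER_MAC = "fe:ff:ff:ff:ff:ff"


def parse_arp_line(line):
    """Return (host, mac, iface) for one line of arp output, or None."""
    fields = line.split()
    # Assumed layout: host, hwtype, mac, flags, iface.
    # Header lines and incomplete entries do not have "ether" in column 2.
    if len(fields) < 5 or fields[1] != "ether":
        return None
    return fields[0], fields[2].lower(), fields[-1]


def suspicious_entries(arp_output):
    """List the entries whose MAC is the placeholder address."""
    found = []
    for line in arp_output.splitlines():
        entry = parse_arp_line(line)
        if entry is not None and entry[1] == PLACEHOLDER_MAC:
            found.append(entry)
    return found


# Sample text built from addresses that appear in this post.
sample = """\
xenhost9 ether FE:FF:FF:FF:FF:FF C eth0
xenhost2 ether 00:E0:81:45:82:BC C eth0
"""

for host, mac, iface in suspicious_entries(sample):
    print(f"{host} on {iface} has placeholder MAC {mac}")
```

In practice the text would come from running `arp` on the nagios server rather than from a hard-coded sample.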
Taking a look at the xenhost at a time when nagios was reporting that it was
down, I found these entries in the arp table:
nagios ether 00:16:3E:0C:DC:AC C xenbr0
nagios ether 00:16:3E:0C:DC:AC C eth0
I deleted the entry on xenbr0 by doing `arp -i xenbr0 -d nagios`, and
immediately nagios was able to ping the host again.
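The manual workaround above could be sketched as a small script that looks for hosts listed on more than one interface and prints the matching `arp -i ... -d ...` command for the bridge-side entry. It only prints commands and does not run them. The function names and the `bridge_prefix` parameter are made up for illustration, and treating the bridge entry as the one to delete is an assumption based on the single case described here.

```python
# Sketch only: suggest arp delete commands for hosts that appear on both
# a bridge and another interface. Nothing is executed.
from collections import defaultdict


def duplicate_neighbors(arp_output):
    """Map host -> interfaces, for hosts listed on more than one interface."""
    ifaces_by_host = defaultdict(list)
    for line in arp_output.splitlines():
        fields = line.split()
        # Assumed layout: host, hwtype, mac, flags, iface.
        if len(fields) >= 5 and fields[1] == "ether":
            ifaces_by_host[fields[0]].append(fields[-1])
    return {h: i for h, i in ifaces_by_host.items() if len(i) > 1}


def cleanup_commands(arp_output, bridge_prefix="xenbr"):
    """Build one delete command per duplicated entry on a bridge interface."""
    commands = []
    for host, ifaces in duplicate_neighbors(arp_output).items():
        for iface in ifaces:
            if iface.startswith(bridge_prefix):
                commands.append(f"arp -i {iface} -d {host}")
    return commands


# The two entries seen on the xenhost while nagios reported it down.
sample = """\
nagios ether 00:16:3E:0C:DC:AC C xenbr0
nagios ether 00:16:3E:0C:DC:AC C eth0
"""

for command in cleanup_commands(sample):
    print(command)
```

This treats the symptom only; it says nothing about why the entry shows up on xenbr0 in the first place.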
So, something is a little wonky here, but I don't know what...
To make things stranger, I have a number of machines that are all running
the same configuration. Only the machines that were rebooted yesterday
morning are showing this issue.
The configuration that I'm working with is:
- Opensuse 10.3
- Xen 3.1.0_15042-51.3 installed from opensuse-packaged RPMs
- Two bridges (xenbr0 and xenbr1), created with a custom network-script that
does "/etc/xen/scripts/network-bridge start vifnum=0 bridge=xenbr0
netdev=eth0 && /etc/xen/scripts/network-bridge start vifnum=1 bridge=xenbr1
netdev=eth1"
On a machine that is having this problem, `ip addr` shows this:
[10:25:13] marlier@xenhost9:~> ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: peth0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
inet6 fe80::fcff:ffff:feff:ffff/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:e0:81:40:2a:af brd ff:ff:ff:ff:ff:ff
inet 192.168.xx.229/24 brd 192.168.xx.255 scope global eth1
inet6 fe80::2e0:81ff:fe40:2aaf/64 scope link
valid_lft forever preferred_lft forever
4: vif0.0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
inet6 fe80::fcff:ffff:feff:ffff/64 scope link
valid_lft forever preferred_lft forever
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether 00:e0:81:40:2a:ae brd ff:ff:ff:ff:ff:ff
inet 192.168.xx.80/24 brd 192.168.xx.255 scope global eth0
inet 192.168.xx.229/24 brd 192.168.xx.255 scope global eth0:2
inet6 fe80::2e0:81ff:fe40:2aae/64 scope link
valid_lft forever preferred_lft forever
6: vif0.1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
7: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
8: vif0.2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
9: veth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
10: vif0.3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
11: veth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
12: xenbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
inet6 fe80::200:ff:fe00:0/64 scope link
valid_lft forever preferred_lft forever
13: xenbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
14: vif1.0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 32
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
inet6 fe80::fcff:ffff:feff:ffff/64 scope link
valid_lft forever preferred_lft forever
[10:25:16] marlier@xenhost9:~>
On another machine that is _not_ having this issue (and which was not
rebooted yesterday), with an identical configuration in terms of scripts,
versions, base OS, and so on, `ip addr` shows this:
[10:33:02] marlier@xenhost2:~> ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: peth0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
inet6 fe80::fcff:ffff:feff:ffff/64 scope link
valid_lft forever preferred_lft forever
3: peth1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
inet6 fe80::fcff:ffff:feff:ffff/64 scope link
valid_lft forever preferred_lft forever
4: vif0.0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
inet6 fe80::fcff:ffff:feff:ffff/64 scope link
valid_lft forever preferred_lft forever
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether 00:e0:81:45:82:bc brd ff:ff:ff:ff:ff:ff
inet 192.168.xx.86/24 brd 192.168.xx.255 scope global eth0
inet 192.168.xx.222/24 brd 192.168.xx.255 scope global eth0:2
inet6 fe80::2e0:81ff:fe45:82bc/64 scope link
valid_lft forever preferred_lft forever
6: vif0.1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
inet6 fe80::fcff:ffff:feff:ffff/64 scope link
valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether 00:e0:81:45:82:bd brd ff:ff:ff:ff:ff:ff
inet 192.168.xx.222/24 brd 192.168.xx.255 scope global eth1
inet6 fe80::2e0:81ff:fe45:82bd/64 scope link
valid_lft forever preferred_lft forever
8: vif0.2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
9: veth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
10: vif0.3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
11: veth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
14: xenbr0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
15: xenbr1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
[10:33:08] marlier@xenhost2:~>
I see those NOARP flags in there, and I wonder if that might be the
difference... but the two machines are using the same scripts to create the
bridges, so why would they result in different configurations? And if that
is the issue, is there a way to force the bridge to be created with the
NOARP flag?
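To make the comparison less error-prone than reading the two listings by eye, here is a rough sketch that pulls the flags out of `ip addr` output and reports the interfaces whose flags differ between two hosts. The function names are made up for illustration, and the sample text is a shortened excerpt of the two listings above (only vif0.0 and the bridges), not the full output.

```python
# Sketch only: compare interface flags from two `ip addr` listings.
import re

# Matches header lines such as
# "13: xenbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue"
HEADER = re.compile(r"^\s*\d+:\s+([^:\s]+):\s+<([^>]*)>")


def interface_flags(ip_addr_output):
    """Map interface name -> set of flags found between < and >."""
    flags = {}
    for line in ip_addr_output.splitlines():
        match = HEADER.match(line)
        if match:
            flags[match.group(1)] = set(match.group(2).split(","))
    return flags


def flag_differences(a, b):
    """List (iface, flags only in a, flags only in b) for shared interfaces."""
    differences = []
    for name in sorted(a.keys() & b.keys()):
        if a[name] != b[name]:
            differences.append(
                (name, sorted(a[name] - b[name]), sorted(b[name] - a[name]))
            )
    return differences


# Shortened excerpts of the two listings in this post.
xenhost9 = """\
4: vif0.0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
12: xenbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
13: xenbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
"""
xenhost2 = """\
4: vif0.0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
14: xenbr0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
15: xenbr1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
"""

broken = interface_flags(xenhost9)
working = interface_flags(xenhost2)
for name, only_broken, only_working in flag_differences(broken, working):
    print(f"{name}: only on xenhost9 {only_broken}, only on xenhost2 {only_working}")
```

On these excerpts it reports both bridges as having MULTICAST on xenhost9 where xenhost2 has NOARP, which matches what I noticed above. It does not report interfaces that exist on only one host (for example peth1 versus eth1), which may also matter here.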