Ken Bass
2009-Nov-18 16:39 UTC
[CentOS-virt] Xen domU default gateway missing/ARP table full
I have been trying to figure out why my domU NIC becomes unreachable (I cannot even ping it) at various times. This normally happens when the server is trying to update clamav from the various busy mirrors at 4am. There also seemed to be some latency when connecting, which I chalked up to it being a virtual machine.

When I checked my logs, I found thousands of:

    Nov 17 04:07:52 nomad kernel: Neighbour table overflow.

and applications reporting errors such as:

    Nov 17 04:08:05 nomad freshclam[4085]: nonblock_connect: connect(): fd=5 errno=105: No buffer space available

I am running a routed (not bridged) configuration.

What I figured out is that each CentOS 5.4 domU maintains an ARP table. That table fills up, which makes the network unreachable until entries are purged from the cache. Since this is a routed configuration, the ARP table should really only consist of two or three entries: my domU, my dom0, and the gateway.

It appears the network scripts under CentOS are ignoring the GATEWAY entry. I end up with these routes:

    169.254.0.0     *       255.255.0.0     U     0      0        0 eth0
    default         *       0.0.0.0         U     0      0        0 eth0

The default route should point at the specific IP address in my /etc/sysconfig/network file. Without a gateway on the default route, every destination is treated as directly reachable on eth0, so the network stack tries to resolve an ARP entry for every IP address. When I manually add the route, the ARP table issue is fixed.

I found this bug at Xen, which was closed as INVALID saying 'Centos is broken'. That was from 2006.

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=596

Any ideas on what is broken and what the correct fix is? Right now, I just added

    /sbin/route add -net 0.0.0.0 netmask 0.0.0.0 gw x.x.x.x

to my /etc/rc.local, which seems like a hack solution.
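The symptom described above (a default route with no gateway) can be detected from the numeric routing table. The following is only a minimal sketch, not something from the original post: the function name check_default_gateway is made up for illustration, and it assumes the column layout printed by `route -n`, where a default route with no gateway shows 0.0.0.0 in both the Destination and Gateway columns.

```shell
#!/bin/sh
# Hypothetical helper: reads a routing table in `route -n` format on stdin
# and reports whether the default route points at a real gateway address.
check_default_gateway() {
    awk '
        # Default route rows have destination 0.0.0.0; column 2 is the gateway.
        $1 == "0.0.0.0" {
            seen = 1
            if ($2 != "0.0.0.0") { gw = $2 }
        }
        END {
            if (gw != "") { print "ok: default via " gw; exit 0 }
            if (seen)     { print "broken: default route has no gateway"; exit 1 }
            print "missing: no default route"; exit 2
        }
    '
}

# Typical use on a live system (not run here):
#   /sbin/route -n | check_default_gateway
```

Run against the table shown in the post, this sketch would report the "broken" case, since the default entry is bound to eth0 without a gateway address.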
Pasi Kärkkäinen
2009-Nov-19 12:23 UTC
[CentOS-virt] Xen domU default gateway missing/ARP table full
On Wed, Nov 18, 2009 at 11:39:24AM -0500, Ken Bass wrote:
> I have been trying to figure out why my domU NIC becomes unreachable
> (could not even ping) at various times. (Normally when the server was
> trying to update clamav from the various busy mirrors at 4am). There
> also seemed to be some latency when connecting which I chalked up to it
> being a virtual machine.
>
> When I checked my logs, I found thousands of :
> Nov 17 04:07:52 nomad kernel: Neighbour table overflow.
> and applications reporting errors such as:
> Nov 17 04:08:05 nomad freshclam[4085]: nonblock_connect: connect(): fd=5
> errno=105: No buffer space available
>
> I am running a routed (not bridged) configuration.
>
> What I figured out is that each Centos 5.4 domU is maintaining an ARP
> table. That table is filling up which causes the network to be
> unreachable until entries are purged from the cache. Since this is a
> routed configuration, the ARP table should really only consist of two or
> three entries, my domU, my dom0, and the gateway.
>
> It appears the networking-scripts until Centos are ignoring the GATEWAY
> entry. I end up with route of:
> 169.254.0.0 * 255.255.0.0 U 0 0 0 eth0
> default * 0.0.0.0 U 0 0 0 eth0
>
> The default route should be the specific IP address in my
> /etc/sysconfig/network file. When I manually add the route, the arp
> table issue
> is fixed. The network stack no longer trys to query an arp entry for
> every IP address.
>
> I found this bug at Xen which was closed as INVALID saying 'Centos is
> broken'. That was from 2006.
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=596
>
> Any ideas on what is broken and what the correct fix is? Right now, I
> just added
>
> /sbin/route add -net 0.0.0.0 netmask 0.0.0.0 gw x.x.x.x
>
> to my /etc/rc.local which seems like a hack solution.
>

I usually specify the default gateway in /etc/sysconfig/network-scripts/ifcfg-eth0 and it works just fine.

-- Pasi
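For illustration, a static ifcfg-eth0 with the gateway set per interface might look roughly like the fragment below. This is a sketch under assumptions, not the poster's actual file: the addresses come from the 192.0.2.0/24 documentation range and are placeholders, and whether the gateway is accepted as-is in a routed Xen setup depends on how dom0 is configured.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 (illustrative values only)
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.0.2.10      # placeholder domU address
NETMASK=255.255.255.0  # placeholder netmask
GATEWAY=192.0.2.1      # placeholder default gateway, set per interface
```

After editing the file, restarting the interface (for example with `ifdown eth0` followed by `ifup eth0`) should let the change show up in the routing table.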
Ed Heron
2009-Nov-19 17:37 UTC
[CentOS-virt] Xen domU default gateway missing/ARP table full
I was slightly confused about this thread until I realized you were using static IP config on your VMs... Why do people do that?

I have an extra step of picking up the HW address (or setting the HW address when creating the VM) and putting it into my DHCP configuration, but then I have all of my hosts in a single file and I can change the configuration of my whole network in a single place.

I realize that my DHCP server becomes a single point of failure, but with a reasonably long lease time the DHCP server going down won't affect any workstations for as much as several hours (as long as nothing reboots). Also, there are ways of having fault tolerance with DHCP; the easiest would be to keep a non-running VM with a copy of the data.
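As a rough sketch of the per-host DHCP approach described above, assuming ISC dhcpd is the server in use (the post does not say which one), each VM gets a host stanza keyed on its hardware address. The helper name dhcp_host_entry, the MAC address, and the IP addresses below are all made up for illustration; only the hostname nomad is taken from the log lines earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print an ISC dhcpd.conf host stanza for one VM.
# Arguments: hostname, MAC address, fixed IP, default gateway (placeholders).
dhcp_host_entry() {
    name=$1 mac=$2 ip=$3 gw=$4
    printf 'host %s {\n' "$name"
    printf '    hardware ethernet %s;\n' "$mac"
    printf '    fixed-address %s;\n' "$ip"
    printf '    option routers %s;\n' "$gw"
    printf '}\n'
}

# Example call with placeholder values; append the output to dhcpd.conf by hand.
dhcp_host_entry nomad 00:16:3e:00:00:01 192.0.2.10 192.0.2.1
```

With the gateway handed out through `option routers`, the default route would be changed in one place on the DHCP server instead of in each guest's network scripts.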