So, I have been running into network problems for a while on 4 boxes that I installed Xen on so that some engineers have places to test code. This particular problem is happening on all 4 of these boxes (although it isn't happening on an older box running Xen from Debian Etch).

What appears to be the problem is that traffic is getting dropped between the vif#.0 interface in dom0 and the eth0 interface in the guest. To find this out, I started a ping flood from one domU to another domU. About every 10 minutes, there will be a lot of ping requests going out but no replies coming back. I find it really weird that it happens every 10 minutes plus about 2 seconds. While the ping was running, I ran tcpdump in the domU sending the ping, on the vif#.0 of the pinging guest, on the virtual bridge, on the vif#.0 of the receiving guest, and in the receiving domU. The packets make it all the way to the dom0 vif for the receiving guest but never reach eth0 in the guest. I have no clue why this is happening, and it happens at fairly regular intervals. The same thing happens when pinging a different guest, at about the same interval but at different times. Also, during the ping flood, there is never a pause in the packets leaving the sending guest; the only pause is on packets going from the host to the receiving guest.

I am running this on 64-bit Debian Lenny using the distribution's packages: xen-hypervisor-3.2-1-amd64 version 3.2.1-2 and linux-image-2.6.26-2-xen-amd64 version 2.6.26-17. Here are the networking configs.
---------
dom0# cat /etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
    address 10.135.7.34
    netmask 255.255.255.224
    network 10.135.7.32
    broadcast 10.135.7.63
    gateway 10.135.7.33
    # dns-* options are implemented by the resolvconf package, if installed
    dns-nameservers 10.135.7.34
    dns-search qa1.mozyops.com

auto vmnet
iface vmnet inet static
    address 10.135.2.71
    netmask 255.255.255.224
    bridge_ports eth1
    # bridge_stp off
    # bridge_fd 9
    # bridge_hello 2
    # bridge_maxage 12

---------
DomU# cat /etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp
    post-up ethtool -K eth0 tx off

---------
Dom0# brctl show vmnet
bridge name     bridge id               STP enabled     interfaces
vmnet           8000.003048c8166d       no              eth1
                                                        vif1.0
                                                        vif10.0
                                                        <other interfaces>
---------

Does anyone have any ideas as to what is going on here? Or, more importantly, any ideas on how to solve this? I have tried building a newer domU kernel from scratch, but I haven't been able to make any progress there: the guest fails to boot without showing anything on the console, and then goes into a loop of trying to reboot the guest and failing. I would really like to stay with the Debian kernels.

I have been banging my head against a wall for a week or so on this and desperately need some help to get this working. I have engineers who are being held up by this bug.

Thanks for any insight you guys can give.

mike

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
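A way to measure the "every 10 minutes plus about 2 seconds" period more precisely, offered as a sketch rather than anything from the thread: log timestamped replies and look for holes. This assumes the iputils `ping`, whose `-D` option prefixes each line with a Unix timestamp, and a short-interval ping instead of `-f`, because a flood ping prints dots rather than one line per reply. The `find_gaps` helper is hypothetical and uses only POSIX sh and awk.

```shell
# find_gaps MAX_SECONDS
# Reads `ping -D` output on stdin and prints every gap between two
# consecutive replies that is longer than MAX_SECONDS.
find_gaps() {
    awk -v max="$1" '
        # Reply lines look like:
        # [1248300000.123456] 64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.1 ms
        /^\[[0-9.]+\] .*bytes from / {
            ts = substr($1, 2, length($1) - 2) + 0   # drop the [ ] around the timestamp
            if (seen && ts - prev > max)
                printf "gap of %.1fs starting at %.6f\n", ts - prev, prev
            prev = ts
            seen = 1
        }'
}
```

For example, run `ping -D -i 0.2 10.135.2.72 > ping.log` on the sending guest (the address is a placeholder for the receiving guest), then `find_gaps 1 < ping.log` afterwards. The difference between consecutive gap start times gives the period directly, and repeating the run against a second guest shows whether its outages are offset in time, as described above.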
I did 'echo 1 > /proc/sys/xen/independent_wallclock' on the host and several of the guests. Then I started the ping flood, and the same problem showed up. The time warnings were not happening every 10 minutes like the networking problem is, but I haven't seen the clock errors since making the change.

mike

Nathan Eisenberg wrote:
> Yea, I believe they are. Take a look at the independent wallclock setting, it should help.
> Best Regards,
> Nathan Eisenberg
> Sr. Systems Administrator
> Atlas Networks, LLC
>
> Sent from my BlackBerry
>
> -----Original Message-----
> From: Mike Lovell <mike@dev-zero.net>
>
> Date: Wed, 22 Jul 2009 17:07:03
> To: Nathan Eisenberg<nathan@atlasnetworks.us>
> Subject: Re: [Xen-users] network problems
>
>
> All of the MAC addresses are unique. During my ping floods, I do see what
> look like clock errors. They say something like:
>
> Warning: time of day goes back (-27109us), taking countermeasures.
>
> I wasn't sure if these were related so I didn't mention them before. Is
> it part of the problem?
>
> mike
>
> Nathan Eisenberg wrote:
>
>> Have you checked to make sure all of the MAC addresses are unique? Are there any time/clock went backwards messages on the console/dmesg output?
>>
>> Best Regards
>> Nathan Eisenberg
>> Sr. Systems Administrator
>> Atlas Networks, LLC
>> support@atlasnetworks.us
>> http://support.atlasnetworks.us/portal
>>
>>
>> -----Original Message-----
>> From: xen-users-bounces@lists.xensource.com [mailto:xen-users-bounces@lists.xensource.com] On Behalf Of Mike Lovell
>> Sent: Wednesday, July 22, 2009 3:49 PM
>> To: xen-users@lists.xensource.com
>> Subject: [Xen-users] network problems
>>
>> So, I have been running into network problems for a while on 4 boxes
>> that I installed xen on so that some engineers have places to test code.
>> This particular problem is happening on all 4 of these boxes. (although,
>> it isn't happening on an older box running xen from debian etch).
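Writing to /proc only lasts until the next reboot. A minimal sketch for making the setting stick, assuming the Xen kernel exposes it as the sysctl key `xen.independent_wallclock` (the name sysctl derives from the /proc/sys/xen/independent_wallclock path) and that /etc/sysctl.conf is applied at boot, as Debian does. `persist_wallclock` is a hypothetical helper; it only appends the line if it is not already there.

```shell
# persist_wallclock FILE
# Append the independent_wallclock sysctl setting to FILE (normally
# /etc/sysctl.conf) unless an identical line is already present.
persist_wallclock() {
    line='xen.independent_wallclock = 1'
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex).
    # A missing FILE makes grep fail, so the line is appended and FILE created.
    grep -qxF "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
}
```

Run as root in each guest (and in dom0 if wanted): `persist_wallclock /etc/sysctl.conf && sysctl -p`, where `sysctl -p` reloads /etc/sysctl.conf immediately instead of waiting for a reboot.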
Mike Lovell wrote:
> So, I have been running into network problems for a while on 4 boxes
> that I installed xen on so that some engineers have places to test
> code. This particular problem is happening on all 4 of these boxes.
> (although, it isn't happening on an older box running xen from debian
> etch).
>
> [...]
This problem still exists. I tried setting an independent wallclock on all of the virtual machines. I also discovered that I had missed a wrong netmask on the vmnet bridge: it should have been 255.255.255.128, not 255.255.255.224. The VMs were able to talk to each other before I changed the netmask, and I saw traffic flowing past the switch.
Does anyone have any clue as to what might be going on? I am in great need of some help here.

Thanks,
mike
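The netmask mix-up is easy to check by hand: ANDing each octet of the address with the mask gives the network address, and setting the remaining host bits gives the broadcast address. Below is a hypothetical `net_range` helper that does this with POSIX shell arithmetic; it is an illustration, not a standard tool.

```shell
# net_range ADDRESS NETMASK
# Print "network - broadcast" for a dotted-quad IPv4 address and netmask.
net_range() {
    old_ifs=$IFS
    IFS=.
    # Intentional word splitting on "." turns both dotted quads into eight
    # positional parameters: $1..$4 = address octets, $5..$8 = mask octets.
    set -- $1 $2
    IFS=$old_ifs
    # Network = address AND mask; broadcast = network OR inverted mask.
    printf '%d.%d.%d.%d - %d.%d.%d.%d\n' \
        $(( $1 & $5 )) $(( $2 & $6 )) $(( $3 & $7 )) $(( $4 & $8 )) \
        $(( ($1 & $5) | (255 - $5) )) $(( ($2 & $6) | (255 - $6) )) \
        $(( ($3 & $7) | (255 - $7) )) $(( ($4 & $8) | (255 - $8) ))
}
```

`net_range 10.135.2.71 255.255.255.224` prints `10.135.2.64 - 10.135.2.95`, while `net_range 10.135.2.71 255.255.255.128` prints `10.135.2.0 - 10.135.2.127`, so with the old mask dom0's vmnet address treated only a quarter of the intended range as on-link. The same check on eth0 (`net_range 10.135.7.34 255.255.255.224`) reproduces the `network 10.135.7.32` and `broadcast 10.135.7.63` lines of the posted config. Because the bridge forwards guest frames at layer 2, the IP netmask on vmnet itself does not affect guest-to-guest traffic, which fits the VMs being able to talk to each other before the fix.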
On Fri, Jul 24, 2009 at 8:37 AM, Mike Lovell<mike@dev-zero.net> wrote:
> I tried setting an independent wallclock on all of the virtual machines. I
> also managed to miss that I had the wrong netmask configured for the vmnet
> bridge. It should have been 255.255.255.128. The vms were able to talk to
> each other before changing the netmask and I saw traffic flowing past the
> switch.
>
> Does anyone have any clue as to what might be going on? I am great need of
> some help here.

How well maintained is Debian's 2.6.26 xen kernel? xen.org's 2.6.18
kernel might work better for you. Or go the other way around, using
latest pv_ops kernel.

--
Fajar
Fajar A. Nugraha wrote:
> On Fri, Jul 24, 2009 at 8:37 AM, Mike Lovell<mike@dev-zero.net> wrote:
>> [...]
>
> How well maintained is Debian's 2.6.26 xen kernel? xen.org's 2.6.18
> kernel might work better for you. Or go the other way around, using
> latest pv_ops kernel.
>

The Debian 2.6.26 Xen kernel is one where they used the openSUSE patches. I checked the changelog, and it looks like the bulk of it was added in October '08 with a few minor changes since. I don't know how recent the patches applied at that time were. I would like to avoid using anything older than 2.6.26 because the boxes have hard drive controllers that use the sata_mv driver. This driver was experimental until about 2.6.26 and had issues with things like hot-swap and error handling before .26. That said, I might need to make that sacrifice if I am going to get this to work.

I have also tried getting a pv_ops kernel to work before. If I remember correctly, I got a domU to boot with a pv_ops kernel, but so far my efforts to do it with dom0 have been an epic fail. It either dies as soon as Xen passes control to the dom0 kernel or soon thereafter. I usually just get frustrated and leave it for a while, so I don't really have particulars on the pv_ops dom0 failures. Any pointers? Will a pv_ops dom0 work on Xen 3.2, or do I need something newer like 3.4 or unstable? I think I would rather try this route than go back to 2.6.18.

My biggest question, though, is why would traffic not be passed from the dom0 vif interface to the domU eth0 interface?
This is the problem I am seeing, and it seems to happen at somewhat regular intervals. Is this something you have seen or heard of before? I don't have specifics other than that I know the traffic isn't getting passed, because I see the packets on the vif interface but not on the guest's network interface when running tcpdump on both.

Thanks for taking some time to help me.

mike
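The vif-versus-guest tcpdump comparison described above can be made exact by matching ICMP sequence numbers between the two captures. A rough sketch, assuming both were saved as text with something like `tcpdump -n -i vif10.0 icmp > vif.txt` in dom0 and `tcpdump -n -i eth0 icmp > guest.txt` in the receiving domU; `vif10.0`, the file names and the `missing_seqs` helper are hypothetical, and the parsing assumes tcpdump's usual "ICMP echo request, id N, seq N" text. Keep captures short, since ICMP sequence numbers are 16-bit and wrap on long runs.

```shell
# missing_seqs VIF_CAPTURE GUEST_CAPTURE
# Print (sorted) the ICMP echo request sequence numbers that appear in the
# dom0 vif capture but never show up in the capture taken inside the guest.
missing_seqs() {
    awk '
        /ICMP echo request/ && match($0, /seq [0-9]+/) {
            seq = substr($0, RSTART + 4, RLENGTH - 4)   # digits after "seq "
            if (FILENAME == ARGV[1]) sent[seq] = 1; else got[seq] = 1
        }
        END { for (s in sent) if (!(s in got)) print s }
    ' "$1" "$2" | sort -n
}
```

Runs of consecutive missing numbers mark each outage, and the capture timestamps around them show when the guest stopped receiving.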
On Fri, Jul 24, 2009 at 1:55 PM, Mike Lovell<mike@dev-zero.net> wrote:
>> How well maintained is Debian's 2.6.26 xen kernel? xen.org's 2.6.18
>> kernel might work better for you. Or go the other way around, using
>> latest pv_ops kernel.
>
>
> The Debian 2.6.26 xen kernel is one where they used the OpenSuse patches. I
> checked the changelog and it looks like the bulk of it was added in October
> '08 with a few minor changes since. I don't know how recent the patches that
> were applied at that time were.

By "how well maintained" I meant "does someone actually fix them if there's a known bug?" I know that Red Hat fixes them (or at least tries to backport fixes), which is why I use RHEL for production servers :D

> Any pointers? Will a pv_ops dom0 work on a xen
> 3.2? or do I need something higher like 3.4 or unstable? I think I would
> rather try this route than going back to 2.6.18.

Try this: http://wiki.xensource.com/xenwiki/XenDom0Kernels

>
> My biggest question though is why would traffic not be getting passed from
> the dom0 vif interface to the domU eth0 interface? This is the problem I am
> seeing and it seems to happen on somewhat regular intervals. Is this
> something you have seen or heard of before? I don't have specifics other
> than I know the traffic isn't getting passed cause I see the packets on the
> vif interface but not the guest network interface during tcp dumps on both.

I haven't had this problem with 2.6.18. It's either "all works well" or traffic not going through at all (which was the case when there was a bug in the NIC driver). There were also "eth0: received packet with own address as source address" messages, but nothing similar to the problem that you mentioned.

Something to check, though: are there any MAC-related messages in syslog? Is it possible that the MAC you're using is not unique?

--
Fajar
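The duplicate-MAC question can be checked mechanically. A small sketch, assuming you have collected one MAC per line, for example by running `cat /sys/class/net/eth0/address` in every guest on all four hosts (worth doing across hosts if their eth1 ports share a segment, since vmnet bridges guests onto eth1). `dup_macs` is a hypothetical helper name.

```shell
# dup_macs
# Read MAC addresses (one per line) on stdin and print each address that
# occurs more than once, ignoring case and leading/trailing whitespace.
dup_macs() {
    tr 'A-F' 'a-f' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -v '^$' |
        sort | uniq -d
}
```

If duplicates turn up, giving each guest a fixed address with a `mac=` entry in the `vif` line of its xm config avoids relying on randomly generated ones.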