Dominic Hargreaves
2006-Dec-04 15:57 UTC
[Xen-users] domUs dropping off the network (Xen 3.0.2)
Hello,

We've recently started seeing several domUs on multiple machines losing all network connectivity. At least one of these machines (since they are customer machines I can't verify it for all of them) gives the error:

$ sudo ifup eth0
eth0: full queue wasn't stopped!

About the only Google hit for this message is:

http://lists.xensource.com/archives/html/xen-devel/2004-09/msg00018.html

which isn't especially reassuring, but unfortunately, because they are production machines, I'm not in a position to do a lot of debugging.

The problem appears to be time-dependent. It has manifested on several domUs on several host machines (all the hosts were rebooted within an hour or so of each other, around 130 days ago, and the first instance of this problem was around three weeks ago). The vif needs to be renamed for the domU to regain network connectivity, and a zombie domU appears (presumably these points are related).

Other data points:

- Debian sarge with backports.org packages of Xen 3.0.2
- Custom 2.6.16.27 kernel with Xen patches from Debian: http://svn.debian.org/wsvn/pkg-xen/trunk/patches/linux-2.6.16-xen.patch.gz?op=log&rev=0&sc=0&isdir=0
- Host network driver is e1000 throughout.

I will probably try to upgrade these machines to 3.0.3, as I've had some anecdotal evidence that this has fixed a similar problem for someone else, but I thought I'd post my experiences here too in case this rings a bell with anyone.

Cheers,

Dominic.

-- 
Dominic Hargreaves | http://www.larted.org.uk/~dom/
PGP key 5178E2A5 from the.earth.li (keyserver,web,email)
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Dominic Hargreaves
2006-Dec-08 10:48 UTC
4GB byte count overflow hangs networking (was Re: [Xen-users] domUs dropping off the network (Xen 3.0.2))
On Mon, Dec 04, 2006 at 03:57:10PM +0000, Dominic Hargreaves wrote:
> Hello,
>
> We've recently started seeing several domUs on multiple machines losing
> all network connectivity. At least one of these machines (since they are
> customer machines I can't verify it for all of them) gives the error:
>
> $ sudo ifup eth0
> eth0: full queue wasn't stopped!

To follow up on this post: the problem happens when the number of bytes received by the vif on the dom0 side (and therefore transmitted by the domU) hits 4GB (or comes within a few kB of it).

In tests I have verified this against a Xen 3.0.2 system (although it doesn't always manifest, so perhaps other factors are at play too), and have also verified that with Xen 3.0.3 it seems to happen less often.

Cheers,

Dominic.
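If the hang really is tied to the dom0-side vif's received byte count reaching 2^32 bytes (4 GiB), as the follow-up above suggests but does not prove, an administrator could at least watch for vifs approaching that mark. The sketch below is a hypothetical monitoring aid, not a fix: it assumes the standard Linux sysfs counter `/sys/class/net/<iface>/statistics/rx_bytes` and the usual `vifN.M` naming for backend interfaces in dom0, and the 64 KiB margin and all function names are illustrative choices. On a 32-bit dom0 the kernel counter is likely only 32 bits wide and wraps back to zero, so this can only warn before the wrap, not detect it afterwards.

```python
import os

# Assumption: the trouble coincides with a 32-bit byte counter reaching 2**32 (4 GiB).
COUNTER_LIMIT = 2 ** 32
# Hypothetical safety margin; the report says the hang occurs "within a few kB".
DEFAULT_MARGIN = 64 * 1024


def near_limit(byte_count, margin=DEFAULT_MARGIN, limit=COUNTER_LIMIT):
    """Return True if byte_count is within `margin` bytes of `limit` or beyond it."""
    return byte_count >= limit - margin


def read_rx_bytes(iface, sysfs="/sys/class/net"):
    """Read the rx_bytes counter for `iface` from Linux sysfs."""
    path = os.path.join(sysfs, iface, "statistics", "rx_bytes")
    with open(path) as f:
        return int(f.read().strip())


def vifs_at_risk(sysfs="/sys/class/net", margin=DEFAULT_MARGIN):
    """Return (iface, rx_bytes) for each dom0 vif whose counter is near the limit.

    On the dom0 side, a vif's rx_bytes counts what the domU transmitted.
    """
    risky = []
    if not os.path.isdir(sysfs):
        return risky
    for iface in sorted(os.listdir(sysfs)):
        if not iface.startswith("vif"):
            continue  # only Xen backend interfaces (hypothetical naming filter)
        try:
            count = read_rx_bytes(iface, sysfs)
        except (OSError, ValueError):
            continue  # interface vanished or counter unreadable
        if near_limit(count, margin):
            risky.append((iface, count))
    return risky


if __name__ == "__main__":
    for iface, count in vifs_at_risk():
        print(f"{iface}: rx_bytes={count} (limit {COUNTER_LIMIT})")
```

Run periodically from cron in dom0, this would give some warning before a domU crosses the threshold the thread identifies.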
Steven Yelton
2007-Feb-09 15:34 UTC
Re: 4GB byte count overflow hangs networking (was Re: [Xen-users] domUs dropping off the network (Xen 3.0.2))
Dominic,

We are seeing this as well. We were going to upgrade to 3.0.3 this weekend, but from what you are seeing, that won't fix it (we are currently using xen-3.0.2-2). Here are some other details on our hardware and configuration:

xm top:

NAME      STATE   CPU(sec)  CPU(%)  MEM(k)  MEM(%)  MAXMEM(k)  MAXMEM(%)  VCPUS  NETS  NETTX(k)  NETRX(k)  SSID
          d-----     79276     0.0       8     0.0     393216        6.3      1     1   4194303   1973846     0
          d-----      5462     0.0       8     0.0    1048576       16.7      2     1   4194303    727438     0
          d-----      4653     0.0       8     0.0    1048576       16.7      1     1   4194303    644714     0
          d-----      4843     0.0       8     0.0    1048576       16.7      1     1   4194302    675675     0
          d-----      4244     0.0       8     0.0    1048576       16.7      1     1   4194302    638509     0
          d-----      4886     0.0       8     0.0     786432       12.5      1     1   4194303    691506     0
Domain-0  -----r    106893     0.5  262272     4.2   no limit        n/a      4     8   2991003   3711014     0
domU1     --b---      1774     1.7  392788     6.2     393216        6.3      1     2   2433841   1190900     0
domU2     --b---    220877     0.1  392780     6.2     393216        6.3      1     2   4208965   3047409     0
domU3     --b---     18591     0.0  392840     6.2     393216        6.3      1     2   5759556    545256     0
domU4     --b---      2060     0.1  786012    12.5     786432       12.5      1     2   2579388   2731866     0

As you can see, domU3 is happily above 4GB. We see domU4 crash repeatedly, and domU1 has crashed once. I wonder whether a domU that has crashed once will continue to do so.

We are running on a Dell 2850 with PAE enabled. The machine has been up for around 280 days. We didn't start seeing the problem until it had been up about 240 days. We were hoping an upgrade and/or a reboot would get us back to a good state.

Is there any other information I can give that can help track down this issue? Has anyone found *any* workaround?

Thanks,
Steven
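The NETTX(k) column in the xm top output above can be checked against the 4GB theory with a little arithmetic: 2^32 bytes is 4194304 KiB. The sketch below assumes xm top's "(k)" columns mean KiB (1024 bytes), which the 4194302/4194303 figures suggest; the `dying-N` labels are hypothetical placeholders for the unnamed `d-----` domains, and the numbers are copied from the output.

```python
# NETTX(k) values copied from the xm top output; "dying-N" names are
# hypothetical placeholders for the unnamed d----- (dying) domains.
NETTX_KB = {
    "dying-1": 4194303,
    "dying-2": 4194303,
    "dying-3": 4194303,
    "dying-4": 4194302,
    "dying-5": 4194302,
    "dying-6": 4194303,
    "domU1": 2433841,
    "domU2": 4208965,
    "domU3": 5759556,
    "domU4": 2579388,
}

# Assumption: xm top's "(k)" columns are KiB (1024 bytes).
LIMIT_KB = 2 ** 32 // 1024  # 4194304 KiB == 4 GiB


def kb_below_limit(tx_kb, limit_kb=LIMIT_KB):
    """How many KiB short of 2**32 bytes a counter is (negative once past it)."""
    return limit_kb - tx_kb


for name, tx_kb in NETTX_KB.items():
    print(f"{name:8} NETTX={tx_kb:>8} kB  below 4 GiB by {kb_below_limit(tx_kb):>8} kB")
```

All six dying domains stopped 1 or 2 KiB short of 4 GiB, matching the "within a few kB" observation, while both domU2 and domU3 have transmitted past the mark, which fits the remark that the hang does not always manifest.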