Frank Schubert
2007-Mar-19 08:58 UTC
[Xen-users] Status of "4GB byte count overflow hangs networking"
Hi all,

I'm referring to an email to xen-users on Fri, 8 Dec 2006 in which Dominic Hargreaves reported on tests with versions 3.0.2 and 3.0.3. He reported that the 4GB byte count hang occurs with both versions.

Here is another site where S. Burke reported the same problem with version 3.0.2-3:
http://wiki.kartbuilding.net/index.php/Ongoing_Experiences_with_Xen#16th_Jan_2007_Xen_Report

Currently I'm seeing the same problem with the official Xen 3.0.2-2 PAE kernel. dom0 uptime is > 300 days. The first domU hang occurred at a dom0 uptime of between 280 and 290 days. During days 0-280 the same domUs ran with the same load and traffic without any error.

xentop:

NAME      STATE   CPU(sec) CPU(%)  MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) SSID
          ds----     30865    0.0       8    0.0    131072       3.1     1    1  4194303  1221445    0
          d-b---      4303    0.0       8    0.0    524288      12.5     1    1  4194303  1349417    0
          ds----      3195    0.0       8    0.0    524288      12.5     1    1  4194303  1226187    0
          ds----      5606    0.0       8    0.0    524288      12.5     1    1  4194303   577034    0
          ds----      8948    0.0       8    0.0    524288      12.5     1    1  4194303   557577    0
Domain-0  -----r    103478    0.3  257064    6.1  no limit       n/a     2    8  3560993  1609201    0
vm01      --b---    157894    0.1 1048360   25.0   1048576      25.0     1    1   401137   499516    0
vm02      --b---      4951    0.2  524136   12.5    524288      12.5     1    1  2583543   266227    0
vm03      --b---    723872    0.1 1048428   25.0   1048576      25.0     1    1    73381  2813667    0
vm04      --b---    114504    0.2  524068   12.5    524288      12.5     1    1  3004136  1557914    0
vm05      --b---      2023    0.0  130896    3.1    131072       3.1     1    1   555819   123315    0

The first 5 are domUs in zombie state. As you can see, all of them hang at NETTX(k) 4194303.

When a guest hangs, "xm console <domu>" is still possible, and from there I can shut down the guest. After that I can "xm create </etc/xen/domU>" the guest. Everything then runs fine until the NETTX counter hits 4194303.

Is this fixed in 3.0.4, or is there a workaround?

Thanks in advance!
Frank Schubert

--
Frank Schubert
Systemadministration
EsPresto AG
Breite Str. 30-31
10178 Berlin/Germany
Tel: +49.(0)30.90 226.750
Fax: +49.(0)30.90 226.760
f.schubert@espresto.com
HRB 77554 AG - Berlin-Charlottenburg
Vorstand: Alexander Biersack und Maya Biersack
Vorsitzender des Aufsichtsrats: Oli Kai Paulus

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
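For reference, the value 4194303 in the NETTX(k) column is exactly what an unsigned 32-bit byte counter shows at its maximum when divided down to KiB. That is consistent with the "4GB byte count overflow" in the subject line, though it does not by itself show where the overflow happens. The following Python sketch checks the arithmetic and picks the affected rows out of pasted xentop output. The function names are made up for illustration, and the parser assumes the 13-column layout shown above and domain names without spaces.

```python
COUNTER_MAX_BYTES = 2**32 - 1             # maximum of an unsigned 32-bit counter
CEILING_KIB = COUNTER_MAX_BYTES // 1024   # same value expressed in KiB, rounded down


def nettx_kib(row):
    """Return (name, NETTX(k)) for one xentop data row.

    Assumes the 13-column layout from the post. Zombie domains are
    listed without a name (12 tokens), and Domain-0 prints "no limit"
    for MAXMEM(k), which is joined into one token before splitting.
    """
    tokens = row.replace("no limit", "no-limit").split()
    name = tokens[0] if len(tokens) == 13 else ""
    return name, int(tokens[-3])  # trailing columns: NETTX(k) NETRX(k) SSID


def at_ceiling(rows):
    """Names of all domains whose NETTX(k) has reached the 32-bit ceiling."""
    hits = []
    for row in rows:
        name, tx = nettx_kib(row)
        if tx >= CEILING_KIB:
            hits.append(name or "(unnamed)")
    return hits
```

Fed with the rows above, `at_ceiling` would report the five unnamed zombie domains and none of the running ones. The same check could be run periodically against live output to restart a guest in a maintenance window before it reaches the limit, assuming the hang really is tied to this counter.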
Good morning,

After reading through "Re: [Xen-users] Status of '4GB byte count overflow hangs networking'", I was reminded of a problem we've been experiencing with one of our Xen machines here.

Using the PCI hiding mechanism, we've given a PCI-E Koutech PEN120 gig-E card to our NFS domU. The domU currently has no active vif interface. The Koutech appears as eth1 on the domU. We assigned it an IP, and it was off and running.

Everything seems to work just fine, and then after a while network functionality seems to cease entirely. Sometimes we trigger it with exceptionally heavy traffic and usage, such as continually rebooting many nodes in our cluster; at other times it will go on for weeks without issue.

The posts I've followed seem to be referring to a 4GB hangup on the VIF interface. So even if it isn't directly related to the problems we're experiencing, it is the closest mention I've seen. xentop shows a 0 for NETTX for the NFS domU (which I guess would make sense if it is monitoring only the vif, and that isn't active).

We've been able to avoid a reboot pretty much every time by disconnecting the network cable from the card, waiting for it to recognize that the link is gone, and plugging it back in. Everything comes back as if nothing was wrong. The domU notices when the cable is unplugged and when the link is re-established, but doesn't seem to have any knowledge that the network had stopped responding. While it is in that state, nothing can access the domU, and I believe that when I xm console into the domU, the network is also inaccessible from inside it.

Could this in any way be related to the problems others are experiencing? Some sort of 4GB thing? I'll be keeping a lookout and running some tests in the near future to see whether that is the case in this situation.

We are currently running a 2.6.18 xen-unstable PAE setup on a Core 2 Duo box with 4GB of RAM. Debian Etch is the distribution.

-Matthew

--
Matthew Haas
Visiting Instructor
Corning Community College
Computer & Information Science
http://lab46.corning-cc.edu/haas/home/

"Writing should be like breathing; It is one of those important things we do." -- me
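Since the passed-through card does not show up in xentop, one way to run such a test would be to sample the interface's own byte counter inside the domU and note whether a hang coincides with the counter wrapping or passing a multiple of 4 GiB. This is only a sketch. It assumes Python 3.6 or later is available in the domU and that the kernel exposes the counter under /sys/class/net/eth1/statistics/tx_bytes. The function names and the 60-second interval are made up for illustration.

```python
import time
from pathlib import Path

WRAP = 2**32  # 4 GiB in bytes


def read_tx_bytes(iface, root="/sys/class/net"):
    """Read the transmit byte counter of one interface from sysfs."""
    return int(Path(root, iface, "statistics", "tx_bytes").read_text())


def crossed_4gib(prev, cur):
    """True if the counter wrapped (went backwards) or passed a
    multiple of 4 GiB between two consecutive samples."""
    return cur < prev or cur // WRAP != prev // WRAP


def watch(iface="eth1", interval=60):
    """Log the counter at a fixed interval and flag 4 GiB crossings."""
    prev = read_tx_bytes(iface)
    while True:
        time.sleep(interval)
        cur = read_tx_bytes(iface)
        flag = "  <-- wrapped or crossed a 4 GiB boundary" if crossed_4gib(prev, cur) else ""
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {iface} tx_bytes={cur}{flag}", flush=True)
        prev = cur
```

Calling `watch("eth1")` with its output redirected to a file on local disk would leave a timestamped log that can be compared against the time of the next hang. If the hangs do not line up with a flagged sample, the 4GB explanation becomes less likely for this setup.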
Frank Schubert
2007-Mar-21 08:06 UTC
Re: [Xen-users] Status of "4GB byte count overflow hangs networking"
Hi all,

I contacted Stephen Burke by private mail about his article, and he has some experiences I want to share with the list.

Stephen said he had not managed to fix the problem while staying on the same Xen/kernel version, but mentioned that upgrading helped in his case. (However, no one else has verified so far that upgrading helps.)

One thing I think is important to mention is that he used Xen 3.0.2 from Debian Backports. A colleague who was using a self-compiled 3.0.2 did not see the 4GB hang problems. I use the official Xen 3.0.2-2 PAE binary version and do have the 4GB hang problems.

Today I will upgrade to 3.0.4. As the problem seems to be linked to the system's uptime, I will report back in ~280 days ;-)

Best Regards,
Frank