Hi,

we're running Xen 2.0.5 (Kernel 2.6.11.4-21.8-xen) as installed by default on SuSE 9.3. The machine is a dual Athlon MP on a Tyan Thunder board with an Intel Pro/1000 NIC. The network between dom0 and the domUs uses the default bridged setup.

Unfortunately, when transferring large amounts of data (e.g. large file transfers via NFS, or just pumping zeroes across the network with netcat), after 1 GB of data or so (a 3 GB file transfer definitely fails), the network suddenly stalls with only a few 100k/s of bandwidth remaining.

This happens between dom0 and other machines, between domU and other machines, and between several domUs when running on different CPUs. Interestingly, things seem to be ok between domUs if they're on the same CPU.

Are there any ideas what one could do to fix this?

Thanks in advance,

Christoph

--
-- Christoph Schmitz <schmitz@cs.uni-kassel.de>
-- FG Wissensverarbeitung, FB 17, Universität Kassel
-- http://www.kde.cs.uni-kassel.de/schmitz
-- Tel. 0561-804-6254 -- Fax 0561-804-6259
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
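For anyone trying to reproduce the stall described in this report, the netcat test can be approximated with a short script that also records when the throughput collapses. The following is only a sketch in Python and was not posted in this thread: the function names `pump_zeroes` and `find_stall`, the 64 KiB chunk size and the one-second sampling interval are all assumptions made for illustration.

```python
import socket
import time

# 64 KiB of zero bytes, comparable to reading from /dev/zero (size is an assumption).
CHUNK = bytes(64 * 1024)


def pump_zeroes(host, port, total_bytes, interval=1.0):
    """Send total_bytes of zeroes over TCP.

    Returns a list of (bytes_sent_so_far, bytes_per_second) samples, one per
    elapsed interval. The rate is measured on the sending side only.
    """
    samples = []
    sent = 0
    window_start = time.monotonic()
    window_bytes = 0
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(len(CHUNK), total_bytes - sent)
            sock.sendall(CHUNK[:n])
            sent += n
            window_bytes += n
            elapsed = time.monotonic() - window_start
            if elapsed >= interval and elapsed > 0:
                samples.append((sent, window_bytes / elapsed))
                window_start = time.monotonic()
                window_bytes = 0
    # A final, partial window is deliberately not recorded.
    return samples


def find_stall(samples, floor_bytes_per_s):
    """Return the byte count at which the rate first fell below the floor, or None."""
    for sent, rate in samples:
        if rate < floor_bytes_per_s:
            return sent
    return None
```

On the receiving side, a listening netcat that discards its input should be enough. Calling `pump_zeroes(host, port, 3 * 1024**3)` and then `find_stall(samples, 500 * 1024)` reports roughly how many bytes went through before the rate dropped below 500 KiB/s. That threshold is a guess based on the "few 100k/s" figure in the report and should be adjusted to the link in question.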
I am having a (possibly related?) problem with Xen 2.0.6 + 2.6.11.10. The machine is a dual Athlon MP on a Tyan Tiger S2466-4M, Intel e1000. I am running Debian 3.1.

I have been running the same configuration for months, and for a few minutes each day, Nagios would notice that my domUs lost networking connectivity completely. It would never last more than a few minutes, so I did not pay much attention to it. During this time, the domUs don't even respond to arp requests. It does not matter what CPU they are on. dom0 was never affected.

Last Friday the problem got very bad, to the point where every few minutes, domU networking would go down, and stay down for longer periods of time. Rebooting did not fix anything. I switched from the e1000 to the onboard 3com 10/100 NIC, and this seemed to cure the problem for 5 days, but last night the domU networking started to bounce up and down again.

There is no interesting dmesg output in dom0 or domU or Xen. I have not tried to ping from one domU to another. Usually these events are infrequent and happen while I am asleep.

Now I am wondering if there is some SMP issue with Xen that only surfaces on the K7 architecture.

Any insight would be appreciated.

Regards,
Jeff

On Tue, Aug 30, 2005 at 06:31:16PM +0200, Christoph Schmitz wrote:
> Hi,
>
> we're running Xen 2.0.5 (Kernel 2.6.11.4-21.8-xen) as installed by default on SuSE 9.3. The machine is a dual Athlon MP on a Tyan Thunder board with an Intel Pro/1000 NIC. The network between dom0 and the domUs uses the default bridged setup.
>
> Unfortunately, when transferring large amounts of data (e.g. large file transfers via NFS, or just pumping zeroes across the network with netcat), after 1 GB of data or so (a 3 GB file transfer definitely fails), the network suddenly stalls with only a few 100k/s of bandwidth remaining.
>
> This happens between dom0 and other machines, between domU and other machines, and between several domUs when running on different CPUs. Interestingly, things seem to be ok between domUs if they're on the same CPU.
>
> Are there any ideas what one could do to fix this?
>
> Thanks in advance,
>
> Christoph
> Now I am wondering if there is some SMP issue with Xen that only surfaces on the K7 architecture.

No, these problems are universal or widespread, at the very least. See the archives, it's something of a known problem with the network code.

John

--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@ivytech.edu
John Madden wrote:
>> Now I am wondering if there is some SMP issue with Xen that only surfaces on the K7 architecture.
>
> No, these problems are universal or widespread, at the very least. See the archives, it's something of a known problem with the network code.
>
> John

Could you elaborate on which one this is?

thanks,
Nivedita
>> No, these problems are universal or widespread, at the very least. See the archives, it's something of a known problem with the network code.
>
> Could you elaborate on which one this is?

In case you were asking me, this is the one where the network connection hangs for a few seconds, packets going to domU are lost, but packets to dom0 are just queued and eventually come through with high latency, both "outages" happening at the same time.

Recent list threads "unstable network problem," "vif interfaces drop packets," and "intermittent network hangs" are pertinent. "DomU Bridged vs. Routed Networking?" also had some bits about it.

John

--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@ivytech.edu
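The two symptoms described in this message (probes to a domU lost outright, probes to dom0 answered but very late) can be told apart mechanically once round-trip times are logged for both targets. What follows is a minimal Python sketch, assuming the RTTs have already been collected by some means, with `None` standing for a lost probe. The names `summarize` and `classify` and the `slow_factor` threshold are hypothetical and not taken from any Xen tool.

```python
from statistics import median


def summarize(rtts_ms):
    """Summarize probe results; rtts_ms holds RTTs in ms, None for a lost probe."""
    answered = [r for r in rtts_ms if r is not None]
    return {
        "sent": len(rtts_ms),
        "lost": len(rtts_ms) - len(answered),
        "median_ms": median(answered) if answered else None,
        "max_ms": max(answered) if answered else None,
    }


def classify(rtts_ms, slow_factor=20.0):
    """Label a probe series as 'no data', 'down', 'lossy', 'delayed' or 'ok'.

    'delayed' means nothing was lost but the slowest reply took more than
    slow_factor times the median (an arbitrary, assumed threshold).
    """
    s = summarize(rtts_ms)
    if s["sent"] == 0:
        return "no data"
    if s["lost"] == s["sent"]:
        return "down"
    if s["lost"] > 0:
        return "lossy"
    if s["max_ms"] > slow_factor * s["median_ms"]:
        return "delayed"
    return "ok"


if __name__ == "__main__":
    # Made-up sample values that mimic the behaviour described in the thread.
    dom0 = [0.3, 0.3, 2500.0, 1400.0, 0.3]
    domu = [0.4, None, None, 0.4, 0.4]
    print("dom0:", classify(dom0))
    print("domU:", classify(domu))
```

Running both series over the same time window makes it easier to see whether the dom0 latency spike and the domU loss really coincide, which is the detail a bug report would need.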
On Tue, 2005-08-30 at 16:49 -0500, John Madden wrote:
> >> No, these problems are universal or widespread, at the very least. See the archives, it's something of a known problem with the network code.
> >
> > Could you elaborate on which one this is?
>
> In case you were asking me, this is the one where the network connection hangs for a few seconds, packets going to domU are lost, but packets to dom0 are just queued and eventually come through with high latency, both "outages" happening at the same time.
>
> Recent list threads "unstable network problem," "vif interfaces drop packets," and "intermittent network hangs" are pertinent. "DomU Bridged vs. Routed Networking?" also had some bits about it.
>
> John

This patch fixes FC4, have no ip's on xm's so minimal test.

http://lists.xensource.com/archives/html/xen-devel/2005-08/msg00649.html

Regards,
Ted
> Recent list threads "unstable network problem," "vif interfaces drop packets," and "intermittent network hangs" are pertinent. "DomU Bridged vs. Routed Networking?" also had some bits about it.

If anyone can repeat any of this on -unstable then developers will take notice, particularly if you can provide a simple recipe to reproduce.

There are a bunch of issues with respect to running services in domain 0 that cause problems for 2.0.x that are long since fixed in unstable.

Ian
> There are a bunch of issues with respect to running services in domain 0 that cause problems for 2.0.x that are long since fixed in unstable.

What does "running services" mean? All I'm talking about is icmp...

John

--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@ivytech.edu
> This patch fixes FC4, have no ip's on xm's so minimal test.
>
> http://lists.xensource.com/archives/html/xen-devel/2005-08/msg00649.html

Let me clarify: The problem doesn't necessarily occur during xend start, in fact, what I'm seeing happens on and off for the life of the guest(s). If it helps, this is on 2.0-testing.

John

--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@ivytech.edu
On Wed, 2005-08-31 at 09:08 -0500, John Madden wrote:
> > This patch fixes FC4, have no ip's on xm's so minimal test.
> >
> > http://lists.xensource.com/archives/html/xen-devel/2005-08/msg00649.html
>
> Let me clarify: The problem doesn't necessarily occur during xend start, in fact, what I'm seeing happens on and off for the life of the guest(s). If it helps, this is on 2.0-testing.
>
> John

I can make no sense at all of this, been experimenting with unstable and sometimes it doesn't matter and sometimes it does. To clarify I am unable to get any network on any xm created domains, going to try a newer snap as soon as I see something interesting hit the dev list.

Regards,
Ted
On Wed, 2005-08-31 at 10:55 -0400, Ted Kaczmarek wrote:
> On Wed, 2005-08-31 at 09:08 -0500, John Madden wrote:
> > > This patch fixes FC4, have no ip's on xm's so minimal test.
> > >
> > > http://lists.xensource.com/archives/html/xen-devel/2005-08/msg00649.html
> >
> > Let me clarify: The problem doesn't necessarily occur during xend start, in fact, what I'm seeing happens on and off for the life of the guest(s). If it helps, this is on 2.0-testing.
> >
> > John
>
> I can make no sense at all of this, been experimenting with unstable and sometimes it doesn't matter and sometimes it does. To clarify I am unable to get any network on any xm created domains, going to try a newer snap as soon as I see something interesting hit the dev list.

I have networking on a guest domain on an FC4 machine. I am having problems with guest domains on SLES 9 machines, however.

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=192

> Regards,
> Ted

--
Regards,
David F Barrera
Linux Technology Center
Systems and Technology Group, IBM

"The wisest men follow their own direction."
Euripides
John Madden wrote:
>> There are a bunch of issues with respect to running services in domain 0 that cause problems for 2.0.x that are long since fixed in unstable.
>
> What does "running services" mean? All I'm talking about is icmp...
>
> John

John, do any of the bugs in bugzilla cover your problem? The discussions in the various threads you mentioned in an earlier post covered a variety of problems, most fixed. The unstable code changes daily (so your issues are probably no longer the same).

If you're seeing the same problem currently in -testing, unstable and the prior releases, it would be helpful to file a bug with all the detail that you can provide.

thanks,
Nivedita
Guess I should add my experiences:

Running unstable (approx. two weeks old now.)

Dom0 vif1.0 -> guest eth0: icmp works, ssh fails, tcp seems to have checksum issues. Building gre tunnels over this works around the problem. I thought maybe this was device driver related (tg3), so I tried bcm5700 with the same result.

Tried a similar setup with a machine with e100 driver. This time I get udp checksum errors instead of tcp, so dns fails but ssh works ;-( A gre tunnel is my current workaround.

I say checksum issues because ethereal is complaining about transport layer checksums when I do a capture to diagnose the problem.

Tried various combinations of disabling tx/rx/sg/to with ethtool in both Dom0 and guest, to no avail. Also tried hacking the driver code to permanently disable offloads. Tried looking for the difference between stable/unstable netfront and netback, but didn't get far.

A bridged setup works, ip address on Dom0 veth0, Dom0 vif0.0 bridged with Dom0 vif1.0. Real pain in the neck to firewall though, due to all the interfaces. Never managed to get Masquerading to work in this setup, so went back to a routed network with gre tunnels...

I can try a more recent unstable if it might help, but I haven't seen any evidence of anything having fixed this. Love to have a simple routed setup working.

Tim:>
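For readers who want to check by hand what a capture tool means by a bad transport-layer checksum, the standard Internet checksum (RFC 1071) applied to a UDP segment can be sketched in a few lines of Python. This is only an illustration of the generic algorithm, not the code used by ethereal, netfront or netback. The function names are made up for this example, and the addresses are expected as raw 4-byte IPv4 values.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def _pseudo_header(src_ip: bytes, dst_ip: bytes, length: int) -> bytes:
    # IPv4 pseudo header: source, destination, zero, protocol 17 (UDP), UDP length.
    return src_ip + dst_ip + struct.pack("!BBH", 0, 17, length)


def udp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum for a UDP segment whose checksum field is currently zero."""
    csum = internet_checksum(_pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


def verify_udp(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """True if the segment's checksum is valid, or absent (field of zero)."""
    if segment[6:8] == b"\x00\x00":
        return True  # sender did not compute a checksum
    return internet_checksum(_pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0
```

Feeding the raw UDP header and payload from a capture into `verify_udp`, together with the IP addresses from the enclosing packet, gives an independent answer on whether the checksum on the wire really is wrong. TCP uses the same sum with protocol number 6 and the TCP segment length in the pseudo header.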
Tim Durack wrote:
> Guess I should add my experiences:
>
> Running unstable (approx. two weeks old now.)
>
> Dom0 vif1.0 -> guest eth0: icmp works, ssh fails, tcp seems to have checksum issues. Building gre tunnels over this works around the problem. I thought maybe this was device driver related (tg3), so I tried bcm5700 with the same result.
>
> Tried a similar setup with a machine with e100 driver. This time I get udp checksum errors instead of tcp, so dns fails but ssh works ;-( A gre tunnel is my current workaround.
>
> I say checksum issues because ethereal is complaining about transport layer checksums when I do a capture to diagnose the problem.
>
> Tried various combinations of disabling tx/rx/sg/to with ethtool in both Dom0 and guest, to no avail. Also tried hacking the driver code to permanently disable offloads. Tried looking for the difference between stable/unstable netfront and netback, but didnt get far.
>
> A bridged setup works, ip address on Dom0 veth0, Dom0 vif0.0 bridged with Dom0 vif1.0. Real pain in the neck to firewall though, due to all the interfaces. Never managed to get Masquerading to work in this setup, so went back to a routed network with gre tunnels...
>
> I can try a more recent unstable if it might help, but I haven't seen any evidence of anything having fixed this. Love to have a simple routed setup working.

Tim, it's very likely that your initial problem has been resolved in unstable - Ian, Keir, et al. have put in quite a few fixes which have closed most of the outstanding issues we had. I think there are only a few remaining issues. However, currently routing is broken in unstable, so you might want to hold off on testing current unstable until that gets fixed.

thanks,
Nivedita
> John, do any of the bugs in bugzilla cover your problem? The discussions in the various threads you mentioned in an earlier post covered a variety of problems, most fixed. The unstable code changes daily (so your issues are probably no longer the same). If you're seeing the same problem currently in -testing, unstable and the prior releases, it would be helpful to file a bug with all the detail that you can provide.

In the 39 bugs matching "network," I didn't find anything, so I created bug #208 for it.

John

--
John Madden
UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@ivytech.edu
> Tim, it's very likely that your initial problem has been resolved in unstable - Ian, Keir, et. al have put in quite a few fixes which have closed most of the outstanding issues we had. I think there are only a few remaining issues. However, currently routing is broken in unstable, so you might want to hold off on testing current unstable until that gets fixed.

Okay, good to know. I hadn't seen any acknowledgement that routing was broken. I'll stick to my gre tunnels for now.

Tim:>
Do you mean the network scripts are broken, or the handling of interfaces in Dom0 is causing issues with a routed setup? I'm configuring everything manually. The fact that I have to build gre tunnels to make things work as expected suggests to me that there is something broken in the netfront/netback code.

Tim:>

On 8/31/05, Nivedita Singhvi <niv@us.ibm.com> wrote:
> Tim Durack wrote:
> > Guess I should add my experiences:
> >
> > Running unstable (approx. two weeks old now.)
> >
> > Dom0 vif1.0 -> guest eth0: icmp works, ssh fails, tcp seems to have checksum issues. Building gre tunnels over this works around the problem. I thought maybe this was device driver related (tg3), so I tried bcm5700 with the same result.
> >
> > Tried a similar setup with a machine with e100 driver. This time I get udp checksum errors instead of tcp, so dns fails but ssh works ;-( A gre tunnel is my current workaround.
> >
> > I say checksum issues because ethereal is complaining about transport layer checksums when I do a capture to diagnose the problem.
> >
> > Tried various combinations of disabling tx/rx/sg/to with ethtool in both Dom0 and guest, to no avail. Also tried hacking the driver code to permanently disable offloads. Tried looking for the difference between stable/unstable netfront and netback, but didnt get far.
> >
> > A bridged setup works, ip address on Dom0 veth0, Dom0 vif0.0 bridged with Dom0 vif1.0. Real pain in the neck to firewall though, due to all the interfaces. Never managed to get Masquerading to work in this setup, so went back to a routed network with gre tunnels...
> >
> > I can try a more recent unstable if it might help, but I haven't seen any evidence of anything having fixed this. Love to have a simple routed setup working.
>
> Tim, it's very likely that your initial problem has been resolved in unstable - Ian, Keir, et. al have put in quite a few fixes which have closed most of the outstanding issues we had. I think there are only a few remaining issues. However, currently routing is broken in unstable, so you might want to hold off on testing current unstable until that gets fixed.
>
> thanks,
> Nivedita