I set up a domU as a backup server, but it has very, very poor network performance with external computers. I ran some tests with iperf and found some very weird results.

Using iperf, I get these approximate numbers (the left column is the iperf client and the right column is the iperf server):

  domU  --> domU    1.77 Gbits/sec   (using 127.0.0.1)
  domU  --> domU    1.85 Gbits/sec   (using domU eth0 IP address)
  dom0  --> domU    91.5 Mbits/sec
  domU  --> dom0    85.2 Mbits/sec

So far, so good. The relatively slow dom0<->domU communication may indicate a problem, but it's fine for my purposes. The real problem is when I use my iBook (running Mac OS X) to run some iperf tests. The computers are connected via a crossover cable. They were originally connected with a hub, but I changed to a crossover cable connection to reduce variables (it turns out this had no effect).

  dom0  --> iBook   89.0 Mbits/sec
  iBook --> dom0    86.9 Mbits/sec
  domU  --> iBook   87.1 Mbits/sec
  iBook --> domU    1.87 Mbits/sec

The last entry has me baffled. Why would it be so incredibly slow in one direction but not the other?

I decided to run some UDP tests as well.
  server: iperf -s -u -i 1
  client: iperf -c server_ip -u -b 90M -t 5
The packet loss is as follows:

  domU  --> domU    0%      (using 127.0.0.1)
  domU  --> domU    0%      (using domU eth0 IP address)
  dom0  --> domU    ~100%   (only 7 of 38464 made it!)
  domU  --> dom0    0.09%

  dom0  --> iBook   4.7%
  iBook --> dom0    0.33%
  domU  --> iBook   11%
  iBook --> domU    1.6%

There are some odd things here. First, dom0->domU with UDP loses almost everything, but the reverse direction is fine. Somehow, dom0->domU TCP was OK (if you consider ~90Mbps OK). The second weird thing is that, in contrast with TCP, UDP works fine in both directions between the iBook and domU. There's 11% packet loss in one case, but that's not a lot -- it's probably just a little more than the poor little iBook can handle.

My dom0 is Fedora Core 5, with the included xen0 kernel. The domU is a very basic install of CentOS 4.3, based on a jailtime.org image, running the xensource 2.6.12.6-xenU kernel. The domU has bridged networking and 64MB of RAM (I ran the iBook->domU TCP test with 196MB of RAM but it was still ~2Mb/s). Firewalling is off in domU and dom0; the only iptables rules are the ones created by the xen bridging script.

Has anyone else seen anything like this, or have any idea what's going on? This seems bizarre to me.
Thanks for any help.
--Winston
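For reference, each result above corresponds to one iperf server/client pair; the problematic iBook --> domU case, for example, looks roughly like this (a sketch assuming the UDP options quoted above and iperf's default TCP mode, with <domU_ip> standing in for the domU's bridged eth0 address):

    # TCP: server on domU, client on the iBook
    domU$  iperf -s
    ibook$ iperf -c <domU_ip>

    # UDP at 90 Mb/s for 5 seconds, with the server reporting once per second
    domU$  iperf -s -u -i 1
    ibook$ iperf -c <domU_ip> -u -b 90M -t 5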
On Tue, Apr 04, 2006 at 03:26:14AM -0400, Winston Chang wrote:
> I set up a domU as a backup server, but it has very, very poor
> network performance with external computers. I ran some tests with
> iperf and found some very weird results.

I also see (UDP) packet loss from external box to domU, and from domU to external box. TCP performance is poor because of this packet loss (TCP automatically re-transmits the broken packets - this causes slow TCP speeds).

I haven't tried the latest unstable version of xen.. so I don't know if it's already fixed.

- Pasi
Stephen C. Tweedie
2006-Apr-04 17:59 UTC
Re: [Xen-users] Very slow domU network performance
Hi,

On Tue, 2006-04-04 at 03:26 -0400, Winston Chang wrote:
> The packet loss is as follows:
> domU --> domU    0%      (using 127.0.0.1)
> domU --> domU    0%      (using domU eth0 IP address)
> dom0 --> domU    ~100%   (only 7 of 38464 made it!)

Yow.

There have been a number of weird checksum problems identified in the past with Xen's networking; a checkin was just made a day or two ago which cleans up the checksum handling in a way which may well help here. We'll have to see whether an updated dom0/domU kernel improves things much.

--Stephen
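One way to probe the checksum angle without rebuilding kernels (a general diagnostic, not the checkin Stephen mentions) is to disable TX checksum offload and re-run the iperf tests, assuming ethtool is installed in both domains; the interface names below are examples:

    # in domU: turn off TX checksum offload on eth0
    ethtool -K eth0 tx off

    # in dom0: the same for the backend vif of domain N (name is an example)
    ethtool -K vifN.0 tx off

If the UDP loss changes noticeably with offload disabled, checksum handling is likely involved.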
On Apr 4, 2006, at 4:51 AM, Pasi Kärkkäinen wrote:
> I also see (UDP) packet loss from external box to domU, and from
> domU to external box. TCP performance is poor because of this packet
> loss (TCP automatically re-transmits the broken packets - this
> causes slow TCP speeds).
>
> I haven't tried the latest unstable version of xen.. so I don't know
> if it's already fixed.
>
> - Pasi

My problem seems to be different from yours -- UDP performance was fine from the external machine to domU, but TCP was bad. Also, UDP was broken from dom0 to domU (but fine in reverse), while TCP was OK. I just tested with xen-unstable from a week ago, and it had the same performance.

I think the reason for the performance problem is domU CPU starvation. I found elsewhere that running 'xm sched-sedf 0 0 0 0 1 1' will prevent domU from getting deprived of CPU when dom0 is active, so I ran it and got the following (with the week-old xen-unstable kernel). The 'old' column is what I had before; the 'new' column is what I got after the scheduling change:

==== TCP ====
server: ./iperf -s
client: ./iperf -c [server]
Transfer rate, in Mb/s:

                     Old    New
  Source    Dest     Rate   Rate
  dom0      domU     92     167
  dom0      iBook    89     91
  domU      dom0     85     373
  domU      iBook    87     85
  iBook     dom0     87     92
  iBook     domU     1.9    92

==== UDP, 90 Mb/s ====
server: ./iperf -s -u -i 5
client: ./iperf -c [server] -u -b 90M -t 5
Packet loss, in percent:

                     Old    New
  Source    Dest     Loss   Loss
  dom0      domU     ~100   0.13
  dom0      iBook    4.7    4
  domU      dom0     0.1    0
  domU      iBook    11     8
  iBook     dom0     0.3    0
  iBook     domU     1.6    0

All tests were done with iperf 1.7.0 (the newer version, 2.0.2, wouldn't compile on my iBook).

These numbers are much more reasonable. The big asymmetries are gone. dom0<->domU TCP performance (170-370 Mb/s) is still significantly lower than domU<->domU performance (1.7 Gb/s). This is fast enough for me, but does it indicate a problem with Xen? The machine is a 1.8GHz P4, so I wouldn't think that the xen networking and bridging overhead would reduce performance by so much. Would it?

At any rate, I'd be curious to see if anyone else sees the same slowness when networking, and if the scheduling change fixes it. iperf is very easy to compile and run...

--Winston
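A quick way to see the suspected starvation directly is to watch per-domain CPU consumption from dom0 while a test is running; for example (assuming the xm tools from this build; the live view may be available as 'xm top' or as the separate xentop utility):

    # cumulative CPU seconds per domain; compare before and after an iperf run
    xm list

    # live per-domain CPU usage during the test, if available
    xm top

If dom0 accumulates nearly all of the CPU time during an iperf run while domU's counter barely moves, that would support the scheduling explanation above.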
Hello,

> There have been a number of weird checksum problems
> identified in the past with Xen's networking; a checkin was
> just made a day or two ago which cleans up the checksum
> handling in a way which may well help here.
> We'll have to see whether an updated dom0/domU kernel
> improves things much.

Is this checkin in today's xen-3.0-testing?

Regards,
Steffen
On Apr 4, 2006, at 1:59 PM, Stephen C. Tweedie wrote:
>> The packet loss is as follows:
>> domU --> domU    0%      (using 127.0.0.1)
>> domU --> domU    0%      (using domU eth0 IP address)
>> dom0 --> domU    ~100%   (only 7 of 38464 made it!)
>
> Yow.
>
> There have been a number of weird checksum problems identified in the
> past with Xen's networking; a checkin was just made a day or two ago
> which cleans up the checksum handling in a way which may well help
> here. We'll have to see whether an updated dom0/domU kernel improves
> things much.

I ran the test with the latest xen-unstable build. The results are the same.

When I ran 'xm sched-sedf 0 0 0 0 1 1' to prevent domU CPU starvation, network performance was good. The numbers in this case are the same as in my other message, where I detail the results using the week-old xen build -- it could handle 90Mb/s with no datagram loss. So it looks like the checksum patches had no effect on this phenomenon; the only thing that mattered was the scheduling.

I also did some lower data-rate UDP tests with iperf (without the scheduling change). At 500 Kb/s it loses about 48% of the datagrams, at 2 Mb/s it loses 81%, and at 4 Mb/s it loses 99%. Ouch. iperf also manages to chew up 100% of CPU time doing this, so that might explain why domU chokes even at low bandwidths. Perhaps its timing is implemented with a while loop.

There's still the odd thing with domU<->dom0 communication being about 1/10 the speed of dom0<->dom0 or domU<->domU. It's roughly 170 Mb/s versus 1.7 Gb/s.

--Winston
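One way to check the while-loop suspicion is to time a low-rate UDP client and compare CPU time against wall-clock time (a sketch; <server_ip> is a placeholder for whichever host runs 'iperf -s -u'):

    # run from dom0 or the iBook against a UDP iperf server
    time iperf -c <server_ip> -u -b 500K -t 10

If user+sys comes out close to the 10 seconds of real time, the client is busy-waiting between datagrams rather than sleeping, which would explain the 100% CPU usage even at low bandwidths.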
Winston Chang wrote:
> On Apr 4, 2006, at 1:59 PM, Stephen C. Tweedie wrote:
>
>>> The packet loss is as follows:
>>> domU --> domU    0%      (using 127.0.0.1)
>>> domU --> domU    0%      (using domU eth0 IP address)
>>> dom0 --> domU    ~100%   (only 7 of 38464 made it!)
>>
>> Yow.
>>
>> There have been a number of weird checksum problems identified in the
>> past with Xen's networking; a checkin was just made a day or two ago
>> which cleans up the checksum handling in a way which may well help here.
>> We'll have to see whether an updated dom0/domU kernel improves things
>> much.
>
> I ran the test with the latest xen-unstable build. The results are the
> same.
>
> When I ran 'xm sched-sedf 0 0 0 0 1 1' to prevent domU CPU starvation,
> network performance was good. The numbers in this case are the same as
> in my other message where I detail the results using the week-old xen
> build -- it could handle 90Mb/s with no datagram loss. So it looks like
> the checksum patches had no effect on this phenomenon; the only thing
> that mattered was the scheduling.

What was the previous weight of domain 0? What weight is assigned to the domUs, and do the domUs have bursting enabled?

Thank you,
Matt Ayres
>> I ran the test with the latest xen-unstable build. The results
>> are the same.
>> When I ran 'xm sched-sedf 0 0 0 0 1 1' to prevent domU CPU
>> starvation, network performance was good. The numbers in this
>> case are the same as in my other message where I detail the
>> results using the week-old xen build -- it could handle 90Mb/s
>> with no datagram loss. So it looks like the checksum patches had
>> no effect on this phenomenon; the only thing that mattered was the
>> scheduling.
>
> What was the previous weight of domain 0? What weight is assigned
> to the domUs, and do the domUs have bursting enabled?

I'm not really sure of the answer to either of these questions. The weight is whatever the default is with Fedora Core 5 and xen-unstable. I don't know anything about bursting. How do you find out?
Matt Ayres
2006-Apr-05 17:11 UTC
[Xen-devel] Very slow domU network performance - Moved to xen-devel
Winston Chang wrote:
>>> I ran the test with the latest xen-unstable build. The results are
>>> the same.
>>> When I ran 'xm sched-sedf 0 0 0 0 1 1' to prevent domU CPU
>>> starvation, network performance was good. The numbers in this case
>>> are the same as in my other message where I detail the results using
>>> the week-old xen build -- it could handle 90Mb/s with no datagram
>>> loss. So it looks like the checksum patches had no effect on this
>>> phenomenon; the only thing that mattered was the scheduling.
>>
>> What was the previous weight of domain 0? What weight is assigned
>> to the domUs, and do the domUs have bursting enabled?
>
> I'm not really sure of the answer to either of these questions. The
> weight is whatever the default is with Fedora Core 5 and xen-unstable.
> I don't know anything about bursting. How do you find out?

I'd like to be corrected if I am wrong, but the last number (weight) is set to 0 for all domains by default. By giving it a value of 1 you are giving dom0 more CPU. The second-to-last number is a boolean that decides whether a domain is hard-locked to its weight or whether it can burst using idle CPU cycles. The three before that are generally set to 0, and the first number is the domain name. I do not know of a way to grab the weights personally. It is documented in the Xen distribution tgz.

I ran my own tests. I have dom0 with a weight of 512 (double its memory allocation), and each VM also has a weight equal to its memory allocation. My dom0 can transfer at 10MB/s+ over the LAN, but domUs with 100% CPU used on the host could only transfer over the LAN at a peak of 800KB/s. When I gave dom0 a weight of 1, domU transfers decreased to a peak of 100KB/s over the "LAN" (quoted because, due to proxy ARP, the host acts as a router).

The problem occurs whether you use bridged or routed mode.

I would have to believe the problem is in the hypervisor itself, and that scheduling and CPU usage greatly affect it. Network bandwidth should not be affected unless you want it to be (e.g. by using the rate vif parameter).

Stephen Soltesz has experienced the same problem and has some graphs to back it up. Stephen, will you share at least that one CPU + iperf graph with the community and perhaps elaborate on your weight configuration (if any)?

Thank you,
Matt Ayres
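Putting that description together, the argument order appears to be the following (a sketch based on the discussion in this thread rather than on the official documentation):

    xm sched-sedf <domain> <period> <slice> <latency> <extratime> <weight>

    # the earlier fix for dom0: period/slice/latency 0, extratime on, weight 1
    xm sched-sedf 0 0 0 0 1 1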
Winston Chang
2006-Apr-06 06:40 UTC
[Xen-devel] Re: Very slow domU network performance - Moved to xen-devel
On Apr 5, 2006, at 1:11 PM, Matt Ayres wrote:
> Winston Chang wrote:
>>>> I ran the test with the latest xen-unstable build. The results
>>>> are the same.
>>>> When I ran 'xm sched-sedf 0 0 0 0 1 1' to prevent domU CPU
>>>> starvation, network performance was good. The numbers in this
>>>> case are the same as in my other message where I detail the
>>>> results using the week-old xen build -- it could handle 90Mb/s
>>>> with no datagram loss. So it looks like the checksum patches
>>>> had no effect on this phenomenon; the only thing that mattered
>>>> was the scheduling.
>>>
>>> What was the previous weight of domain 0? What weight is assigned
>>> to the domUs, and do the domUs have bursting enabled?
>>
>> I'm not really sure of the answer to either of these questions. The
>> weight is whatever the default is with Fedora Core 5 and
>> xen-unstable. I don't know anything about bursting. How do you find out?
>
> I'd like to be corrected if I am wrong, but the last number
> (weight) is set to 0 for all domains by default. By giving it a
> value of 1 you are giving dom0 more CPU. The second-to-last number
> is a boolean that decides whether a domain is hard-locked to its
> weight or whether it can burst using idle CPU cycles. The three
> before that are generally set to 0, and the first number is the
> domain name. I do not know of a way to grab the weights personally.
> It is documented in the Xen distribution tgz.

I can tell you the symptoms I had: whenever a process in dom0 grabs 100% of the CPU, the domU console freezes. After a little while, the domU console says "BUG: soft lockup detected on CPU#0!" So I believe that with my default settings, dom0 always gets first priority and domU gets the leftovers.

For those who have just seen this (this thread started on xen-users): I had very poor UDP performance using iperf with domU as the server and dom0 as the client. I had 99.98% packet loss when running at 90Mb/s in this case, until I changed the scheduling as above. Then packet loss dropped to 0. In the reverse direction there was never a problem. For more details, see the original thread here:
http://lists.xensource.com/archives/html/xen-users/2006-04/msg00096.html

It's possible that iperf is partially at fault here. (I used version 1.7.0 since 2.0.2 wouldn't compile on my iBook.) I noticed that it takes 100% of CPU time when it's used as a UDP client, even when running at lower speeds -- I saw this at 4Mb/s. I would wager that it uses a while loop to delay between sending datagrams. Since iperf always wants all the CPU cycles, and because domU has last priority in my default scheduling config, domU just wouldn't get enough CPU time to process the incoming datagrams.

A more general note about using iperf: it seems to me that as long as iperf uses 100% of the CPU, it is not a good tool for testing dom0-domU or domU-domU network performance. This sort of timing loop would be fine for network tests between "real" hosts, but not ones in which CPU resources are shared and network I/O is CPU-bound, as is the case here. I would guess that this would not occur on SMP machines (and maybe hyperthreaded ones also), since iperf's timing loop would use up only one CPU.

The other network issue I had was very slow TCP performance when domU was the iperf server and an external machine was the iperf client. I had 2 Mb/s in this case, but about 90Mb/s in the other direction (on 100Mbit ethernet). This problem disappeared when I did the scheduling change above.
This issue is _not_ explained by the iperf-hogging-the-CPU behavior I mentioned above. No user-level process in dom0 should be involved; dom0 just does some low-level networking. But if the cause of this TCP problem is that dom0 is taking all the CPU resources, then that would suggest that the xen networking/bridging code in dom0 is getting 100% CPU time just to do bridging for the incoming data. Does this indicate a problem in the networking code? Again, the TCP slowness does not occur in the reverse direction, when domU is sending to an external machine.

My guess is that, like the iperf UDP issue above, this problem would not occur on SMP machines.

--Winston