Hey everyone,

A few weeks back there was some discussion on Xen's I/O performance in
clusters on the list. I did some experiments today myself using iperf
(not ttcp):

 o Xen dom0 talking to another machine in the cluster running native
   Linux: b/w around 904 Mbps -- that's nice :)

 o A Xen VM (running on the same machine as the dom0 in the previous
   experiment) talking to another machine running native Linux (again,
   the same as in the previous experiment) only achieves 128 Mbps.

I read on the list that you folks at Cambridge got up to 800+ Mbps
across VMs? Did you do any special optimizations or set any special
parameters? I read something about socket buffer size?

Thanks,
--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker

-------------------------------------------------------
This SF.Net email is sponsored by: InterSystems CACHE
FREE OODBMS DOWNLOAD - A multidimensional database that combines
robust object and relational technologies, making it a perfect match
for Java, C++, COM, XML, ODBC and JDBC. www.intersystems.com/match8
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
> Hey everyone,
>
> A few weeks back there was some discussion on Xen's I/O performance in
> clusters on the list. I did some experiments today myself using iperf
> (not ttcp):
>
>  o Xen dom0 talking to another machine in the cluster running native
>    Linux: b/w around 904 Mbps -- that's nice :)
>
>  o A Xen VM (running on the same machine as the dom0 in the previous
>    experiment) talking to another machine running native Linux (again,
>    the same as in the previous experiment) only achieves 128 Mbps.
>
> I read on the list that you folks at Cambridge got up to 800+ Mbps
> across VMs? Did you do any special optimizations or set any special
> parameters? I read something about socket buffer size?

We did our measurements with a 128KB socket buffer on a 3.0GHz dual
Xeon. It does make a difference whether dom0 and domU are sharing the
same CPU, are on different CPUs, or are on different hyperthreads of
the same package. At least in our experiments, with an MTU of 1500
bytes things were pretty good regardless of the CPU allocation, but
with an artificially reduced MTU of 552 bytes we were definitely
seeing the advantage of having two CPUs.

There have been a couple of reports on the list of people seeing
unexpectedly low numbers, so something odd must be going on on some
systems. Please can you give more information about your setup?

When doing the experiments it would be interesting to know the number
of interrupts per second reported by dom0 and domU in each
configuration. I guess the best approach to solving this is probably
to add more instrumentation to the netfront/netback drivers and export
the data via a /proc interface.

It's possible that we're getting into a situation whereby the
pipelining is breaking down and we're only transferring a couple of
packets per context switch.
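For reference, iperf's -w flag is what sets the socket buffer size, and a
quick bandwidth-delay-product check shows why a buffer of roughly 128KB
matters at gigabit speeds. This is a back-of-envelope sketch, not a figure
from the thread: the 1 ms round-trip time is an assumed LAN-ish value.

```shell
# iperf -c <receiver> -w 128K -t 30    # -w sets the socket buffer size
# The TCP window needed to keep a link busy is bandwidth * RTT:
rate_bps=1000000000    # 1 Gbit/s link
rtt_s=0.001            # assumed 1 ms round-trip time (hypothetical)
awk -v r="$rate_bps" -v t="$rtt_s" \
    'BEGIN { printf "window needed: %.0f KB\n", r * t / 8 / 1024 }'
# -> window needed: 122 KB
```

So a 128KB buffer just covers gigabit at a 1 ms RTT; a smaller default
buffer would cap the achievable bandwidth well below wire speed.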
Ian

Here's the actual data we recorded (iperf bandwidth in Mbps):

               MTU 1500        MTU 552
               TX     RX       TX     RX
 Linux SMP     897    897      808    808
 dom0          897    898      718    769
 domU UP       897    843      436    379
 domU HT       897    897      651    577
 domU SMP      897    897      778    663
 (VMware)      291    651      101    137
 (UML)         165    203       61     91
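As a sanity check on the table (not from the original mail), the theoretical
TCP goodput ceilings on gigabit Ethernet can be worked out from the
per-packet overheads: the 897 Mbps rows are essentially wire speed, and the
MTU-552 native-Linux figure of 808 Mbps sits just under its ceiling.

```shell
# Theoretical TCP goodput ceiling on gigabit Ethernet: payload bytes over
# on-wire bytes per packet. Assumes 52 bytes of TCP/IP header (timestamps
# enabled) and 38 bytes of Ethernet framing, preamble and inter-frame gap.
awk 'function ceiling(mtu) { return 1000 * (mtu - 52) / (mtu + 38) }
     BEGIN {
         printf "MTU 1500: %.0f Mbps\n", ceiling(1500)
         printf "MTU  552: %.0f Mbps\n", ceiling(552)
     }'
# -> MTU 1500: 941 Mbps
# -> MTU  552: 847 Mbps
```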
> > Hey everyone,
> >
> > A few weeks back there was some discussion on Xen's I/O performance in
> > clusters on the list. I did some experiments today myself using iperf
> > (not ttcp):
> >
> >  o Xen dom0 talking to another machine in the cluster running native
> >    Linux: b/w around 904 Mbps -- that's nice :)
> >
> >  o A Xen VM (running on the same machine as the dom0 in the previous
> >    experiment) talking to another machine running native Linux (again,
> >    the same as in the previous experiment) only achieves 128 Mbps.
> >
> > I read on the list that you folks at Cambridge got up to 800+ Mbps
> > across VMs? Did you do any special optimizations or set any special
> > parameters? I read something about socket buffer size?

One thing you might want to try is to change a line in the file
linux-2.6.9-xenU/drivers/xen/netfront/netfront.c.

From:
    #define RX_MIN_TARGET 8
To:
    #define RX_MIN_TARGET NETIF_RX_RING_SIZE

One possibility is that dynamic buffer sizing is dropping some packets
and causing TCP to crap itself.

If this improves things then I'll have to be much more careful about
shrinking the buffers, and/or add a config option to disable the
resizing completely.

 -- Keir
I changed RX_MIN_TARGET in
linux-2.4.27-xenU/arch/xen/drivers/netif/frontend/main.c and it made
no difference at all in the 1500-byte iperf test.

" the number of interrupts per second reported by the dom0 and domU
" in each configuration.

For an iperf TCP stream sent from stock Linux to xenU
(2.4.27-xenU + RX_MIN_TARGET mod):

1500 MTU: 464 Mbps

interrupts per second seen on xen0:
 irq  1:      0  Phys-irq     keyboard    irq129:      0  Dynamic-irq  ctrl-if
 irq 14:      0  Phys-irq     ide0        irq130:   7473  Dynamic-irq  timer
 irq 17:  28526  Phys-irq     eth0        irq131:      0  Dynamic-irq  timer_d
 irq 18:     15  Phys-irq     aic7xxx     irq132:      0  Dynamic-irq  console
 irq 19:      0  Phys-irq     aic7xxx     irq133:      0  Dynamic-irq  blkif-b
 irq128:      0  Dynamic-irq  misdire     irq134:   1464  Dynamic-irq  vif3.0

interrupts per second seen on xenU:
 irq128:      0  Dynamic-irq  misdire     irq131:      0  Dynamic-irq  timer_d
 irq129:      0  Dynamic-irq  ctrl-if     irq132:      0  Dynamic-irq  blkif
 irq130:   5273  Dynamic-irq  timer       irq133:   5121  Dynamic-irq  eth0

552 MTU: 230 Mbps

interrupts per second seen on xen0:
 irq  1:      0  Phys-irq     keyboard    irq129:      0  Dynamic-irq  ctrl-if
 irq 14:      0  Phys-irq     ide0        irq130:   9103  Dynamic-irq  timer
 irq 17:  19227  Phys-irq     eth0        irq131:      0  Dynamic-irq  timer_d
 irq 18:     10  Phys-irq     aic7xxx     irq132:      0  Dynamic-irq  console
 irq 19:      0  Phys-irq     aic7xxx     irq133:      0  Dynamic-irq  blkif-b
 irq128:      0  Dynamic-irq  misdire     irq134:   1804  Dynamic-irq  vif3.0

interrupts per second seen on xenU:
 irq128:      0  Dynamic-irq  misdire     irq131:      0  Dynamic-irq  timer_d
 irq129:      0  Dynamic-irq  ctrl-if     irq132:      0  Dynamic-irq  blkif
 irq130:   7264  Dynamic-irq  timer       irq133:   7158  Dynamic-irq  eth0

The e1000 driver is stock, so these are with the default interrupt
coalescing settings.
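Per-second figures like the ones above can be reproduced by sampling
/proc/interrupts twice, one second apart, and diffing the counts. A rough
sketch (the field handling assumes the usual "IRQ: count ..." layout of
/proc/interrupts; this is not a script from the thread):

```shell
# Sample /proc/interrupts twice, one second apart, and print the delta
# (i.e. interrupts per second) for every IRQ whose count moved.
snap1=$(mktemp); snap2=$(mktemp)
cat /proc/interrupts > "$snap1"
sleep 1
cat /proc/interrupts > "$snap2"
# First pass (NR == FNR) records the old counts; second pass prints deltas.
awk 'NR == FNR { if ($2 ~ /^[0-9]+$/) before[$1] = $2; next }
     $2 ~ /^[0-9]+$/ && $2 > before[$1] {
         printf "%-8s %6d/s\n", $1, $2 - before[$1]
     }' "$snap1" "$snap2"
rm -f "$snap1" "$snap2"
```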
> One thing you might want to try is to change a line in the file
> linux-2.6.9-xenU/drivers/xen/netfront/netfront.c.
>
> From:
>     #define RX_MIN_TARGET 8
> To:
>     #define RX_MIN_TARGET NETIF_RX_RING_SIZE
>
> One possibility is that dynamic buffer sizing is dropping some packets
> and causing TCP to crap itself.
>
> If this improves things then I'll have to be much more careful about
> shrinking the buffers, and/or add a config option to disable the
> resizing completely.
>
> -- Keir

My bad. Sorry about the false alarm everyone -- it was a routing
issue. With the correct routing setup, I can see up to 930 Mbps
between VMs running on 2 distinct physical machines in the cluster.

Cheers!
--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker
> Sorry about the false alarm everyone, it was a routing issue. With the
> correct routing setup, I can see up to 930 Mbps between VMs running on
> 2 distinct physical machines in the cluster.

Phew!

Is anyone else still seeing network performance anomalies? I know that
some people were seeing odd _dom0_ network performance, but I suspect
that's an IOAPIC issue that will go away when Xen's boot code gets
restructured over the next few weeks (see the new Roadmap web page).

Ian
" Is anyone else still seeing network performance anomalies? I still cannot get over 500Mbps into xenU with any hardware that I have. This holds using kernels I build, or using kernel binaries from the 2.0.1 tarball. I tried tuning a e1000 driver, which greatly reduced the interrupt count, but had no effect on bandwidth. I can get 600 to 750 Mbps into domain-0 from a stock linux host. That rate then drops to around 500 after starting the etherbridge. Running top on Domain-0 claims the domain is over 60% idle. This is with e1000 and bcm5703 NICs on IBM x335, Dell 1650 and other platforms, and a variety of CPU clock speeds. Running iperf under stock linux 2.4.25 gets 940Mbps between any of them. ------------------------------------------------------- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://productguide.itmanagersjournal.com/ _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
> > " Is anyone else still seeing network performance anomalies? > > I still cannot get over 500Mbps into xenU with any hardware that I have. > This holds using kernels I build, or using kernel binaries from > the 2.0.1 tarball. > > I tried tuning a e1000 driver, which greatly reduced the interrupt count, > but had no effect on bandwidth. > > I can get 600 to 750 Mbps into domain-0 from a stock linux host. > That rate then drops to around 500 after starting the etherbridge. > Running top on Domain-0 claims the domain is over 60% idle.With the domain 0 otherwise idle, what happens if you run ''slurp'' (attached). The only time I''ve ever seen the bridge burn CPU is if you try setting some of its delay parameters to zero in which case it can cause it to periodically loop.> This is with e1000 and bcm5703 NICs on IBM x335, Dell 1650 and > other platforms, and a variety of CPU clock speeds. > Running iperf under stock linux 2.4.25 gets 940Mbps between any of them.What''s the spec of the most modern machines you''ve tried Xen on? Ian /****************************************************************************** * slurp.c * * Slurps spare CPU cycles and prints a percentage estimate every second. */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <err.h> /* rpcc: get full 64-bit Pentium TSC value */ static __inline__ unsigned long long int rpcc(void) { unsigned int __h, __l; __asm__ __volatile__ ("rdtsc" :"=a" (__l), "=d" (__h)); return (((unsigned long long)__h) << 32) + __l; } /* * find_cpu_speed: * Interrogates /proc/cpuinfo for the processor clock speed. * * Returns: speed of processor in MHz, rounded down to nearest whole MHz. */ #define MAX_LINE_LEN 50 int find_cpu_speed(void) { FILE *f; char s[MAX_LINE_LEN], *a, *b; if ( (f = fopen("/proc/cpuinfo", "r")) == NULL ) goto out; while ( fgets(s, MAX_LINE_LEN, f) ) { if ( strstr(s, "cpu MHz") ) { /* Find the start of the speed value, and stop at the dec point. 
*/ if ( !(a=strpbrk(s,"0123456789")) || !(b=strpbrk(a,".")) ) break; *b = ''\0''; fclose(f); return(atoi(a)); } } out: fprintf(stderr, "find_cpu_speed: error parsing /proc/cpuinfo for cpu MHz"); exit(1); } int main( int argc, char **argv ) { int mhz, i, cpu=-1; /* * no_preempt_estimate is our estimate, in clock cycles, of how long it * takes to execute one iteration of the main loop when we aren''t * preempted. 50000 cycles is an overestimate, which we want because: * (a) On the first pass through the loop, diff will be almost 0, * which will knock the estimate down to <40000 immediately. * (b) It''s safer to approach real value from above than from below -- * note that this algorithm is unstable if n_p_e gets too small! */ unsigned int no_preempt_estimate = 50000; /* * prev = timestamp on previous iteration; * this = timestamp on this iteration; * diff = difference between the above two stamps; * start = timestamp when we last printed CPU % estimate; */ unsigned long long int prev, this, diff, start; /* * preempt_time = approx. cycles we''ve been preempted for since last stats * display. */ unsigned long long int preempt_time = 0; if ( argc > 1 ) cpu = atoi(argv[1]); else if ( argc > 2 ) exit(-1); /* Required in order to print intermediate results at fixed period. */ mhz = find_cpu_speed(); printf("CPU speed = %d MHz, using cpu %d\n", mhz, cpu); if (cpu>=0) { int rc; unsigned long bs = 0; bs = 1<<cpu; rc=sched_setaffinity( getpid(), sizeof(bs)*8, &bs ); if(rc) err(rc,"sched_getaffinity failed\n."); } start = prev = rpcc(); for ( ; ; ) { /* * By looping for a while here we hope to reduce affect of getting * preempted in critical "timestamp swapping" section of the loop. * In addition, it should ensure that ''no_preempt_estimate'' stays * reasonably large which helps keep this algorithm stable. */ for ( i = 0; i < 10000; i++ ); /* * The critical bit! Getting preempted here will shaft us a bit, * but the loop above should make this a rare occurrence. 
*/ this = rpcc(); diff = this - prev; prev = this; /* if ( diff > (1.5 * preempt_estimate) */ if ( diff > no_preempt_estimate + (no_preempt_estimate>>1) ) { /* We were probably preempted for a while. */ preempt_time += diff - no_preempt_estimate; } else { /* * Looks like we weren''t preempted -- update our time estimate: * New estimate = 0.75*old_est + 0.25*curr_diff */ no_preempt_estimate (no_preempt_estimate>>1) + (no_preempt_estimate>>2) + (diff>>2); } /* Dump CPU time every second. */ if ( (this - start) / mhz > 1000000 ) { printf("Slurped %.2f%% CPU, cpu %d\n", 100.0*((this-start-preempt_time)/((double)this-start)), cpu); start = this; preempt_time = 0; } } return(0); } ------------------------------------------------------- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://productguide.itmanagersjournal.com/ _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
" With the domain 0 otherwise idle, what happens if you run ''slurp'' " (attached). slurp output on domain 0. xen0 is hit with a 10 second iperf run in the middle. (xen0 is idle, there are no xenU hosted, but etherbridge and xend are running) Slurped 99.32% CPU, cpu -1 Slurped 99.32% CPU, cpu -1 Slurped 96.52% CPU, cpu -1 Slurped 31.97% CPU, cpu -1 Slurped 54.25% CPU, cpu -1 Slurped 42.67% CPU, cpu -1 Slurped 42.56% CPU, cpu -1 Slurped 49.14% CPU, cpu -1 Slurped 32.32% CPU, cpu -1 Slurped 37.31% CPU, cpu -1 Slurped 30.14% CPU, cpu -1 Slurped 52.59% CPU, cpu -1 Slurped 40.65% CPU, cpu -1 Slurped 99.22% CPU, cpu -1 Slurped 99.32% CPU, cpu -1 Slurped 99.29% CPU, cpu -1 " What''s the spec of the most modern machines you''ve tried Xen on? dual 2.8GHz P4xeon 2GB ram. Sending iperf from another box on that switch, I see 380Mbps into a xenU on a 2.8GHz host. Of course the slower CPUs are more available so most of my tests run there (2.0GHz P4 and 1.4GHz P3). Is the network overhead so high that it matters? They all get wire speed with stock linux. Another odd thing I saw is that, while on stock linux the ''timer'' irq is rock steady at 100 interrupts per second, under xen0 the timer irq varies from 60 to 200 when idle, and hits 5000 per second xen0 is receiving an iperf stream. ------------------------------------------------------- SF email is sponsored by - The IT Product Guide Read honest & candid reviews on hundreds of IT Products from real users. Discover which products truly live up to the hype. Start reading now. http://productguide.itmanagersjournal.com/ _______________________________________________ Xen-devel mailing list Xen-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xen-devel
> Another odd thing I saw is that, while on stock Linux the 'timer' irq
> is rock steady at 100 interrupts per second, under xen0 the timer irq
> varies from 60 to 200 when idle, and hits 5000 per second while xen0
> is receiving an iperf stream.

Domains don't get tick interrupts when they aren't running, so you can
see tick rates lower than native. Also, a domain gets a tick interrupt
every time it is rescheduled, which can happen at an arbitrarily fast
rate -- which explains your very high tick rates in some cases.

 -- Keir
I looked closely at tcpdumps of an iperf stream flowing into domain-0
from a stock Linux box. It looks like my xen0 is not able to send out
ACK packets while receiving incoming data packets.

That is based on the first attached postscript graph. It was generated
by tcptrace/xplot from a tcpdump taken on xen0 while iperf data for
xen0 is arriving. The green line plots the time of the highest seen
ACK seq number. The yellow line plots the seq number of the window
limit over time. The black segments show the time of incoming data
(diamonds show the TCP PUSH flag).

The second postscript graph traces an iperf stream as seen from the
sender side. The sender runs stock Linux 2.4.25; the receiver runs
2.4.27-xen0. It shows that the sender fills the window as soon as
green acks arrive, then waits quite a while for the next batch of acks
before resuming transmission to xen0. Increasing the window size makes
no difference, as the sender keeps the window full (graph-wise that
means the yellow and green lines are farther apart, but the black
segments reach the yellow line as soon as the acks arrive).

Looking at iperf flows between stock linux boxes shows the sender
never fills the window (the black segments rarely reach the yellow
line).

I'm guessing this means the handling of Rx interrupts completely
blocks out progress on the Tx side. I tried throwing in a noapic
kernel param but it made no difference.
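The ACK stalls described above can also be spotted without graphs. A rough
sketch, not what produced the postscript plots (those came from
tcptrace/xplot): the file name "sender.pcap" and the 50 ms threshold are
made up, and this assumes tcpdump's -tt epoch-timestamp output where
ACK-bearing packets contain the word "ack".

```shell
# Flag gaps of more than 50ms between successive ACK-bearing packets in a
# sender-side trace; long gaps match the "waits quite a while for the
# next batch of acks" behaviour seen in the graphs.
tcpdump -tt -r sender.pcap tcp 2>/dev/null |
awk -v gap=0.05 '/ack/ {
    if (prev != "" && $1 - prev > gap)
        printf "ack stall: %.3fs ending at t=%s\n", $1 - prev, $1
    prev = $1
}'
```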