Hi everyone,

First, let me establish the baseline here: all default settings, no
modifications to any sysctls, and the same amount of RAM for dom0 and the
VM (note that TCP BIC is on by default). The tests run across a low-latency
cluster, with everything on the same gigabit switch. I'm using Xen 2.0.3,
and netperf for all my tests.

Between dom0 and dom0 on two machines in the cluster, I consistently get
~930 Mbps. Between VM and VM on the same two machines, I get between 730
and 850 Mbps, but with a lot more variation. So far so good.

Now I modify the TCP buffer sizes (on both dom0 and the VM) as follows:

net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 24576 32768 49152
net.core.rmem_default = 112640
net.core.wmem_default = 112640
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.ipv4.tcp_bic_low_window = 14
net.ipv4.tcp_bic_fast_convergence = 1
net.ipv4.tcp_bic = 1

Now, between dom0 and dom0 on the two machines, I consistently get 880 Mbps,
and between VM and VM I get around 850 Mbps. So far so good.

But now comes the really interesting part. So far these machines were
talking over the switch directly. Now I direct all traffic through a
dummynet router (on the same switch). The pipe connecting the two is set to
500 Mbps with an RTT of 80 ms. Here are the results for the dom0-to-dom0
tests:

== Single flow, 10 seconds ==
[dgupta@sysnet03]$ netperf -H sysnet08
TCP STREAM TEST to sysnet08
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  65536  65536    10.11     158.55

== Single flow, 80 seconds ==
[dgupta@sysnet03]$ netperf -H sysnet08 -l 80
TCP STREAM TEST to sysnet08
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  65536  65536    80.72     344.20

== 50 flows, 80 seconds ==
 87380  65536  65536    80.14       4.93
 87380  65536  65536    80.18       9.37
 87380  65536  65536    80.21      10.13
 87380  65536  65536    80.22       9.11
 87380  65536  65536    80.19       9.45
 87380  65536  65536    80.22       5.06
 87380  65536  65536    80.15       9.38
 87380  65536  65536    80.20       9.98
 87380  65536  65536    80.23       3.70
 87380  65536  65536    80.20       9.14
 87380  65536  65536    80.18       8.85
 87380  65536  65536    80.16       8.96
 87380  65536  65536    80.21       9.91
 87380  65536  65536    80.18       9.46
 87380  65536  65536    80.17       9.38
 87380  65536  65536    80.18       9.82
 87380  65536  65536    80.15       7.22
 87380  65536  65536    80.16       8.64
 87380  65536  65536    80.26      10.60
 87380  65536  65536    80.22       9.33
 87380  65536  65536    80.24       8.88
 87380  65536  65536    80.22       9.54
 87380  65536  65536    80.19       9.65
 87380  65536  65536    80.20       9.70
 87380  65536  65536    80.24       9.43
 87380  65536  65536    80.19       8.10
 87380  65536  65536    80.21       9.31
 87380  65536  65536    80.18       9.08
 87380  65536  65536    80.19       9.24
 87380  65536  65536    80.27       9.91
 87380  65536  65536    80.28       9.67
 87380  65536  65536    80.24       9.50
 87380  65536  65536    80.28       9.70
 87380  65536  65536    80.24      10.09
 87380  65536  65536    80.31       4.55
 87380  65536  65536    80.28       5.93
 87380  65536  65536    80.25       9.55
 87380  65536  65536    80.32       5.60
 87380  65536  65536    80.35       6.29
 87380  65536  65536    80.27       4.75
 87380  65536  65536    80.40       6.51
 87380  65536  65536    80.39       6.38
 87380  65536  65536    80.40      10.12
 87380  65536  65536    80.53       4.62
 87380  65536  65536    80.67      16.53
 87380  65536  65536    81.10       4.53
 87380  65536  65536    82.21       1.93
 87380  65536  65536    80.09       9.43
 87380  65536  65536    80.10       9.14
 87380  65536  65536    80.13       9.88

[dgupta@sysnet03]$ awk '{sum+=$5} END {print sum,NR,sum/NR}' dom0-dom0-50.dat
419.96 50 8.3992

That is the aggregate, the number of flows, and the average per flow.
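For reference, the dummynet pipe is set up with something along these lines
(a sketch only: the ipfw rule number and the way the 80 ms is split across
the two directions of the pipe are illustrative, not my literal rule set):

  # on the dummynet router: push all traffic through one pipe
  ipfw add 100 pipe 1 ip from any to any
  # 500 Mbps with 40 ms of one-way delay, giving roughly an 80 ms RTT
  ipfw pipe 1 config bw 500Mbit/s delay 40ms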
Now I run the same test from VM to VM:

== Single flow, 10 seconds ==
root@tg3:~# netperf -H 172.19.222.101
TCP STREAM TEST to 172.19.222.101
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  65536  65536    10.15      22.30

== Single flow, 80 seconds ==
root@tg3:~# netperf -H 172.19.222.101 -l 80
TCP STREAM TEST to 172.19.222.101
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  65536  65536    80.17      76.96

== 50 flows, 80 seconds ==
root@tg3:~# for ((i=0; i<50; i++)); do netperf -P 0 -H 172.19.222.101 -l 80 & done | tee vm-vm-50.dat
 87380  65536  65536    80.09       8.50
 87380  65536  65536    80.08       6.46
 87380  65536  65536    80.19       7.33
 87380  65536  65536    80.20       7.29
 87380  65536  65536    80.20       5.86
 87380  65536  65536    80.23       8.40
 87380  65536  65536    80.22       8.55
 87380  65536  65536    80.22       7.34
 87380  65536  65536    80.29       6.28
 87380  65536  65536    80.28       7.23
 87380  65536  65536    80.23       8.56
 87380  65536  65536    80.25       6.60
 87380  65536  65536    80.31       6.99
 87380  65536  65536    80.27       8.22
 87380  65536  65536    80.30       7.41
 87380  65536  65536    80.33       8.21
 87380  65536  65536    80.27       7.94
 87380  65536  65536    80.32       6.54
 87380  65536  65536    80.29       8.58
 87380  65536  65536    80.35       7.37
 87380  65536  65536    80.35       7.09
 87380  65536  65536    80.37       7.23
 87380  65536  65536    80.38       8.31
 87380  65536  65536    80.38       8.18
 87380  65536  65536    80.44       9.11
 87380  65536  65536    80.43       4.95
 87380  65536  65536    80.43       6.48
 87380  65536  65536    80.42       8.11
 87380  65536  65536    80.44       6.74
 87380  65536  65536    80.47       8.76
 87380  65536  65536    80.42       7.68
 87380  65536  65536    80.45       6.10
 87380  65536  65536    80.46       7.47
 87380  65536  65536    80.51       7.37
 87380  65536  65536    80.52       6.78
 87380  65536  65536    80.48       7.31
 87380  65536  65536    80.56       7.55
 87380  65536  65536    80.57       6.85
 87380  65536  65536    80.59       7.53
 87380  65536  65536    80.63       7.01
 87380  65536  65536    80.64       6.78
 87380  65536  65536    80.60       5.76
 87380  65536  65536    80.79       6.63
 87380  65536  65536    80.79       6.29
 87380  65536  65536    80.81       7.54
 87380  65536  65536    80.81       7.22
 87380  65536  65536    80.94       6.54
 87380  65536  65536    80.90       8.02
 87380  65536  65536    81.15       4.22

root@tg3:~# awk '{sum+=$5} END {print sum,NR,sum/NR}' vm-vm-50.dat
361.74 50 7.2348

Note the terrible performance with single flows. With 50 flows, the
aggregate improves, but is still much worse than the dom0-to-dom0 results.

Any ideas why I'm getting such bad performance from the VMs on high BDP
links? I'm willing and interested to help in debugging and fixing this
issue, but I need some leads :)

TIA

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker
> Any ideas why I'm getting such bad performance from the VMs on high BDP
> links? I'm willing and interested to help in debugging and fixing this
> issue, but I need some leads :)

The first thing to do is to look at the CPU usage in dom0 and domU. If you
can run them on different CPUs or even different hyperthreads it might make
the experiment simpler to understand. The first thing to find out is whether
you're maxed out on CPU, or whether this is an IO blocking issue. "xm list"
should show you how much CPU each domain is burning.

Secondly, enable performance counters in a Xen build, then use the user
space tools to read out the context switch rate. How does it compare without
the emulated BDP link?

Also, you might want to play around with the rate limiting function in
netback. If you set it to a few hundred Mb/s you might help promote
batching.

I'm also concerned that dummynet is pretty terrible when operating at such
high speeds, and the whole thing might just be a bad interaction between
Xen's batching and dummynet's. Why not set up a real experiment across
Abilene just to check?

Ian
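As a rough illustration, a loop like the one below makes it easy to watch
how fast the cumulative CPU time reported by "xm list" grows during a run
(a sketch only: the awk field number assumes the Time(s) column is the
sixth column, so adjust it if your xm output is laid out differently):

  while sleep 1; do
      echo "=== $(date +%T) ==="
      # print domain name and cumulative CPU seconds for each domain
      xm list | awk 'NR > 1 { print $1, $6 }'
  done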
> > Any ideas why I'm getting such bad performance from the VMs on high BDP
> > links? I'm willing and interested to help in debugging and fixing this
> > issue, but I need some leads :)
>
> The first thing to do is to look at the CPU usage in dom0 and domU. If
> you can run them on different CPUs or even different hyperthreads it
> might make the experiment simpler to understand. The first thing to find
> out is whether you're maxed out on CPU, or whether this is an IO blocking
> issue. "xm list" should show you how much CPU each domain is burning.

I had caught glimpses on the list of a top-like utility for viewing CPU
usage... is that a reality yet? I haven't followed up on that thread. The
problem is that "xm list" is fine for very coarse-grained measurements, but
it's a pain to do real-time, fine-granularity measurements with it. Sure, I
could always write my own little Python script using the xm interface, but
it would be great if we had something like top.

> Also, you might want to play around with the rate limiting function in
> netback. If you set it to a few hundred Mb/s you might help promote
> batching.

Sorry if this is a dumb question, but what is the rate limiting function in
netback? Is it a run-time parameter or something in the code? What does it
do? If I set it too high, won't it lead to bad performance for low-bandwidth
flows? I guess I should just look at the code :)

> I'm also concerned that dummynet is pretty terrible when operating at
> such high speeds, and the whole thing might just be a bad interaction
> between Xen's batching and dummynet's. Why not set up a real experiment
> across Abilene just to check?

I think that's a separate debate. For now, I just want to get the same
performance levels from a VM as from dom0, in all possible environments,
dummynet just being one of them. Setting up a real experiment is a good idea
though; I'm looking into it. BTW, where can I learn more about Xen's
"batching"?

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker
> > Also, you might want to play around with the rate limiting function in
> > netback. If you set it to a few hundred Mb/s you might help promote
> > batching.
>
> Sorry if this is a dumb question, but what is the rate limiting function
> in netback? Is it a run-time parameter or something in the code? What
> does it do? If I set it too high, won't it lead to bad performance for
> low-bandwidth flows? I guess I should just look at the code :)

See: xm vif-limit

Ian
Diwaker Gupta wrote:

>>> Any ideas why I'm getting such bad performance from the VMs on high
>>> BDP links? I'm willing and interested to help in debugging and fixing
>>> this issue, but I need some leads :)
>>
>> The first thing to do is to look at the CPU usage in dom0 and domU. If
>> you can run them on different CPUs or even different hyperthreads it
>> might make the experiment simpler to understand. The first thing to
>> find out is whether you're maxed out on CPU, or whether this is an IO
>> blocking issue. "xm list" should show you how much CPU each domain is
>> burning.
>
> I had caught glimpses on the list of a top-like utility for viewing CPU
> usage... is that a reality yet? I haven't followed up on that thread.
> The problem is that "xm list" is fine for very coarse-grained
> measurements, but it's a pain to do real-time, fine-granularity
> measurements with it. Sure, I could always write my own little Python
> script using the xm interface, but it would be great if we had something
> like top.
>
>> Also, you might want to play around with the rate limiting function in
>> netback. If you set it to a few hundred Mb/s you might help promote
>> batching.
>
> Sorry if this is a dumb question, but what is the rate limiting function
> in netback? Is it a run-time parameter or something in the code? What
> does it do? If I set it too high, won't it lead to bad performance for
> low-bandwidth flows? I guess I should just look at the code :)

Hi Diwaker! Sorry I'm coming to this thread late, I was out sick the last
couple of days. I just started looking into the net flow control problem.
Ian is speculating that the rate limiting function will actually help data
get pushed faster. We're looking into where exactly our latencies are. If
you could run some debug patches for me, I'd really appreciate it.

Btw, have you tried using the -i and -I options to netperf? "-i 30,10" will
at least ensure a minimum of 10 runs for each measurement, and -I can be
used to specify a confidence interval (99,5). Even if it's consistent, I
wouldn't trust the 10-second run time for the test.

Netperf uses setsockopt() to set its own buffer sizes, so increasing the
system sysctl values will not affect your test in any way (or shouldn't ;)).

>> I'm also concerned that dummynet is pretty terrible when operating at
>> such high speeds, and the whole thing might just be a bad interaction
>> between Xen's batching and dummynet's. Why not set up a real experiment
>> across Abilene just to check?
>
> I think that's a separate debate. For now, I just want to get the same
> performance levels from a VM as from dom0, in all possible environments,
> dummynet just being one of them. Setting up a real experiment is a good
> idea though; I'm looking into it. BTW, where can I learn more about
> Xen's "batching"?

The question is how frequently the frontend should kick the backend, and
how frequently the backend should pass packets along to the real device.
Aggregating requests improves the efficiency of the transfers but impacts
latency.

thanks,
Nivedita
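As a concrete example, a run with those options might look something like
the line below; the target host and the 80-second run length are simply
carried over from the earlier VM-to-VM tests, so treat it as an
illustration rather than the exact command to use:

  netperf -H 172.19.222.101 -l 80 -i 30,10 -I 99,5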
On Apr 4, 2005 12:39 AM, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:

>>> Also, you might want to play around with the rate limiting function
>>> in netback. If you set it to a few hundred Mb/s you might help
>>> promote batching.
>>
>> Sorry if this is a dumb question, but what is the rate limiting
>> function in netback? Is it a run-time parameter or something in the
>> code? What does it do? If I set it too high, won't it lead to bad
>> performance for low-bandwidth flows? I guess I should just look at the
>> code :)
>
> See: xm vif-limit

Maybe I missed something. My xm only has vif-list, no vif-limit. I also
grepped for anything resembling vif-limit inside the tools directory, but
with no useful results. Is this a new feature? I'm using Xen 2.0.3.

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker
> Hi Diwaker! Sorry I'm coming to this thread late, I was out sick the
> last couple of days. I just started looking into the net flow control
> problem. Ian is speculating that the rate limiting function will
> actually help data get pushed faster. We're looking into where exactly
> our latencies are. If you could run some debug patches for me, I'd
> really appreciate it.

I'd be happy to. Just send them over, and let me know if you find anything
interesting.

> Btw, have you tried using the -i and -I options to netperf? "-i 30,10"
> will at least ensure a minimum of 10 runs for each measurement, and -I
> can be used to specify a confidence interval (99,5). Even if it's
> consistent, I wouldn't trust the 10-second run time for the test.

I don't trust the 10-second tests either, especially for such a high RTT.
That's why I ran the tests for 80 seconds (that's 1000 RTTs, which should
give TCP enough time to stabilize). I'll get some numbers using these
options in any case.

> Netperf uses setsockopt() to set its own buffer sizes, so increasing the
> system sysctl values will not affect your test in any way (or shouldn't
> ;)).

Yeah, but in my experience it usually picks up the "default" value as set
by the sysctl. I'll check the code.

>>> I'm also concerned that dummynet is pretty terrible when operating at
>>> such high speeds, and the whole thing might just be a bad interaction
>>> between Xen's batching and dummynet's. Why not set up a real
>>> experiment across Abilene just to check?
>>
>> I think that's a separate debate. For now, I just want to get the same
>> performance levels from a VM as from dom0, in all possible
>> environments, dummynet just being one of them. Setting up a real
>> experiment is a good idea though; I'm looking into it. BTW, where can I
>> learn more about Xen's "batching"?
>
> The question is how frequently the frontend should kick the backend, and
> how frequently the backend should pass packets along to the real device.
> Aggregating requests improves the efficiency of the transfers but
> impacts latency.

I agree. But I think it's a reasonable goal to expect VM performance to
match dom0 performance across a variety of environments :)

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker
> I don't trust the 10-second tests either, especially for such a high
> RTT. That's why I ran the tests for 80 seconds (that's 1000 RTTs, which
> should give TCP enough time to stabilize). I'll get some numbers using
> these options in any case.

Cool :). Thanks for offering to test, too.

> Yeah, but in my experience it usually picks up the "default" value as
> set by the sysctl. I'll check the code.

In your netperf output, it's listing the socket size as the default system
64K. If you invoke netperf with -s 131072 -S 131072 it should at least use
128K (local and remote). Bumping that up by 3x usually gives a good gain on
netperf stream-type tests, but YMMV.

> I agree. But I think it's a reasonable goal to expect VM performance to
> match dom0 performance across a variety of environments :)

Yep ;)

thanks,
Nivedita
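For example, an invocation with explicit socket buffer sizes might look
something like the line below; the target host and run length are just
carried over from the earlier VM-to-VM tests, so it is an illustration
rather than a prescribed command:

  netperf -H 172.19.222.101 -l 80 -s 131072 -S 131072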
> Maybe I missed something. My xm only has vif-list, no vif-limit. I also
> grepped for anything resembling vif-limit inside the tools directory,
> but with no useful results. Is this a new feature? I'm using Xen 2.0.3.

Please upgrade.

Best,
Ian
>> I don't trust the 10-second tests either, especially for such a high
>> RTT. That's why I ran the tests for 80 seconds (that's 1000 RTTs, which
>> should give TCP enough time to stabilize). I'll get some numbers using
>> these options in any case.
>
> Cool :). Thanks for offering to test, too.

No problemo :) So I tried the -i and -I options... not too much of a
difference. Slight improvement in the numbers, but the gap is still stark.

>> Yeah, but in my experience it usually picks up the "default" value as
>> set by the sysctl. I'll check the code.
>
> In your netperf output, it's listing the socket size as the default
> system 64K. If you invoke netperf with -s 131072 -S 131072 it should at
> least use 128K (local and remote). Bumping that up by 3x usually gives a
> good gain on netperf stream-type tests, but YMMV.

I looked at the netperf source. If the -s/-S values are not specified, it
sticks to the default values. Also, setsockopt() only changes the maximum
buffer size; the default is still governed by the sysctl values. Further,
AFAIK, even the max value (in Linux) is just a hint to the TCP stack -- the
actual size of the buffer is determined by the TCP auto-tuning code.

In any case, since both dom0 and the VM are using the same buffer sizes,
I'm not too concerned about setting the "right" buffer sizes. Right now, I
want to figure out the discrepancy in performance.

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker
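To double-check which defaults are actually in effect on each side, querying
the relevant sysctls is enough; these are the same keys listed at the start
of the thread:

  # show the TCP buffer autotuning ranges and the core socket defaults
  sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
  sysctl net.core.rmem_default net.core.wmem_default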
On Apr 4, 2005 11:48 AM, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:

>> Maybe I missed something. My xm only has vif-list, no vif-limit. I also
>> grepped for anything resembling vif-limit inside the tools directory,
>> but with no useful results. Is this a new feature? I'm using Xen 2.0.3.
>
> Please upgrade.

I did:

Xen version 2.0.5 (root@localdomain) (gcc version 3.3.5 (Debian 1:3.3.5-2))
Mon Apr 4 13:53:08 PDT 2005

But I still don't see any xm vif-limit. Do I need to upgrade to unstable?

--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker
Diwaker Gupta wrote:

> On Apr 4, 2005 11:48 AM, Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> wrote:
>
>>> Maybe I missed something. My xm only has vif-list, no vif-limit. I
>>> also grepped for anything resembling vif-limit inside the tools
>>> directory, but with no useful results. Is this a new feature? I'm
>>> using Xen 2.0.3.
>>
>> Please upgrade.
>
> I did:
>
> Xen version 2.0.5 (root@localdomain) (gcc version 3.3.5 (Debian
> 1:3.3.5-2)) Mon Apr 4 13:53:08 PDT 2005
>
> But I still don't see any xm vif-limit. Do I need to upgrade to
> unstable?

Oops, sorry, I assumed you were on unstable. It would be very useful if you
could try xen-unstable. Quite frankly, it has been relatively stable for me.

thanks,
Nivedita