Dr. Volker Jaenisch
2008-Sep-22  13:09 UTC
[Xen-devel] Good unidirectional TCP performance, weird asymetric performance going bidirectional
Hi Xen developers!

After several days of tuning and testing I am stuck with a serious performance degradation when doing bidirectional TCP networking between dom0 and domU as well as between domU and domU. I sent this posting to xen-users and was encouraged there to send it to xen-devel.

Overview:
=========

dom0 to domU:
-------------

unidirectional:
    dom0 -> domU : 578 Mbits/sec
    dom0 <- domU : 1.22 Gbits/sec

Cool, isn't it?

bidirectional (dom0 <=> domU):
    dom0 -> domU : 38.2 Mbits/sec
    dom0 <- domU : 1.22 Gbits/sec

Oops! But things can become worse...

domU1 to domU2:
---------------

unidirectional:
    domU1 -> domU2 : 410.2 Mbits/sec
    domU1 <- domU2 : 378.1 Mbits/sec

I can easily live with that.

bidirectional (domU1 <=> domU2):
    domU1 -> domU2 : 42.3 Mbits/sec
    domU1 <- domU2 : 38.2 Mbits/sec

But what is that?

Similar-looking problems have been discussed in postings on the xen-users list and elsewhere. I have read lots of them, but none of the mentioned solutions (TCP tuning, ethtool tweaking, scheduler tuning, etc.) helped to get rid of this behavior. Maybe I missed something.

This behavior is reproducible with Xen 3.1 and Xen 3.2 on two different machines.

* Can anybody confirm these findings?
* Does anybody have an idea?

Any help is appreciated.

Best Regards,

Volker

The details follow ...
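As a rough illustration of the size of the drop, the share of throughput that survives bidirectional load can be worked out from the figures in this report. This is only a sketch: the helper name `retained_percent` and the data layout are assumptions made for the example, and the dom0 to domU pair takes 578 Mbits/sec from the unidirectional run and 38.2 Mbits/sec from the `iperf -d` run.

```python
# Sketch: how much throughput survives when traffic runs in both directions.
# All figures are in Mbit/s and copied from this report; the function name
# and the tuple layout are illustrative assumptions, not part of any tool.


def retained_percent(unidirectional: float, bidirectional: float) -> float:
    """Return bidirectional throughput as a percentage of the unidirectional one."""
    if unidirectional <= 0:
        raise ValueError("unidirectional throughput must be positive")
    return 100.0 * bidirectional / unidirectional


# (label, unidirectional Mbit/s, bidirectional Mbit/s)
MEASUREMENTS = [
    ("domU1 -> domU2", 410.2, 42.3),
    ("domU1 <- domU2", 378.1, 38.2),
    # 578 Mbit/s when running alone versus 38.2 Mbit/s in the iperf -d run
    ("dom0 -> domU", 578.0, 38.2),
]

if __name__ == "__main__":
    for label, uni, bidi in MEASUREMENTS:
        print(f"{label}: {retained_percent(uni, bidi):.1f}% of unidirectional rate")
```

With these figures each affected direction keeps only roughly 7 to 10 percent of its unidirectional rate.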
iperf yields the following reproducible numbers.

From dom0 to domU:
==================

zeus:/etc/xen# iperf -c 192.168.2.22 -p5555
------------------------------------------------------------
Client connecting to 192.168.2.22, TCP port 5555
TCP window size: 27.2 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.2.20 port 59804 connected with 192.168.2.22 port 5555
[ 3] 0.0-10.0 sec 689 MBytes 578 Mbits/sec

From domU to dom0:
==================

apollo:~# iperf -c 192.168.2.20 -p6666 -t60
------------------------------------------------------------
Client connecting to 192.168.2.20, TCP port 6666
TCP window size: 23.3 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.2.22 port 36007 connected with 192.168.2.20 port 6666
[ 3] 0.0-60.0 sec 8.55 GBytes 1.22 Gbits/sec

So far nothing to complain about.

But domU to dom0 full duplex (both directions simultaneously):
==============================================================

apollo:~# iperf -c 192.168.2.20 -p6666 -t60 -d
------------------------------------------------------------
Client connecting to 192.168.2.20, TCP port 6666
TCP window size: 23.3 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.2.22 port 36007 connected with 192.168.2.20 port 6666
[ 4] local 192.168.2.22 port 5555 connected with 192.168.2.20 port 34223
[ 3] 0.0-60.0 sec 8.55 GBytes 1.22 Gbits/sec
[ 4] 0.0-60.0 sec 274 MBytes 38.2 Mbits/sec

This is weird. The domU to dom0 performance is still there, but the dom0 to domU performance drops to less than 10%.

The same holds for domU to domU communication: about 400 Mbit/sec in each direction when only one direction is active, and about 40 Mbit/sec when running bidirectionally.

This behavior is independent of the client/server role of iperf. The same happens if I start two simultaneous, separate iperf runs in opposite directions by hand. This eliminates iperf as the source of the problem.

The test setup is as follows:

* AMD64 Opteron dual-core server.
* ASUS server mainboard.
* 4 GB RAM.
* Running Debian etch, with Debian kernel 2.6.18-6-xen-amd64 for dom0 and the domUs.
* Xen 3.2 hypervisor from the Debian package (same behavior with Xen 3.1).
* Dom0 (zeus): 2 GB RAM, pinned to CPU 0
* DomU (apollo): 2 GB RAM, pinned to CPU 1
* Network connected via bridge br0.

bridge name     bridge id               STP enabled     interfaces
br0             8000.001bfcdbd279       no              eth0
                                                        vif9.0

I see dropped packets on the vif2.0 interface, but only when packets go from the domU to the dom0. In the dom0 to domU case there are no dropped packets.

zeus:~# ifconfig
br0       Link encap:Ethernet  HWaddr 00:1B:FC:DB:D2:79
          inet addr:192.168.2.20  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:fcff:fedb:d279/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7705950 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1406199 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:11028796571 (10.2 GiB)  TX bytes:1086554767 (1.0 GiB)

eth0      Link encap:Ethernet  HWaddr 00:1B:FC:DB:D2:79
          inet6 addr: fe80::21b:fcff:fedb:d279/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6776 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3001 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1442559 (1.3 MiB)  TX bytes:339793 (331.8 KiB)
          Interrupt:23

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:34 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2576 (2.5 KiB)  TX bytes:2576 (2.5 KiB)

vif2.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7699163 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1401631 errors:0 dropped:4941 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:11027474868 (10.2 GiB)  TX bytes:1082823070 (1.0 GiB)

I have used ethtool -K eth tx off applied on
the domUs' interfaces to prevent TCP checksum errors. I checked with tcpdump for errors: no obvious errors in the stream, a few out-of-order packets but nothing serious.

One fact is remarkable: the CPU utilisation is lower on the domU side and strikingly low in the domU to domU case.

dom0 to domU case:
    dom0    domU
    96%     80%

domU to domU case:
    domU    domU
    15%     15%

(Data from xentop)

I'm not sure

* whether this low CPU utilisation is due to the low network performance, OR
* whether something limits the CPU bandwidth in the domU and thereby causes the degraded network performance.

All machines (dom0 and domUs) have exactly the same TCP tuning parameters.

zeus:/etc/xen# sysctl -a | grep net.core
net.core.netdev_budget = 300
net.core.somaxconn = 128
net.core.xfrm_aevent_rseqth = 2
net.core.xfrm_aevent_etime = 10
net.core.optmem_max = 20480
net.core.message_burst = 10
net.core.message_cost = 5
net.core.netdev_max_backlog = 1000
net.core.dev_weight = 64
net.core.rmem_default = 126976
net.core.wmem_default = 126976
net.core.rmem_max = 131071
net.core.wmem_max = 131071

zeus:/etc/xen# sysctl -a | grep net.ipv4.tcp
net.ipv4.tcp_slow_start_after_idle = 1
net.ipv4.tcp_dma_copybreak = 4096
net.ipv4.tcp_workaround_signed_windows = 0
net.ipv4.tcp_base_mss = 512
net.ipv4.tcp_mtu_probing = 0
net.ipv4.tcp_abc = 0
net.ipv4.tcp_congestion_control = bic
net.ipv4.tcp_tso_win_divisor = 3
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_no_metrics_save = 0
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_frto = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_adv_win_scale = 2
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_mem = 196608 262144 3932160
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_fack = 1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_max_tw_buckets = 180000
net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syn_retries = 5
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1

TCP tuning has no influence on this behavior. Send and receive buffer sizes, backlog length, txqueuelen etc. do not shift this behavior a bit.

The bottleneck is neither the machine nor the bridge setup. The same machine using the same bridge setup runs this fast in bidirectional tests

------------------------------------------------------------
Client connecting to 192.168.2.202, TCP port 5555
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 5] local 192.168.2.200 port 50169 connected with 192.168.2.202 port 5555
[ 4] local 192.168.2.200 port 5555 connected with 192.168.2.202 port 53744
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 830 MBytes 696 Mbits/sec
[ 4] 0.0-10.0 sec 1.02 GBytes 878 Mbits/sec

with two OpenVZ containers having real veth interfaces.

We tested OpenVZ for quality Zope application hosting to get rid of the kernel RAM overhead of Xen. We expected the network of OpenVZ to be weak and half-baked.

Please don't get me wrong: this is no Xen-bashing attempt. We have had Xen in production for years and would like to keep it as long as it stays open source and is available in recent kernels :-).

We simply would like to understand what our Xen machines do.

Best Regards

Volker

--
===================================================
inqbus it-consulting      +49 ( 341 ) 5643800
Dr. Volker Jaenisch       http://www.inqbus.de
Herloßsohnstr. 12         0 4 1 5 5 Leipzig
N O T - F Ä L L E         +49 ( 170 ) 3113748
===================================================

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel