I have a machine running a couple of domUs in a private (i.e. not bridged with any real ethernet devices) virtual network where the host machine is a router between the domUs and the physical network.

I have been observing, via tcpdump, the following behaviour:

14:36:26.777823 IP 192.168.0.1.988 > 10.75.22.151.1022: . 6992256:6996300(4044) ack 1 win 362 <nop,nop,timestamp 58407492 129250939>
14:36:26.777922 IP 192.168.0.254 > 192.168.0.1: ICMP 10.75.22.151 unreachable - need to frag (mtu 1500), length 556
14:36:26.977726 IP 192.168.0.1.988 > 10.75.22.151.1022: . 6992256:6993604(1348) ack 1 win 362 <nop,nop,timestamp 58407543 129250939>
14:36:26.978635 IP 10.75.22.151.1022 > 192.168.0.1.988: . ack 6993604 win 499 <nop,nop,timestamp 129251018 58407543>
14:36:26.979418 IP 192.168.0.1.988 > 10.75.22.151.1022: . 6993604:6994952(1348) ack 1 win 362 <nop,nop,timestamp 58407543 129251018>
14:36:26.979420 IP 192.168.0.1.988 > 10.75.22.151.1022: . 6994952:6996300(1348) ack 1 win 362 <nop,nop,timestamp 58407543 129251018>
14:36:26.980336 IP 10.75.22.151.1022 > 192.168.0.1.988: . ack 6994952 win 499 <nop,nop,timestamp 129251019 58407543>
14:36:26.980411 IP 10.75.22.151.1022 > 192.168.0.1.988: . ack 6996300 win 488 <nop,nop,timestamp 129251019 58407543>
14:36:26.981657 IP 192.168.0.1.988 > 10.75.22.151.1022: P 6996300:6996448(148) ack 1 win 362 <nop,nop,timestamp 58407543 129251019>
14:36:26.982100 IP 10.75.22.151.1022 > 192.168.0.1.988: . ack 6996448 win 502 <nop,nop,timestamp 129251019 58407543>
14:36:27.018003 IP 192.168.0.1.988 > 10.75.22.151.1022: . 6996448:7000492(4044) ack 1 win 362 <nop,nop,timestamp 58407553 129251019>
14:36:27.018068 IP 192.168.0.254 > 192.168.0.1: ICMP 10.75.22.151 unreachable - need to frag (mtu 1500), length 556
14:36:27.221735 IP 192.168.0.1.988 > 10.75.22.151.1022: . 6996448:6997796(1348) ack 1 win 362 <nop,nop,timestamp 58407604 129251019>
14:36:27.222682 IP 10.75.22.151.1022 > 192.168.0.1.988: . ack 6997796 win 499 <nop,nop,timestamp 129251079 58407604>
14:36:27.223523 IP 192.168.0.1.988 > 10.75.22.151.1022: . 6997796:6999144(1348) ack 1 win 362 <nop,nop,timestamp 58407604 129251079>
14:36:27.223524 IP 192.168.0.1.988 > 10.75.22.151.1022: . 6999144:7000492(1348) ack 1 win 362 <nop,nop,timestamp 58407604 129251079>
14:36:27.224438 IP 10.75.22.151.1022 > 192.168.0.1.988: . ack 6999144 win 499 <nop,nop,timestamp 129251080 58407604>
14:36:27.224506 IP 10.75.22.151.1022 > 192.168.0.1.988: . ack 7000492 win 488 <nop,nop,timestamp 129251080 58407604>
14:36:27.225682 IP 192.168.0.1.988 > 10.75.22.151.1022: P 7000492:7000640(148) ack 1 win 362 <nop,nop,timestamp 58407604 129251080>
14:36:27.226097 IP 10.75.22.151.1022 > 192.168.0.1.988: . ack 7000640 win 502 <nop,nop,timestamp 129251080 58407604>

As you can see, for whatever reason, the TCP session on that port pair is trying to send 4044-byte TCP frames, much in excess of the MTU of the downstream network (hence the ICMP need-fragmentation messages), but more importantly in excess of the MTU of the virtual interface on the domU:

eth0      Link encap:Ethernet  HWaddr 00:16:3E:73:F6:BE
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::216:3eff:fe73:f6be/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3567098 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3471796 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1508328213 (1.4 GiB)  TX bytes:2016647487 (1.8 GiB)

Is this a bug or intended behaviour? If intended, can I prevent it somehow?

Cheers,
b.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
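To put numbers on the trace, here is a minimal sketch of the size arithmetic. It assumes an IPv4 header with no options (20 bytes) and a 20-byte TCP header carrying the <nop,nop,timestamp ...> options that tcpdump prints (1 + 1 + 10 = 12 bytes); those header sizes are read off the trace, not stated in the post.

```python
# Header sizes are assumptions read off the trace: an IPv4 header without
# options (20 bytes) and a TCP header (20 bytes) carrying the
# <nop,nop,timestamp ...> options (1 + 1 + 10 = 12 bytes).
IPV4_HEADER = 20
TCP_HEADER = 20
TCP_OPTIONS = 12
MTU = 1500  # from ifconfig and the ICMP "need to frag (mtu 1500)" reply


def ip_datagram_size(payload: int) -> int:
    """Total IPv4 datagram length for a TCP segment carrying `payload` bytes."""
    return IPV4_HEADER + TCP_HEADER + TCP_OPTIONS + payload


# Largest payload that fits in one MTU-sized datagram with these headers.
MAX_PAYLOAD = MTU - ip_datagram_size(0)

for payload in (4044, 1348, 148):  # segment sizes seen in the trace
    size = ip_datagram_size(payload)
    verdict = "fits" if size <= MTU else "exceeds the MTU"
    print(f"{payload:5d}-byte payload -> {size:5d}-byte IP datagram ({verdict})")
print(f"largest payload that fits: {MAX_PAYLOAD} bytes")
```

Under these assumptions each 4044-byte segment is a 4096-byte IP datagram, well over the 1500-byte MTU, while each 1348-byte resend is a 1400-byte datagram and fits.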
> I have a machine running a couple of domUs in a private (i.e. not bridged
> with any real ethernet devices) virtual network where the host machine is
> a router between the domUs and the physical network.
>
> I have been observing, via tcpdump, the following behaviour:
>
> 14:36:26.777823 IP 192.168.0.1.988 > 10.75.22.151.1022: . 6992256:6996300(4044) ack 1 win 362 <nop,nop,timestamp 58407492 129250939>
> 14:36:26.777922 IP 192.168.0.254 > 192.168.0.1: ICMP 10.75.22.151 unreachable - need to frag (mtu 1500), length 556
>
> Is this a bug or intended behaviour? If intended, can I prevent it
> somehow?
>

It's intended behaviour to send packets that big, but it's a bug that it doesn't work for you. The idea is that the DomU sends big packets, and the hardware adapter splits them up into MTU-sized packets.

What is your DomU? If it's Windows running GPLPV, you can turn off large send offload in the adapter properties. If it's Linux (PV or HVM) you can use ethtool to disable large send offload, e.g.

ethtool -K eth0 tso off

James
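The splitting James describes can be illustrated with a simplified model: one large send is cut into consecutive pieces of at most one segment size, each with its own sequence range. This is a sketch only, not how any particular NIC or the Linux GSO code is implemented; real segmentation offload also copies and adjusts the IP and TCP headers and recomputes checksums for every piece, which is left out here. The function name `segment` is hypothetical, and the segment size of 1348 is an assumption taken from the resent segments in the trace.

```python
from typing import List, Tuple


def segment(start_seq: int, length: int, mss: int) -> List[Tuple[int, int, int]]:
    """Split one large send into (first_seq, end_seq, size) pieces of at most
    `mss` bytes, matching tcpdump's first:end(size) notation (end = first + size)."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    pieces = []
    seq = start_seq
    end = start_seq + length
    while seq < end:
        size = min(mss, end - seq)  # the last piece may be shorter
        pieces.append((seq, seq + size, size))
        seq += size
    return pieces


# The first oversized send in the trace: 6992256:6996300(4044)
for first, last, size in segment(6992256, 4044, 1348):
    print(f"{first}:{last}({size})")
```

With a segment size of 1348 this reproduces the three segments the DomU actually resent after the first ICMP message: 6992256:6993604, 6993604:6994952 and 6994952:6996300.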
Brian J. Murrell
2009-Mar-03 15:01 UTC
[Xen-users] Re: domU sending frames larger than MTU
On Tue, 03 Mar 2009 09:56:41 +1100, James Harper wrote:
>
> It's intended behaviour to send packets that big, but it's a bug that it
> doesn't work for you.

Well, it doesn't exactly not work. IP does what it's supposed to: the Dom0 (which is 192.168.0.254 on the br0 that is bridged with the DomU "private" net in the packet trace from my previous posting) sends the ICMP "need to frag" to the DomU, and then the DomU resends the 4044 bytes chopped up into more "ethernet friendly" sized packets, which the Dom0 puts out onto the ethernet.

> The idea is that DomU sends big packets, and the
> hardware adapter splits them up into MTU sized packets.

You know, I was very sceptical of this until I did some googling, given the "tso" ethtool option you pointed out below, which led me to http://en.wikipedia.org/wiki/Large_segment_offload. TBH, I had not heard of this before. Interesting that they are building that kind of intelligence into NICs these days.

So the theory might then be that the physical NIC in the Dom0 doesn't support this offloading, and so the IP stack in the Dom0 has no choice but to ask the sender to fragment the packets itself. Looking at it with ethtool, this is what the NIC in the Dom0 reports:

Dom0# ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off

Maybe I just need to enable it:

# ethtool -K eth0 tso on
Cannot set device tcp segmentation offload settings: Operation not supported

So perhaps this NIC (Broadcom Corporation BCM4401 100Base-T) doesn't have TCP segmentation offload.

> What is your DomU?

Linux.

> If it's linux (pv or hvm) you can use ethtool to disable the large send
> offload, eg ethtool -K eth0 tso off

Next time I have those domUs up, perhaps I will give that a try. That ethtool command is on the DomU, right?
So it's fiddling with the virtual ethernet port, yes?

Cheers,
b.
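When comparing offload settings across machines (the Dom0 NIC and each DomU's eth0), the `ethtool -k` text in Brian's message can be parsed mechanically. The following is a minimal sketch with a hypothetical helper, `parse_offloads`. It assumes the `name: on/off` line format shown in this thread; newer ethtool releases spell the names with hyphens (e.g. `tcp-segmentation-offload`) and may append `[fixed]`, so the helper normalises names to hyphenated lower case and reads only the first word of each value.

```python
from typing import Dict


def parse_offloads(text: str) -> Dict[str, bool]:
    """Parse `ethtool -k` style output into {feature_name: enabled}.

    Lines without an 'on'/'off' value (headers, error messages, the prompt
    line) are skipped. Names are normalised to lower case with hyphens, so
    'tcp segmentation offload' and 'tcp-segmentation-offload' give the same key.
    """
    features: Dict[str, bool] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, e.g. "Dom0# ethtool -k eth0"
        words = value.split()
        if not words or words[0] not in ("on", "off"):
            continue  # e.g. "Offload parameters for eth0:" or an error line
        key = "-".join(name.strip().lower().split())
        features[key] = words[0] == "on"
    return features


# The Dom0 output quoted earlier in this thread.
DOM0_OUTPUT = """Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
"""

flags = parse_offloads(DOM0_OUTPUT)
print(flags["tcp-segmentation-offload"])
```

Running the same check on the DomU's `ethtool -k eth0` output, before and after `ethtool -K eth0 tso off`, shows whether the setting actually changed on the virtual interface.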