Markus Schuster
2010-Jan-10 01:08 UTC
[Xen-users] pv_ops kernel and network problems (checksum offloading?)
Hi list,

I'm experiencing some very strange network problems when using a masquerading router domU with pv_ops kernels.

First of all, here is some ASCII art explaining my network configuration:

             +----------------------+
          +--|-eth0   domU2    eth1-|-----+
          |  +----------------------+     |
          |                               |
          |  +----------------------+     |
          |  |          domU1  eth1-|--+  |
          |  +----------------------+  |  |
          |                            |  |
    +-----|----------------------------|--|-------+
    |  vif2.0                     vif1.1  vif2.1  |
    |     |                            |  |       |
    | brexternal       dom0         brinternal    |
    |     |                                       |
    |   eth0                                      |
    +-----|---------------------------------------+
          |
      Internet

domU1 intentionally has no internet connection, and domU2 acts as the masquerading router for the internal network. The configuration is very, very basic; on domU2 I've issued the following commands:

    # echo 1 > /proc/sys/net/ipv4/ip_forward
    # iptables -A POSTROUTING -t nat -s <internal/net> -j MASQUERADE

Now the problems:

1. ICMP

When I try to ping an internet host from domU1, the dom0 kernel logs the following message for every ICMP echo request packet domU1 tries to send:

--- cut ---
Attempting to checksum a non-TCP/UDP packet, dropping a protocol 1 packet
--- cut ---

IP protocol 1 is ICMP, so this matches. Using tcpdump I've been able to follow the ping packets on their way:

    domU1-eth1 -> vif1.1 -> brinternal -> vif2.1 -> domU2-eth1 -> domU2-eth0

The packet never reaches vif2.0; it gets dropped somewhere in between (given the message I see, I would expect the dom0 kernel to be the problem). Issuing the same ping command directly on domU2 works without any problems.

2. TCP

When I try to connect to an internet host by TCP from domU1, I see very, very odd behavior: the TCP SYN packet leaves dom0 on eth0 as desired and reaches the remote host.
But the remote host never responds with a SYN/ACK packet, so I took a deeper look with tcpdump and Wireshark: the packet *seems* to leave dom0 eth0 with a correct TCP checksum, but it arrives at the remote host with the TCP checksum ALWAYS set to 0xeeee, which is of course wrong, so the remote host drops the SYN packet. But I'm very sure the packet leaves dom0 with the wrong checksum.

Then I remembered the early Xen 3 days, when we were forced to use ethtool to disable checksum offloading everywhere, so I did the same: I ran "ethtool -K <interface> tx off" on EVERY interface in the communication path (domU1-eth1, vif1.1, brinternal, vif2.1, domU2-eth1, domU2-eth0, vif2.0, brexternal and dom0-eth0), but the only effect is that I now see the packet leaving dom0 at eth0 with a wrong checksum (0xeeee).

I have no problem connecting to this host directly from domU2.

My system configuration:

- Debian lenny amd64 everywhere
- Xen 3.4.2 (from Debian unstable, built for lenny)
- dom0 kernel: pv_ops from Jeremy's tree (changeset 8735edb4a976105fd29c97c00c6d14760537e4ee)
- domU kernel: pv_ops 2.6.29-2 (from Debian unstable); I would like to move to a newer kernel, but there's that other nasty bug :)

This looks like some sort of checksum offloading bug in the pv_ops kernel tree that kicks in when a domU is used to route (and masquerade) other traffic. Any ideas?

Regards,
Markus

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
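The per-interface "ethtool -K <interface> tx off" step described above can be scripted so the experiment is repeatable. What follows is a minimal bash sketch, not something posted in the thread: the helper name disable_tx_csum is hypothetical, and it assumes "ethtool -k" reports the feature on a line beginning with "tx-checksumming:" (the exact output varies between ethtool versions). Because each domain only sees its own interfaces, it has to be run separately in dom0, domU2 and domU1.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not from the thread): disable TX
# checksum offload on each interface given as an argument, then check
# the result with "ethtool -k". Assumes ethtool prints a line starting
# with "tx-checksumming:"; the exact output differs between versions.
#
# Each domain only sees its own interfaces, so call it once per domain:
#   dom0:  disable_tx_csum vif1.1 brinternal vif2.1 vif2.0 brexternal eth0
#   domU2: disable_tx_csum eth0 eth1
#   domU1: disable_tx_csum eth1

disable_tx_csum() {
    local iface rc=0
    for iface in "$@"; do
        # Same operation as "ethtool -K <interface> tx off" in the report
        if ! ethtool -K "$iface" tx off; then
            echo "$iface: ethtool -K failed" >&2
            rc=1
            continue
        fi
        # Confirm the driver actually reports the feature as disabled
        if ethtool -k "$iface" | grep -q '^tx-checksumming: off'; then
            echo "$iface: tx-checksumming off"
        else
            echo "$iface: tx-checksumming not reported as off" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

The function returns non-zero if any interface could not be switched, so a skipped hop does not go unnoticed. As the report says, disabling TX offload everywhere did not cure the problem here; the sketch only makes that test easy to repeat on another kernel.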
James Harper
2010-Jan-10 02:13 UTC
RE: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
> Hi list,
>
> I'm experiencing some very strange network problems when using a
> masquerading router domU with pv_ops kernels.
> First of all here is some ASCII art explaining my network configuration:

There are no VLANs on eth0, are there? Some chipsets have offload issues when combined with VLANs, which Xen doesn't appear to handle well. I only mention this in case you simplified the diagram :)

James
Markus Schuster
2010-Jan-10 18:25 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Sunday 10 January 2010 03:13:58 James Harper wrote:
> > Hi list,
> >
> > I'm experiencing some very strange network problems when using a
> > masquerading router domU with pv_ops kernels.
> > First of all here is some ASCII art explaining my network
> > configuration:
>
> There are no VLANs on eth0, are there? Some chipsets have offload issues
> when combined with VLANs, which Xen doesn't appear to handle well. I only
> mention this in case you simplified the diagram :)

Hi James, thanks for your response.
No, currently there are no VLANs involved, just plain Ethernet interfaces :)

Regards,
Markus
Markus Schuster
2010-Jan-19 10:26 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Sunday 10 January 2010 19:25:41 Markus Schuster wrote:
> > > I'm experiencing some very strange network problems when using a
> > > masquerading router domU with pv_ops kernels.

Does anybody have some further ideas? Something to test to hunt this bug down?

Regards,
Markus
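One test that does not depend on how tcpdump or Wireshark display a checksum is to recompute it by hand from a hex dump (for example "tcpdump -x" output) taken at each hop, and see where it stops matching. Below is a minimal bash sketch of the RFC 1071 Internet checksum; the helper name inet_csum is hypothetical, and the caller is assumed to split the data into 16-bit hex words and, for a TCP checksum, to prepend the IPv4 pseudo-header.

```shell
#!/usr/bin/env bash
# Minimal sketch of the RFC 1071 Internet checksum in bash. inet_csum is
# a hypothetical helper: pass the data as 16-bit words in hex without a
# 0x prefix (e.g. read off "tcpdump -x" output). For a TCP checksum,
# prepend the IPv4 pseudo-header words (source address, destination
# address, zero byte + protocol 6, TCP length); pad an odd trailing
# byte with 00.

inet_csum() {
    local word sum=0
    for word in "$@"; do
        sum=$(( sum + 16#$word ))                 # accumulate the words
    done
    while (( sum >> 16 )); do
        sum=$(( (sum & 0xffff) + (sum >> 16) ))   # fold carries back in
    done
    printf '%04x\n' $(( ~sum & 0xffff ))          # ones' complement
}
```

For instance, "inet_csum 0001 f203 f4f5 f6f7" prints 220d, matching the worked example in RFC 1071. Run over a header that already contains its checksum field (plus the pseudo-header for TCP), the result is 0000 when the checksum is correct, so any other value pinpoints the capture point where the packet was corrupted.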
Fajar A. Nugraha
2010-Jan-20 01:54 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Tue, Jan 19, 2010 at 5:26 PM, Markus Schuster <ml@markus.schuster.name> wrote:
> On Sunday 10 January 2010 19:25:41 Markus Schuster wrote:
>> > > I'm experiencing some very strange network problems when using a
>> > > masquerading router domU with pv_ops kernels.
>
> Does anybody have some further ideas? Something to test to hunt this bug down?

I'd suggest you try RHEL/CentOS 5 on the same hardware to eliminate hardware/driver problems. The same setup works on my system.

Other than that, you could try installing libvirt, which creates virbr0 by default along with the necessary firewall rules. Then put domU1 on the virbr0 bridge and see whether it can access the internet. This would rule out any bridge/iptables-related problem.

--
Fajar
Markus Schuster
2010-Jan-20 22:11 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Wednesday 20 January 2010 02:54:24 Fajar A. Nugraha wrote:
> On Tue, Jan 19, 2010 at 5:26 PM, Markus Schuster <ml@markus.schuster.name> wrote:
> > On Sunday 10 January 2010 19:25:41 Markus Schuster wrote:
> >> > > I'm experiencing some very strange network problems when using a
> >> > > masquerading router domU with pv_ops kernels.
> >
> > Does anybody have some further ideas? Something to test to hunt this bug
> > down?
>
> I'd suggest you try RHEL/CentOS 5 on the same hardware to eliminate
> hardware/driver problems. The same setup works on my system.

Hi Fajar, thanks for your response!
If you say the same setup works on your system, which kernels are you running for dom0 and domU? I have pv_ops for both, and if I remember correctly, these problems started when I switched over to pv_ops for dom0.
But I could try the openSUSE patches for 2.6.31; if they work, the problem should lie somewhere in pv_ops netback or so...

Regards,
Markus
Fajar A. Nugraha
2010-Jan-20 22:34 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Thu, Jan 21, 2010 at 5:11 AM, Markus Schuster <ml@markus.schuster.name> wrote:
> On Wednesday 20 January 2010 02:54:24 Fajar A. Nugraha wrote:
>> I'd suggest you try RHEL/CentOS 5 on the same hardware to eliminate
>> hardware/driver problems. The same setup works on my system.
>
> Hi Fajar, thanks for your response!
> If you say the same setup works on your system, which kernels are you running
> for dom0 and domU? I have pv_ops for both, and if I remember correctly, these
> problems started when I switched over to pv_ops for dom0.

I should've said "same networking setup" :D

> But I could try the openSUSE patches for 2.6.31; if they work, the problem
> should lie somewhere in pv_ops netback or so...

It probably is. I'm using the stock RHEL 2.6.18 kernel-xen for both dom0 and domU.

--
Fajar
Markus Schuster
2010-Jan-26 20:05 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Wednesday 20 January 2010 23:34:24 Fajar A. Nugraha wrote:
> > But I could try the openSUSE patches for 2.6.31; if they work, the
> > problem should lie somewhere in pv_ops netback or so...
>
> It probably is. I'm using the stock RHEL 2.6.18 kernel-xen for both dom0 and
> domU.

OK, I've compiled a 2.6.31.12 kernel with the ported openSUSE patches and have no problems on the network side, so the problem should lie in pv_ops. Maybe I should update to the latest Git changeset and try again; my current tree is a few weeks old.
At least it's a problem I can reproduce in my testing environment, so I hope others will be able to reproduce it, too :)
If the problem persists with the latest Git changeset, I will repost it to xen-devel; I think it fits better there.

Regards,
Markus