Markus Schuster
2010-Jan-10 01:08 UTC
[Xen-users] pv_ops kernel and network problems (checksum offloading?)
Hi list,
I'm experiencing some very strange network problems when using a
masquerading router domU with pv_ops kernels.
First of all here is some ASCII art explaining my network configuration:
                     +---------------------+
                  +--|-eth0   domU2   eth1-|-----+
                  |  +---------------------+     |
                  |                              |
                  |  +---------------------+     |
                  |  |        domU1   eth1-|--+  |
                  |  +---------------------+  |  |
                  |                           |  |
          +-------|---------------------------|--|--------+
          |       | vif2.0             vif1.1 |  | vif2.1 |
Internet  |       |                           |  |        |
    <-----|----- brexternal     dom0      brinternal      |
          | eth0                                          |
          +-----------------------------------------------+
domU1 intentionally has no internet connection and domU2 acts as masquerading
router for the internal network.
The configuration is very basic. On domU2 I've issued the following
commands:
# echo 1 > /proc/sys/net/ipv4/ip_forward
# iptables -A POSTROUTING -t nat -s <internal/net> -j MASQUERADE
Now the problems:
1. ICMP
When I try to ping an internet host from domU1, the dom0 kernel logs the
following message for every ICMP echo request packet domU1 tries to send:
--- cut ---
Attempting to checksum a non-TCP/UDP packet, dropping a protocol 1 packet
--- cut ---
IP protocol 1 is ICMP, so this matches. Using tcpdump I was able to follow
the ping packets along their path: domU1-eth1 -> vif1.1 -> brinternal ->
vif2.1 -> domU2-eth1 -> domU2-eth0
The packet never reaches vif2.0 - it gets dropped somewhere in between (judging
by the message above, I would expect the dom0 kernel to be the culprit).
Issuing the same ping command directly on domU2 works without any problems.
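For anyone trying to reproduce the hop-by-hop trace described above, here is a minimal sketch. The helper name hop_capture_cmd is made up for illustration, and the interface list is just the dom0 side of the path from the diagram. The script only prints one tcpdump command per hop; each would be run as root in a separate terminal. With -vv, tcpdump also reports whether the checksums it sees are correct.

```shell
#!/bin/sh
# Sketch only: print (do not run) one tcpdump command per hop in dom0.
# hop_capture_cmd is a hypothetical helper, not part of any Xen tooling.
hop_capture_cmd() {
    # $1 = interface to capture on, $2 = pcap filter expression
    # -n: no name resolution, -vv: verbose output incl. checksum checks
    printf 'tcpdump -n -vv -i %s %s\n' "$1" "$2"
}

# dom0 interfaces in the order a packet from domU1 should traverse them
for hop in vif1.1 brinternal vif2.1 vif2.0 brexternal eth0; do
    hop_capture_cmd "$hop" icmp
done
```

The first hop that stays silent while the previous one still shows the echo request is where the packet is being dropped. Replacing the filter "icmp" with "tcp" gives the same trace for the TCP case.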
2. TCP
When I try to connect to an internet host via TCP from domU1, I see very odd
behavior:
The TCP SYN packet leaves dom0 on eth0 as desired and reaches the remote host.
But the remote host never responds with a SYN/ACK packet, so I took a deeper
look with tcpdump and Wireshark: the packet *seems* to leave dom0 eth0 with a
correct TCP checksum but arrives at the remote host with the TCP checksum
ALWAYS set to 0xeeee - which is wrong of course, so the remote host drops the
SYN packet.
But I'm very sure the packet actually leaves dom0 with a wrong checksum.
Next I remembered the early Xen 3 days, when we were forced to use
ethtool to disable checksum offloading everywhere, so I did the same: I ran
"ethtool -K <interface> tx off" for EVERY interface in the communication path
(domU1-eth1, vif1.1, brinternal, vif2.1, domU2-eth1, domU2-eth0, vif2.0,
brexternal and dom0-eth0), but the only effect is that I now see the
packet leaving dom0 at eth0 with a wrong checksum (0xeeee).
I have no problem connecting to this host directly from domU2.
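The per-interface ethtool step described above could be scripted roughly as follows. This is only a sketch: the function name tx_off_cmds is made up for illustration, and the interface names are taken from the diagram. It prints the commands, grouped by the domain they would have to run in, instead of executing them, so they can be reviewed first and then run as root in each domain.

```shell
#!/bin/sh
# Sketch only: print the "ethtool -K <interface> tx off" command for each
# interface given as an argument. tx_off_cmds is a hypothetical helper.
tx_off_cmds() {
    for ifname in "$@"; do
        printf 'ethtool -K %s tx off\n' "$ifname"
    done
}

# Interfaces grouped by the domain that owns them (names from the diagram)
echo "# in dom0:"
tx_off_cmds vif1.1 brinternal vif2.1 vif2.0 brexternal eth0
echo "# in domU1:"
tx_off_cmds eth1
echo "# in domU2:"
tx_off_cmds eth0 eth1
```

Afterwards, "ethtool -k <interface>" (lowercase k) shows the current offload settings, which is a quick way to check that the change actually took effect on each interface.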
My system configuration:
Debian lenny amd64 everywhere
XEN 3.4.2 (Debian unstable built for lenny)
dom0 kernel: pv_ops from Jeremy's tree (changeset
8735edb4a976105fd29c97c00c6d14760537e4ee)
domU kernel: pv_ops 2.6.29-2 (from Debian unstable) (I would like to move to a
newer kernel, but there's that other nasty bug :))
This looks like some sort of checksum offloading bug in the pv_ops kernel tree
that kicks in when using a domU to route (and masquerade) other traffic.
Any ideas?
Regards,
Markus
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
James Harper
2010-Jan-10 02:13 UTC
RE: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
> Hi list,
>
> I'm experiencing some very strange network problems when using a
> masquerading router domU with pv_ops kernels.
> First of all here is some ASCII art explaining my network configuration:

There are no vlan's on eth0 are there? Some chipsets have offload issues
when combined with vlan, which xen doesn't appear to handle well. I only
mention this in case you simplified the diagram :)

James
Markus Schuster
2010-Jan-10 18:25 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Sunday 10 January 2010 03:13:58 James Harper wrote:
> > Hi list,
> >
> > I'm experiencing some very strange network problems when using a
> > masquerading router domU with pv_ops kernels.
> > First of all here is some ASCII art explaining my network
> > configuration:
>
> There are no vlan's on eth0 are there? Some chipsets have offload issues
> when combined with vlan, which xen doesn't appear to handle well. I only
> mention this in case you simplified the diagram :)

Hi James,

thanks for your response.
No, currently there are no VLANs involved, just plain Ethernet interfaces :)

Regards,
Markus
Markus Schuster
2010-Jan-19 10:26 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Sunday 10 January 2010 19:25:41 Markus Schuster wrote:
> > > I'm experiencing some very strange network problems when using a
> > > masquerading router domU with pv_ops kernels.

Does anybody have some further ideas? Something to test to hunt this bug
down?

Regards,
Markus
Fajar A. Nugraha
2010-Jan-20 01:54 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Tue, Jan 19, 2010 at 5:26 PM, Markus Schuster
<ml@markus.schuster.name> wrote:
> On Sunday 10 January 2010 19:25:41 Markus Schuster wrote:
>> > > I'm experiencing some very strange network problems when using a
>> > > masquerading router domU with pv_ops kernels.
>
> Does anybody have some further ideas? Something to test to hunt this bug
> down?

I'd suggest you try RHEL/Centos5 on the same hardware to eliminate
hardware/driver problems. The same setup works on my system.

Other than that, you could probably try installing libvirt, which would
create virbr0 by default, plus the necessary firewall rules. Then put
domU1 on virbr0 bridge, and see if it can access the internet now. This
would eliminate any bridge/iptables-related problem.

--
Fajar
Markus Schuster
2010-Jan-20 22:11 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Wednesday 20 January 2010 02:54:24 Fajar A. Nugraha wrote:
> On Tue, Jan 19, 2010 at 5:26 PM, Markus Schuster
> <ml@markus.schuster.name> wrote:
> > On Sunday 10 January 2010 19:25:41 Markus Schuster wrote:
> >> > > I'm experiencing some very strange network problems when using a
> >> > > masquerading router domU with pv_ops kernels.
> >
> > Does anybody have some further ideas? Something to test to hunt this bug
> > down?
>
> I'd suggest you try RHEL/Centos5 on the same hardware to eliminate
> hardware/driver problems. The same setup works on my system.

Hi Fajar, thanks for your response!
If you say the same setup works on your system, what kernels have you running
for dom0 and domU? I have pv_ops for both and if I remember correctly, these
problems started as I switched over to pv_ops for dom0.
But I could try the OpenSuse patches for 2.6.31 - if they work, the problem
should lay somewhere in pv_ops netback or so...

Regards,
Markus
Fajar A. Nugraha
2010-Jan-20 22:34 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Thu, Jan 21, 2010 at 5:11 AM, Markus Schuster
<ml@markus.schuster.name> wrote:
> On Wednesday 20 January 2010 02:54:24 Fajar A. Nugraha wrote:
>> I'd suggest you try RHEL/Centos5 on the same hardware to eliminate
>> hardware/driver problems. The same setup works on my system.
>
> Hi Fajar, thanks for your response!
> If you say the same setup works on your system, what kernels have you running
> for dom0 and domU? I have pv_ops for both and if I remember correctly, these
> problems started as I switched over to pv_ops for dom0.

I should've said "same networking setup" :D

> But I could try the OpenSuse patches for 2.6.31 - if they work, the problem
> should lay somewhere in pv_ops netback or so...

It probably is. I'm using stock RHEL 2.6.18 kernel-xen for both dom0 and
domU.

--
Fajar
Markus Schuster
2010-Jan-26 20:05 UTC
Re: [Xen-users] pv_ops kernel and network problems (checksum offloading?)
On Wednesday 20 January 2010 23:34:24 Fajar A. Nugraha wrote:
> > But I could try the OpenSuse patches for 2.6.31 - if they work, the
> > problem should lay somewhere in pv_ops netback or so...
>
> It probably is. I'm using stock RHEL 2.6.18 kernel-xen for both dom0 and
> domU.

OK, I've compiled a 2.6.31.12 kernel with the ported openSUSE patches and have
no problems on the network side - so the problem should lie in pv_ops. Maybe I
should update to the latest Git changeset and try again - my tree is currently
a few weeks old.
At least that's a problem I can reproduce in my testing environment, so I hope
others will be able to do so, too :)
If the problem persists with the latest Git changeset, I will repost my
problem to xen-devel - I think it fits better there.

Regards,
Markus