Sylvain Munaut
2012-Sep-03 15:22 UTC
Failure in Xen network PV driver when acceleration enabled
Hi,

I'm experiencing a failure in DomU networking, seeing messages such as "net eth0: rx->offset: 0, size: 4294967295" in the DomU.

The setup is:
 - Dom0 running a recent 3.6-rc3 kernel
 - DomU 1 running a Ceph cluster
 - DomU 2 running off a block device hosted by that Ceph cluster

So basically, when DomU 2 makes a block device request, the path is (I think):
 - DomU 2 frontend block driver
 - Dom0 backend block driver
 - Dom0 RBD Ceph block driver
 - Dom0 kernel TCP connection to DomU 1
 - Dom0 backend network driver for DomU 1
 - DomU 1 frontend network driver

And I'm seeing those "net eth0: rx->offset: 0, size: 4294967295" messages in the DomU 1 dmesg. DomU 2 doesn't finish booting at all. From what I can see in the Ceph logs, it seems that DomU 1 receives corrupted messages.

I've been digging a bit, and it seems that issuing

  ethtool -K vif1.0 tx off

in the dom0 prevents the issue (vif1.0 being the DomU 1 virtual network interface). Note that it needs to be done in the dom0 on the VIF, and not in the domU on the eth0 interface like I originally tried. It's also not enough to disable gso and/or tso; you need to turn off all tx accel.

I originally reported this on xen-users and ceph-devel, but now that the failure has been narrowed down to a Xen PV net bug, this list seems more appropriate. Thread archive available at http://lists.xen.org/archives/html/xen-users/2012-08/msg00321.html

Cheers,

Sylvain
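
A side note on the logged value: 4294967295 is 2^32 - 1, which is what a signed -1 looks like when printed as an unsigned 32-bit integer. That would be consistent with the frontend logging a negative (error) size rather than a real 4 GB packet, but that reading is an assumption and has not been checked against the driver source here. A minimal sketch of the arithmetic (the helper name as_signed32 is made up for illustration):

```python
import ctypes

# Value copied from the DomU 1 dmesg line quoted in the report.
LOGGED_SIZE = 4294967295


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    # ctypes integer types wrap silently instead of raising on overflow.
    return ctypes.c_int32(value).value


if __name__ == "__main__":
    print(hex(LOGGED_SIZE))          # all 32 bits set
    print(as_signed32(LOGGED_SIZE))  # the same bits read as a signed value
```

If that reading is right, the message points at an error reported for the RX response rather than at an oversized packet.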