James Dykman
2006-Apr-13  22:29 UTC
[Xen-devel] [PATCH] Fix checksum errors when using network-bridge over VLANs
I set up a config similar to:
http://lists.xensource.com/archives/html/xen-users/2006-04/msg00164.html
and found that pings worked fine but TCP/UDP traffic would get checksum 
errors. A strategically placed dump_stack() shows
dev_queue_xmit() getting called twice for the same skb:
Apr 12 16:32:16 twofish kernel:  [<c0105081>] show_trace+0x21/0x30
Apr 12 16:32:16 twofish kernel:  [<c01051ee>] dump_stack+0x1e/0x20
Apr 12 16:32:16 twofish kernel:  [<c04545a5>] vlan_dev_hwaccel_hard_start_xmit+0x105/0x140
Apr 12 16:32:16 twofish kernel:  [<c03eee72>] dev_queue_xmit+0x192/0x350  <-----
Apr 12 16:32:16 twofish kernel:  [<c043abac>] br_dev_queue_push_xmit+0x9c/0x140
Apr 12 16:32:16 twofish kernel:  [<c0440ed6>] br_nf_post_routing+0xf6/0x1c0
Apr 12 16:32:16 twofish kernel:  [<c03ffeee>] nf_iterate+0x5e/0x90
Apr 12 16:32:16 twofish kernel:  [<c03fff8d>] nf_hook_slow+0x6d/0x110
Apr 12 16:32:16 twofish kernel:  [<c043acaf>] br_forward_finish+0x5f/0x70
Apr 12 16:32:16 twofish kernel:  [<c044068e>] br_nf_forward_finish+0x6e/0x140
Apr 12 16:32:16 twofish kernel:  [<c0440847>] br_nf_forward_ip+0xe7/0x1a0
Apr 12 16:32:16 twofish kernel:  [<c03ffeee>] nf_iterate+0x5e/0x90
Apr 12 16:32:16 twofish kernel:  [<c03fff8d>] nf_hook_slow+0x6d/0x110
Apr 12 16:32:16 twofish kernel:  [<c043ada6>] __br_forward+0x76/0x80
Apr 12 16:32:16 twofish kernel:  [<c043ae4c>] br_forward+0x3c/0x60
Apr 12 16:32:16 twofish kernel:  [<c043bbc9>] br_handle_frame_finish+0xc9/0x160
Apr 12 16:32:16 twofish kernel:  [<c043f9c0>] br_nf_pre_routing_finish+0xf0/0x3a0
Apr 12 16:32:16 twofish kernel:  [<c044041b>] br_nf_pre_routing+0x3fb/0x580
Apr 12 16:32:16 twofish kernel:  [<c03ffeee>] nf_iterate+0x5e/0x90
Apr 12 16:32:16 twofish kernel:  [<c03fff8d>] nf_hook_slow+0x6d/0x110
Apr 12 16:32:16 twofish kernel:  [<c043be33>] br_handle_frame+0x1d3/0x240
Apr 12 16:32:16 twofish kernel:  [<c03ef492>] netif_receive_skb+0x152/0x280
Apr 12 16:32:16 twofish kernel:  [<c03ef65f>] process_backlog+0x9f/0x140
Apr 12 16:32:16 twofish kernel:  [<c03ef7da>] net_rx_action+0xda/0x150
Apr 12 16:32:16 twofish kernel:  [<c011e522>] __do_softirq+0x62/0xd0
Apr 12 16:32:16 twofish kernel:  [<c011e5d8>] do_softirq+0x48/0x60
Apr 12 16:32:16 twofish kernel:  [<c011e664>] local_bh_enable+0x74/0x80
Apr 12 16:32:16 twofish kernel:  [<c03eef95>] dev_queue_xmit+0x2b5/0x350  <-----
Apr 12 16:32:16 twofish kernel:  [<c0408d4f>] ip_output+0x13f/0x2c0
Apr 12 16:32:16 twofish kernel:  [<c040b0ca>] ip_push_pending_frames+0x3fa/0x4c0
Apr 12 16:32:16 twofish kernel:  [<c042473d>] raw_sendmsg+0x48d/0x4f0
Apr 12 16:32:16 twofish kernel:  [<c042d1e6>] inet_sendmsg+0x46/0x50
Apr 12 16:32:16 twofish kernel:  [<c03e47db>] sock_sendmsg+0xbb/0xf0
Apr 12 16:32:16 twofish kernel:  [<c03e61d1>] sys_sendmsg+0x1b1/0x250
Apr 12 16:32:16 twofish kernel:  [<c03e64f7>] sys_socketcall+0x87/0x240
Apr 12 16:32:16 twofish kernel:  [<c0104be9>] syscall_call+0x7/0xb
Since we don't reset the proto_csum_blank flag in the skb, the checksum
calculation gets done twice, which is not twice as good as once.
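
To make the failure mode concrete, here is a minimal user-space sketch of why a second checksum pass does harm. It is a toy model, not the kernel code: struct toy_pkt, toy_csum(), toy_xmit() and toy_verify() are hypothetical names, and it assumes the checksum routine folds in whatever currently sits in the checksum field, so the result is only right when the field still holds its initial value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb: a few bytes of "segment" plus a flag
 * that plays the role of proto_csum_blank. Bytes 0-1 are the checksum field. */
struct toy_pkt {
    uint8_t data[8];
    int csum_blank;
};

/* 16-bit ones'-complement checksum over the buffer, big-endian words. */
static uint16_t toy_csum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Toy transmit path: fill in the checksum if the packet is still marked
 * blank. clear_flag selects whether the flag is reset afterwards. */
static void toy_xmit(struct toy_pkt *pkt, int clear_flag)
{
    uint16_t c;

    assert(pkt != NULL);
    if (!pkt->csum_blank)
        return;                 /* checksum already final: leave it alone */

    /* Assumption: the current contents of the checksum field are part of
     * the sum, so the field must still be 0 when this runs. */
    c = toy_csum(pkt->data, sizeof pkt->data);
    pkt->data[0] = (uint8_t)(c >> 8);
    pkt->data[1] = (uint8_t)(c & 0xff);

    if (clear_flag)
        pkt->csum_blank = 0;    /* toy equivalent of resetting the flag */
}

/* Receiver-side check: a valid packet sums to zero after complementing. */
static int toy_verify(const struct toy_pkt *pkt)
{
    return toy_csum(pkt->data, sizeof pkt->data) == 0;
}
```

For the sample bytes 00 00 12 34 56 78 9a bc with the flag set, the first toy_xmit() writes 0xfc96 and toy_verify() succeeds. A second toy_xmit() with the flag still set overwrites the field with 0x0000 and verification fails; with the flag cleared after the first pass, the second call is a no-op.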
With this patch, TCP/UDP checksum errors from dom0 are fixed, and domUs 
can use TCP/UDP without turning off TX checksum offload.
Normal non-VLAN bridged configs still work fine, tested with xm-test. 
Jim
Signed-off-by: Jim Dykman <dykman@us.ibm.com>
diff -r 4ed269ac7d84 linux-2.6-xen-sparse/net/core/dev.c
--- a/linux-2.6-xen-sparse/net/core/dev.c       Mon Apr 10 12:24:58 2006
+++ b/linux-2.6-xen-sparse/net/core/dev.c       Thu Apr 13 17:30:45 2006
@@ -1294,6 +1294,7 @@
                if ((skb->h.raw + skb->csum + 2) > skb->tail)
                        goto out_kfree_skb;
                skb->ip_summed = CHECKSUM_HW;
+               skb->proto_csum_blank = 0;
        }
 #endif
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jean-Francois Stenuit
2006-May-05  08:21 UTC
Re: [Xen-devel] [PATCH] Fix checksum errors when using network-bridge over VLANs
On Thu, 13 Apr 2006 18:29:51 -0400, James Dykman wrote:
> I set up a config similar to:
>
> http://lists.xensource.com/archives/html/xen-users/2006-04/msg00164.html
>
> and found that pings worked fine but TCP/UDP traffic would get checksum
> errors.

Just noticed that. Pretty hard to troubleshoot.

> With this patch, TCP/UDP checksum errors from dom0 are fixed, and domUs
> can use TCP/UDP without turning off TX checksum offload.
> Normal non-VLAN bridged configs still work fine, tested with xm-test.

Just a quick question: is there a work-around available without a full
recompile of Xen kernels?

-- 
|--- Jean-Francois "Jef" Stenuit
James Dykman
2006-May-05  14:28 UTC
Re: [Xen-devel] [PATCH] Fix checksum errors when using network-bridge over VLANs
Jean-Francois Stenuit <jfs@skynet.be> wrote on 05/05/2006 04:21:28 AM:

> On Thu, 13 Apr 2006 18:29:51 -0400, James Dykman wrote:
>
> > With this patch, TCP/UDP checksum errors from dom0 are fixed, and domUs
> > can use TCP/UDP without turning off TX checksum offload.
> > Normal non-VLAN bridged configs still work fine, tested with xm-test.
>
> Just a quick question : is there a work-around available without a full
> recompile of Xen kernels ?

You can use ethtool to turn off the TX checksum offload, so that the
TCP/UDP protocol code calculates the checksums itself. In the domUs, and
possibly even dom0 (I haven't tried it myself):

    ethtool -K <ethX> tx off

Jim