Hi:

The following patches add TCP Segmentation Offload (TSO) support in the domU => dom0 direction. If everyone's happy with this approach then it's trivial to do the same thing for the opposite direction.

TSO support requires the Generic Segmentation Offload (GSO) infrastructure that was recently added to Linux (merged after the release of 2.6.17, so it will be part of 2.6.18). GSO is needed when a TSO packet is sent to a device without TSO support. I've included a backport of GSO for 2.6.16.13 here.

Testing should be easy as TSO is turned on by default. You can verify that it's working by attaching tcpdump to any of the interfaces involved in a domU => dom0 transfer. You should see packets whose payload is an integral multiple of the MSS.

Comparison with the baseline and jumbo MTU:

baseline:    1228.08Mb/s
mtu=16436:   3097.49Mb/s
TSO:         3208.41Mb/s
mtu=60040:   3869.36Mb/s
lo(16436):   5543.91Mb/s
lo(60040):   8544.08Mb/s

There is still a problem with TSO ack timing that causes it to generate smaller-than-optimal packets. This can be seen by increasing the RCV buffer size on the transfer, which actually causes TSO performance to dip by 30% to about 2000Mb/s. The reason is that we get many more small packets (2*MSS or 3*MSS) than really big ones (~64K). I'm investigating this issue. However, even with this nit, it's still a great improvement over the baseline and does not have the interoperability issue that comes with raising the MTU.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
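The GSO fallback the backport introduces boils down to one check on the transmit path: a packet carrying a non-zero gso_size is handed to the driver unsegmented only if the device advertises the matching offload feature bit; otherwise the stack segments it in software first. The snippet below is a minimal, compilable userspace sketch of that check, not kernel code: the constants mirror SKB_GSO_TCPV4, NETIF_F_GSO_SHIFT and NETIF_F_TSO from the patch, while the function name needs_software_gso() and the sample MSS of 1448 are made up for illustration (the real helper in the patch is netif_needs_gso(), and the software split is done by skb_gso_segment()/skb_segment()).

#include <stdio.h>

/* Constants as defined in the patch below. */
#define SKB_GSO_TCPV4     (1 << 0)
#define SKB_GSO_UDPV4     (1 << 1)
#define NETIF_F_GSO_SHIFT 16
#define NETIF_F_TSO       (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_UFO       (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)

/* Hypothetical stand-in for netif_needs_gso(): software segmentation is
 * required when the packet is a GSO super-packet (gso_size != 0) and the
 * device lacks the corresponding hardware offload feature. */
static int needs_software_gso(unsigned long dev_features,
                              unsigned short gso_size,
                              unsigned short gso_type)
{
    unsigned long feature = (unsigned long)gso_type << NETIF_F_GSO_SHIFT;

    return gso_size && (dev_features & feature) != feature;
}

int main(void)
{
    /* A TSO super-packet headed for a device without hardware TSO must be
     * split into MSS-sized segments in software before transmission. */
    printf("device without TSO: software GSO %s\n",
           needs_software_gso(0, 1448, SKB_GSO_TCPV4) ? "needed" : "not needed");

    /* The same packet on a device advertising NETIF_F_TSO goes out whole. */
    printf("device with TSO:    software GSO %s\n",
           needs_software_gso(NETIF_F_TSO, 1448, SKB_GSO_TCPV4) ? "needed" : "not needed");

    return 0;
}

In the patch itself this decision lives in dev_queue_xmit()/dev_hard_start_xmit(): packets that pass the check go straight to hard_start_xmit(), while the rest are segmented by skb_gso_segment() and transmitted one segment at a time.
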
Hi: [NET]: Added GSO support Imported GSO patch. This is a 2.6.16 backport for the GSO patch that was merged in Linux. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 9ec0b4f10b4f -r b255efd0df72 linux-2.6-xen-sparse/include/linux/skbuff.h --- a/linux-2.6-xen-sparse/include/linux/skbuff.h Sat Jun 24 23:44:18 2006 +0100 +++ b/linux-2.6-xen-sparse/include/linux/skbuff.h Tue Jun 27 21:27:13 2006 +1000 @@ -134,9 +134,10 @@ struct skb_shared_info { struct skb_shared_info { atomic_t dataref; unsigned short nr_frags; - unsigned short tso_size; - unsigned short tso_segs; - unsigned short ufo_size; + unsigned short gso_size; + /* Warning: this field is not always filled in (UFO)! */ + unsigned short gso_segs; + unsigned short gso_type; unsigned int ip6_frag_id; struct sk_buff *frag_list; skb_frag_t frags[MAX_SKB_FRAGS]; @@ -166,6 +167,11 @@ enum { SKB_FCLONE_UNAVAILABLE, SKB_FCLONE_ORIG, SKB_FCLONE_CLONE, +}; + +enum { + SKB_GSO_TCPV4 = 1 << 0, + SKB_GSO_UDPV4 = 1 << 1, }; /** @@ -1157,18 +1163,34 @@ static inline int skb_can_coalesce(struc return 0; } +static inline int __skb_linearize(struct sk_buff *skb) +{ + return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM; +} + /** * skb_linearize - convert paged skb to linear one * @skb: buffer to linarize - * @gfp: allocation mode * * If there is no free memory -ENOMEM is returned, otherwise zero * is returned and the old skb data released. */ -extern int __skb_linearize(struct sk_buff *skb, gfp_t gfp); -static inline int skb_linearize(struct sk_buff *skb, gfp_t gfp) -{ - return __skb_linearize(skb, gfp); +static inline int skb_linearize(struct sk_buff *skb) +{ + return skb_is_nonlinear(skb) ? __skb_linearize(skb) : 0; +} + +/** + * skb_linearize_cow - make sure skb is linear and writable + * @skb: buffer to process + * + * If there is no free memory -ENOMEM is returned, otherwise zero + * is returned and the old skb data released. + */ +static inline int skb_linearize_cow(struct sk_buff *skb) +{ + return skb_is_nonlinear(skb) || skb_cloned(skb) ? + __skb_linearize(skb) : 0; } /** @@ -1263,6 +1285,7 @@ extern void skb_split(struct sk_b struct sk_buff *skb1, const u32 len); extern void skb_release_data(struct sk_buff *skb); +extern struct sk_buff *skb_segment(struct sk_buff *skb, int sg); static inline void *skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer) diff -r 9ec0b4f10b4f -r b255efd0df72 linux-2.6-xen-sparse/net/core/dev.c --- a/linux-2.6-xen-sparse/net/core/dev.c Sat Jun 24 23:44:18 2006 +0100 +++ b/linux-2.6-xen-sparse/net/core/dev.c Tue Jun 27 21:27:13 2006 +1000 @@ -115,6 +115,7 @@ #include <net/iw_handler.h> #endif /* CONFIG_NET_RADIO */ #include <asm/current.h> +#include <linux/err.h> #ifdef CONFIG_XEN #include <net/ip.h> @@ -1038,7 +1039,7 @@ static inline void net_timestamp(struct * taps currently in use. */ -void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) +static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) { struct packet_type *ptype; @@ -1112,6 +1113,40 @@ out: return ret; } +/** + * skb_gso_segment - Perform segmentation on skb. + * @skb: buffer to segment + * @sg: whether scatter-gather is supported on the target. + * + * This function segments the given skb and returns a list of segments. 
+ */ +struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg) +{ + struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT); + struct packet_type *ptype; + int type = skb->protocol; + + BUG_ON(skb_shinfo(skb)->frag_list); + BUG_ON(skb->ip_summed != CHECKSUM_HW); + + skb->mac.raw = skb->data; + skb->mac_len = skb->nh.raw - skb->data; + __skb_pull(skb, skb->mac_len); + + rcu_read_lock(); + list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) { + if (ptype->type == type && !ptype->dev && ptype->gso_segment) { + segs = ptype->gso_segment(skb, sg); + break; + } + } + rcu_read_unlock(); + + return segs; +} + +EXPORT_SYMBOL(skb_gso_segment); + /* Take action when hardware reception checksum errors are detected. */ #ifdef CONFIG_BUG void netdev_rx_csum_fault(struct net_device *dev) @@ -1148,75 +1183,98 @@ static inline int illegal_highdma(struct #define illegal_highdma(dev, skb) (0) #endif -/* Keep head the same: replace data */ -int __skb_linearize(struct sk_buff *skb, gfp_t gfp_mask) -{ - unsigned int size; - u8 *data; - long offset; - struct skb_shared_info *ninfo; - int headerlen = skb->data - skb->head; - int expand = (skb->tail + skb->data_len) - skb->end; - - if (skb_shared(skb)) - BUG(); - - if (expand <= 0) - expand = 0; - - size = skb->end - skb->head + expand; - size = SKB_DATA_ALIGN(size); - data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask); - if (!data) - return -ENOMEM; - - /* Copy entire thing */ - if (skb_copy_bits(skb, -headerlen, data, headerlen + skb->len)) - BUG(); - - /* Set up shinfo */ - ninfo = (struct skb_shared_info*)(data + size); - atomic_set(&ninfo->dataref, 1); - ninfo->tso_size = skb_shinfo(skb)->tso_size; - ninfo->tso_segs = skb_shinfo(skb)->tso_segs; - ninfo->nr_frags = 0; - ninfo->frag_list = NULL; - - /* Offset between the two in bytes */ - offset = data - skb->head; - - /* Free old data. */ - skb_release_data(skb); - - skb->head = data; - skb->end = data + size; - - /* Set up new pointers */ - skb->h.raw += offset; - skb->nh.raw += offset; - skb->mac.raw += offset; - skb->tail += offset; - skb->data += offset; - - /* We are no longer a clone, even if we were. */ - skb->cloned = 0; - - skb->tail += skb->data_len; - skb->data_len = 0; +struct dev_gso_cb { + void (*destructor)(struct sk_buff *skb); +}; + +#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb) + +static void dev_gso_skb_destructor(struct sk_buff *skb) +{ + struct dev_gso_cb *cb; + + do { + struct sk_buff *nskb = skb->next; + + skb->next = nskb->next; + nskb->next = NULL; + kfree_skb(nskb); + } while (skb->next); + + cb = DEV_GSO_CB(skb); + if (cb->destructor) + cb->destructor(skb); +} + +/** + * dev_gso_segment - Perform emulated hardware segmentation on skb. + * @skb: buffer to segment + * + * This function segments the given skb and stores the list of segments + * in skb->next. 
+ */ +static int dev_gso_segment(struct sk_buff *skb) +{ + struct net_device *dev = skb->dev; + struct sk_buff *segs; + + segs = skb_gso_segment(skb, dev->features & NETIF_F_SG && + !illegal_highdma(dev, skb)); + if (unlikely(IS_ERR(segs))) + return PTR_ERR(segs); + + skb->next = segs; + DEV_GSO_CB(skb)->destructor = skb->destructor; + skb->destructor = dev_gso_skb_destructor; + + return 0; +} + +int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) +{ + if (likely(!skb->next)) { + if (netdev_nit) + dev_queue_xmit_nit(skb, dev); + + if (!netif_needs_gso(dev, skb)) + return dev->hard_start_xmit(skb, dev); + + if (unlikely(dev_gso_segment(skb))) + goto out_kfree_skb; + } + + do { + struct sk_buff *nskb = skb->next; + int rc; + + skb->next = nskb->next; + nskb->next = NULL; + rc = dev->hard_start_xmit(nskb, dev); + if (unlikely(rc)) { + nskb->next = skb->next; + skb->next = nskb; + return rc; + } + if (unlikely(netif_queue_stopped(dev) && skb->next)) + return NETDEV_TX_BUSY; + } while (skb->next); + + skb->destructor = DEV_GSO_CB(skb)->destructor; + +out_kfree_skb: + kfree_skb(skb); return 0; } #define HARD_TX_LOCK(dev, cpu) { \ if ((dev->features & NETIF_F_LLTX) == 0) { \ - spin_lock(&dev->xmit_lock); \ - dev->xmit_lock_owner = cpu; \ + netif_tx_lock(dev); \ } \ } #define HARD_TX_UNLOCK(dev) { \ if ((dev->features & NETIF_F_LLTX) == 0) { \ - dev->xmit_lock_owner = -1; \ - spin_unlock(&dev->xmit_lock); \ + netif_tx_unlock(dev); \ } \ } @@ -1289,9 +1347,19 @@ int dev_queue_xmit(struct sk_buff *skb) struct Qdisc *q; int rc = -ENOMEM; + /* If a checksum-deferred packet is forwarded to a device that needs a + * checksum, correct the pointers and force checksumming. + */ + if (skb_checksum_setup(skb)) + goto out_kfree_skb; + + /* GSO will handle the following emulations directly. */ + if (netif_needs_gso(dev, skb)) + goto gso; + if (skb_shinfo(skb)->frag_list && !(dev->features & NETIF_F_FRAGLIST) && - __skb_linearize(skb, GFP_ATOMIC)) + __skb_linearize(skb)) goto out_kfree_skb; /* Fragmented skb is linearized if device does not support SG, @@ -1300,31 +1368,27 @@ int dev_queue_xmit(struct sk_buff *skb) */ if (skb_shinfo(skb)->nr_frags && (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) && + __skb_linearize(skb)) __skb_linearize(skb, GFP_ATOMIC)) goto out_kfree_skb; - /* If a checksum-deferred packet is forwarded to a device that needs a - * checksum, correct the pointers and force checksumming. - */ - if(skb_checksum_setup(skb)) - goto out_kfree_skb; - /* If packet is not checksummed and device does not support * checksumming for this protocol, complete checksumming here. */ if (skb->ip_summed == CHECKSUM_HW && - (!(dev->features & (NETIF_F_HW_CSUM | NETIF_F_NO_CSUM)) && + (!(dev->features & NETIF_F_GEN_CSUM) && (!(dev->features & NETIF_F_IP_CSUM) || skb->protocol != htons(ETH_P_IP)))) if (skb_checksum_help(skb, 0)) goto out_kfree_skb; +gso: spin_lock_prefetch(&dev->queue_lock); /* Disable soft irqs for various locks below. Also * stops preemption for RCU. */ - local_bh_disable(); + rcu_read_lock_bh(); /* Updates of qdisc are serialized by queue_lock. * The struct Qdisc which is pointed to by qdisc is now a @@ -1358,8 +1422,8 @@ int dev_queue_xmit(struct sk_buff *skb) /* The device has no queue. Common case for software devices: loopback, all the sorts of tunnels... - Really, it is unlikely that xmit_lock protection is necessary here. - (f.e. loopback and IP tunnels are clean ignoring statistics + Really, it is unlikely that netif_tx_lock protection is necessary + here. 
(f.e. loopback and IP tunnels are clean ignoring statistics counters.) However, it is possible, that they rely on protection made by us here. @@ -1375,11 +1439,8 @@ int dev_queue_xmit(struct sk_buff *skb) HARD_TX_LOCK(dev, cpu); if (!netif_queue_stopped(dev)) { - if (netdev_nit) - dev_queue_xmit_nit(skb, dev); - rc = 0; - if (!dev->hard_start_xmit(skb, dev)) { + if (!dev_hard_start_xmit(skb, dev)) { HARD_TX_UNLOCK(dev); goto out; } @@ -1398,13 +1459,13 @@ int dev_queue_xmit(struct sk_buff *skb) } rc = -ENETDOWN; - local_bh_enable(); + rcu_read_unlock_bh(); out_kfree_skb: kfree_skb(skb); return rc; out: - local_bh_enable(); + rcu_read_unlock_bh(); return rc; } @@ -2732,7 +2793,7 @@ int register_netdevice(struct net_device BUG_ON(dev->reg_state != NETREG_UNINITIALIZED); spin_lock_init(&dev->queue_lock); - spin_lock_init(&dev->xmit_lock); + spin_lock_init(&dev->_xmit_lock); dev->xmit_lock_owner = -1; #ifdef CONFIG_NET_CLS_ACT spin_lock_init(&dev->ingress_lock); @@ -2776,9 +2837,7 @@ int register_netdevice(struct net_device /* Fix illegal SG+CSUM combinations. */ if ((dev->features & NETIF_F_SG) && - !(dev->features & (NETIF_F_IP_CSUM | - NETIF_F_NO_CSUM | - NETIF_F_HW_CSUM))) { + !(dev->features & NETIF_F_ALL_CSUM)) { printk("%s: Dropping NETIF_F_SG since no checksum feature.\n", dev->name); dev->features &= ~NETIF_F_SG; @@ -3330,7 +3389,6 @@ EXPORT_SYMBOL(__dev_get_by_index); EXPORT_SYMBOL(__dev_get_by_index); EXPORT_SYMBOL(__dev_get_by_name); EXPORT_SYMBOL(__dev_remove_pack); -EXPORT_SYMBOL(__skb_linearize); EXPORT_SYMBOL(dev_valid_name); EXPORT_SYMBOL(dev_add_pack); EXPORT_SYMBOL(dev_alloc_name); diff -r 9ec0b4f10b4f -r b255efd0df72 linux-2.6-xen-sparse/net/core/skbuff.c --- a/linux-2.6-xen-sparse/net/core/skbuff.c Sat Jun 24 23:44:18 2006 +0100 +++ b/linux-2.6-xen-sparse/net/core/skbuff.c Tue Jun 27 21:27:13 2006 +1000 @@ -165,9 +165,9 @@ struct sk_buff *__alloc_skb(unsigned int shinfo = skb_shinfo(skb); atomic_set(&shinfo->dataref, 1); shinfo->nr_frags = 0; - shinfo->tso_size = 0; - shinfo->tso_segs = 0; - shinfo->ufo_size = 0; + shinfo->gso_size = 0; + shinfo->gso_segs = 0; + shinfo->gso_type = 0; shinfo->ip6_frag_id = 0; shinfo->frag_list = NULL; @@ -237,9 +237,9 @@ struct sk_buff *alloc_skb_from_cache(kme shinfo = skb_shinfo(skb); atomic_set(&shinfo->dataref, 1); shinfo->nr_frags = 0; - shinfo->tso_size = 0; - shinfo->tso_segs = 0; - shinfo->ufo_size = 0; + skb_shinfo(skb)->gso_size = 0; + skb_shinfo(skb)->gso_segs = 0; + skb_shinfo(skb)->gso_type = 0; shinfo->ip6_frag_id = 0; shinfo->frag_list = NULL; @@ -524,8 +524,9 @@ static void copy_skb_header(struct sk_bu new->tc_index = old->tc_index; #endif atomic_set(&new->users, 1); - skb_shinfo(new)->tso_size = skb_shinfo(old)->tso_size; - skb_shinfo(new)->tso_segs = skb_shinfo(old)->tso_segs; + skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size; + skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs; + skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type; } /** @@ -1799,6 +1800,132 @@ int skb_append_datato_frags(struct sock return 0; } + +/** + * skb_segment - Perform protocol segmentation on skb. + * @skb: buffer to segment + * @sg: whether scatter-gather can be used for generated segments + * + * This function performs segmentation on the given skb. It returns + * the segment at the given position. It returns NULL if there are + * no more segments to generate, or when an error is encountered. 
+ */ +struct sk_buff *skb_segment(struct sk_buff *skb, int sg) +{ + struct sk_buff *segs = NULL; + struct sk_buff *tail = NULL; + unsigned int mss = skb_shinfo(skb)->gso_size; + unsigned int doffset = skb->data - skb->mac.raw; + unsigned int offset = doffset; + unsigned int headroom; + unsigned int len; + int nfrags = skb_shinfo(skb)->nr_frags; + int err = -ENOMEM; + int i = 0; + int pos; + + __skb_push(skb, doffset); + headroom = skb_headroom(skb); + pos = skb_headlen(skb); + + do { + struct sk_buff *nskb; + skb_frag_t *frag; + int hsize, nsize; + int k; + int size; + + len = skb->len - offset; + if (len > mss) + len = mss; + + hsize = skb_headlen(skb) - offset; + if (hsize < 0) + hsize = 0; + nsize = hsize + doffset; + if (nsize > len + doffset || !sg) + nsize = len + doffset; + + nskb = alloc_skb(nsize + headroom, GFP_ATOMIC); + if (unlikely(!nskb)) + goto err; + + if (segs) + tail->next = nskb; + else + segs = nskb; + tail = nskb; + + nskb->dev = skb->dev; + nskb->priority = skb->priority; + nskb->protocol = skb->protocol; + nskb->dst = dst_clone(skb->dst); + memcpy(nskb->cb, skb->cb, sizeof(skb->cb)); + nskb->pkt_type = skb->pkt_type; + nskb->mac_len = skb->mac_len; + + skb_reserve(nskb, headroom); + nskb->mac.raw = nskb->data; + nskb->nh.raw = nskb->data + skb->mac_len; + nskb->h.raw = nskb->nh.raw + (skb->h.raw - skb->nh.raw); + memcpy(skb_put(nskb, doffset), skb->data, doffset); + + if (!sg) { + nskb->csum = skb_copy_and_csum_bits(skb, offset, + skb_put(nskb, len), + len, 0); + continue; + } + + frag = skb_shinfo(nskb)->frags; + k = 0; + + nskb->ip_summed = CHECKSUM_HW; + nskb->csum = skb->csum; + memcpy(skb_put(nskb, hsize), skb->data + offset, hsize); + + while (pos < offset + len) { + BUG_ON(i >= nfrags); + + *frag = skb_shinfo(skb)->frags[i]; + get_page(frag->page); + size = frag->size; + + if (pos < offset) { + frag->page_offset += offset - pos; + frag->size -= offset - pos; + } + + k++; + + if (pos + size <= offset + len) { + i++; + pos += size; + } else { + frag->size -= pos + size - (offset + len); + break; + } + + frag++; + } + + skb_shinfo(nskb)->nr_frags = k; + nskb->data_len = len - hsize; + nskb->len += nskb->data_len; + nskb->truesize += nskb->data_len; + } while ((offset += len) < skb->len); + + return segs; + +err: + while ((skb = segs)) { + segs = skb->next; + kfree(skb); + } + return ERR_PTR(err); +} + +EXPORT_SYMBOL_GPL(skb_segment); void __init skb_init(void) { diff -r 9ec0b4f10b4f -r b255efd0df72 patches/linux-2.6.16.13/net-gso.patch --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/patches/linux-2.6.16.13/net-gso.patch Tue Jun 27 21:27:13 2006 +1000 @@ -0,0 +1,2285 @@ +diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt +index 3c0a5ba..847cedb 100644 +--- a/Documentation/networking/netdevices.txt ++++ b/Documentation/networking/netdevices.txt +@@ -42,9 +42,9 @@ dev->get_stats: + Context: nominally process, but don''t sleep inside an rwlock + + dev->hard_start_xmit: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + When the driver sets NETIF_F_LLTX in dev->features this will be +- called without holding xmit_lock. In this case the driver ++ called without holding netif_tx_lock. In this case the driver + has to lock by itself when needed. It is recommended to use a try lock + for this and return -1 when the spin lock fails. + The locking there should also properly protect against +@@ -62,12 +62,12 @@ dev->hard_start_xmit: + Only valid when NETIF_F_LLTX is set. 
+ + dev->tx_timeout: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + Context: BHs disabled + Notes: netif_queue_stopped() is guaranteed true + + dev->set_multicast_list: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + Context: BHs disabled + + dev->poll: +diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c +index 4be9769..2e7cac7 100644 +--- a/drivers/block/aoe/aoenet.c ++++ b/drivers/block/aoe/aoenet.c +@@ -95,9 +95,8 @@ mac_addr(char addr[6]) + static struct sk_buff * + skb_check(struct sk_buff *skb) + { +- if (skb_is_nonlinear(skb)) + if ((skb = skb_share_check(skb, GFP_ATOMIC))) +- if (skb_linearize(skb, GFP_ATOMIC) < 0) { ++ if (skb_linearize(skb)) { + dev_kfree_skb(skb); + return NULL; + } +diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +index a2408d7..c90e620 100644 +--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c ++++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +@@ -821,7 +821,8 @@ void ipoib_mcast_restart_task(void *dev_ + + ipoib_mcast_stop_thread(dev, 0); + +- spin_lock_irqsave(&dev->xmit_lock, flags); ++ local_irq_save(flags); ++ netif_tx_lock(dev); + spin_lock(&priv->lock); + + /* +@@ -896,7 +897,8 @@ void ipoib_mcast_restart_task(void *dev_ + } + + spin_unlock(&priv->lock); +- spin_unlock_irqrestore(&dev->xmit_lock, flags); ++ netif_tx_unlock(dev); ++ local_irq_restore(flags); + + /* We have to cancel outside of the spinlock */ + list_for_each_entry_safe(mcast, tmcast, &remove_list, list) { +diff --git a/drivers/media/dvb/dvb-core/dvb_net.c b/drivers/media/dvb/dvb-core/dvb_net.c +index 6711eb6..8d2351f 100644 +--- a/drivers/media/dvb/dvb-core/dvb_net.c ++++ b/drivers/media/dvb/dvb-core/dvb_net.c +@@ -1052,7 +1052,7 @@ static void wq_set_multicast_list (void + + dvb_net_feed_stop(dev); + priv->rx_mode = RX_MODE_UNI; +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + if (dev->flags & IFF_PROMISC) { + dprintk("%s: promiscuous mode\n", dev->name); +@@ -1077,7 +1077,7 @@ static void wq_set_multicast_list (void + } + } + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + dvb_net_feed_start(dev); + } + +diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c +index dd41049..6615583 100644 +--- a/drivers/net/8139cp.c ++++ b/drivers/net/8139cp.c +@@ -794,7 +794,7 @@ #endif + entry = cp->tx_head; + eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0; + if (dev->features & NETIF_F_TSO) +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + + if (skb_shinfo(skb)->nr_frags == 0) { + struct cp_desc *txd = &cp->tx_ring[entry]; +diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c +index a24200d..b5e39a1 100644 +--- a/drivers/net/bnx2.c ++++ b/drivers/net/bnx2.c +@@ -1593,7 +1593,7 @@ bnx2_tx_int(struct bnx2 *bp) + skb = tx_buf->skb; + #ifdef BCM_TSO + /* partial BD completions possible with TSO packets */ +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + u16 last_idx, last_ring_idx; + + last_idx = sw_cons + +@@ -1948,7 +1948,7 @@ bnx2_poll(struct net_device *dev, int *b + return 1; + } + +-/* Called with rtnl_lock from vlan functions and also dev->xmit_lock ++/* Called with rtnl_lock from vlan functions and also netif_tx_lock + * from set_multicast. + */ + static void +@@ -4403,7 +4403,7 @@ bnx2_vlan_rx_kill_vid(struct net_device + } + #endif + +-/* Called with dev->xmit_lock. ++/* Called with netif_tx_lock. 
+ * hard_start_xmit is pseudo-lockless - a lock is only required when + * the tx queue is full. This way, we get the benefit of lockless + * operations most of the time without the complexities to handle +@@ -4441,7 +4441,7 @@ bnx2_start_xmit(struct sk_buff *skb, str + (TX_BD_FLAGS_VLAN_TAG | (vlan_tx_tag_get(skb) << 16)); + } + #ifdef BCM_TSO +- if ((mss = skb_shinfo(skb)->tso_size) && ++ if ((mss = skb_shinfo(skb)->gso_size) && + (skb->len > (bp->dev->mtu + ETH_HLEN))) { + u32 tcp_opt_len, ip_tcp_len; + +diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c +index bcf9f17..e970921 100644 +--- a/drivers/net/bonding/bond_main.c ++++ b/drivers/net/bonding/bond_main.c +@@ -1145,8 +1145,7 @@ int bond_sethwaddr(struct net_device *bo + } + + #define BOND_INTERSECT_FEATURES \ +- (NETIF_F_SG|NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM|\ +- NETIF_F_TSO|NETIF_F_UFO) ++ (NETIF_F_SG | NETIF_F_ALL_CSUM | NETIF_F_TSO | NETIF_F_UFO) + + /* + * Compute the common dev->feature set available to all slaves. Some +@@ -1164,9 +1163,7 @@ static int bond_compute_features(struct + features &= (slave->dev->features & BOND_INTERSECT_FEATURES); + + if ((features & NETIF_F_SG) && +- !(features & (NETIF_F_IP_CSUM | +- NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM))) ++ !(features & NETIF_F_ALL_CSUM)) + features &= ~NETIF_F_SG; + + /* +@@ -4147,7 +4144,7 @@ static int bond_init(struct net_device * + */ + bond_dev->features |= NETIF_F_VLAN_CHALLENGED; + +- /* don''t acquire bond device''s xmit_lock when ++ /* don''t acquire bond device''s netif_tx_lock when + * transmitting */ + bond_dev->features |= NETIF_F_LLTX; + +diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c +index 30ff8ea..7b7d360 100644 +--- a/drivers/net/chelsio/sge.c ++++ b/drivers/net/chelsio/sge.c +@@ -1419,7 +1419,7 @@ int t1_start_xmit(struct sk_buff *skb, s + struct cpl_tx_pkt *cpl; + + #ifdef NETIF_F_TSO +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + int eth_type; + struct cpl_tx_pkt_lso *hdr; + +@@ -1434,7 +1434,7 @@ #ifdef NETIF_F_TSO + hdr->ip_hdr_words = skb->nh.iph->ihl; + hdr->tcp_hdr_words = skb->h.th->doff; + hdr->eth_type_mss = htons(MK_ETH_TYPE_MSS(eth_type, +- skb_shinfo(skb)->tso_size)); ++ skb_shinfo(skb)->gso_size)); + hdr->len = htonl(skb->len - sizeof(*hdr)); + cpl = (struct cpl_tx_pkt *)hdr; + sge->stats.tx_lso_pkts++; +diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c +index fa29402..681d284 100644 +--- a/drivers/net/e1000/e1000_main.c ++++ b/drivers/net/e1000/e1000_main.c +@@ -2526,7 +2526,7 @@ #ifdef NETIF_F_TSO + uint8_t ipcss, ipcso, tucss, tucso, hdr_len; + int err; + +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + if (skb_header_cloned(skb)) { + err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); + if (err) +@@ -2534,7 +2534,7 @@ #ifdef NETIF_F_TSO + } + + hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2)); +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + if (skb->protocol == ntohs(ETH_P_IP)) { + skb->nh.iph->tot_len = 0; + skb->nh.iph->check = 0; +@@ -2651,7 +2651,7 @@ #ifdef NETIF_F_TSO + * tso gets written back prematurely before the data is fully + * DMAd to the controller */ + if (!skb->data_len && tx_ring->last_tx_tso && +- !skb_shinfo(skb)->tso_size) { ++ !skb_shinfo(skb)->gso_size) { + tx_ring->last_tx_tso = 0; + size -= 4; + } +@@ -2893,7 +2893,7 @@ #endif + } + + #ifdef NETIF_F_TSO +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + /* The controller 
does a simple calculation to + * make sure there is enough room in the FIFO before + * initiating the DMA for each buffer. The calc is: +@@ -2935,7 +2935,7 @@ #endif + #ifdef NETIF_F_TSO + /* Controller Erratum workaround */ + if (!skb->data_len && tx_ring->last_tx_tso && +- !skb_shinfo(skb)->tso_size) ++ !skb_shinfo(skb)->gso_size) + count++; + #endif + +diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c +index 3682ec6..c35f16e 100644 +--- a/drivers/net/forcedeth.c ++++ b/drivers/net/forcedeth.c +@@ -482,9 +482,9 @@ #define LPA_1000HALF 0x0400 + * critical parts: + * - rx is (pseudo-) lockless: it relies on the single-threading provided + * by the arch code for interrupts. +- * - tx setup is lockless: it relies on dev->xmit_lock. Actual submission ++ * - tx setup is lockless: it relies on netif_tx_lock. Actual submission + * needs dev->priv->lock :-( +- * - set_multicast_list: preparation lockless, relies on dev->xmit_lock. ++ * - set_multicast_list: preparation lockless, relies on netif_tx_lock. + */ + + /* in dev: base, irq */ +@@ -1016,7 +1016,7 @@ static void drain_ring(struct net_device + + /* + * nv_start_xmit: dev->hard_start_xmit function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. + */ + static int nv_start_xmit(struct sk_buff *skb, struct net_device *dev) + { +@@ -1105,8 +1105,8 @@ static int nv_start_xmit(struct sk_buff + np->tx_skbuff[nr] = skb; + + #ifdef NETIF_F_TSO +- if (skb_shinfo(skb)->tso_size) +- tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->tso_size << NV_TX2_TSO_SHIFT); ++ if (skb_shinfo(skb)->gso_size) ++ tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->gso_size << NV_TX2_TSO_SHIFT); + else + #endif + tx_flags_extra = (skb->ip_summed == CHECKSUM_HW ? (NV_TX2_CHECKSUM_L3|NV_TX2_CHECKSUM_L4) : 0); +@@ -1203,7 +1203,7 @@ static void nv_tx_done(struct net_device + + /* + * nv_tx_timeout: dev->tx_timeout function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. + */ + static void nv_tx_timeout(struct net_device *dev) + { +@@ -1524,7 +1524,7 @@ static int nv_change_mtu(struct net_devi + * Changing the MTU is a rare event, it shouldn''t matter. + */ + disable_irq(dev->irq); +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock(&np->lock); + /* stop engines */ + nv_stop_rx(dev); +@@ -1559,7 +1559,7 @@ static int nv_change_mtu(struct net_devi + nv_start_rx(dev); + nv_start_tx(dev); + spin_unlock(&np->lock); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + enable_irq(dev->irq); + } + return 0; +@@ -1594,7 +1594,7 @@ static int nv_set_mac_address(struct net + memcpy(dev->dev_addr, macaddr->sa_data, ETH_ALEN); + + if (netif_running(dev)) { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock_irq(&np->lock); + + /* stop rx engine */ +@@ -1606,7 +1606,7 @@ static int nv_set_mac_address(struct net + /* restart rx engine */ + nv_start_rx(dev); + spin_unlock_irq(&np->lock); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } else { + nv_copy_mac_to_hw(dev); + } +@@ -1615,7 +1615,7 @@ static int nv_set_mac_address(struct net + + /* + * nv_set_multicast: dev->set_multicast function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. 
+ */ + static void nv_set_multicast(struct net_device *dev) + { +diff --git a/drivers/net/hamradio/6pack.c b/drivers/net/hamradio/6pack.c +index 102c1f0..d12605f 100644 +--- a/drivers/net/hamradio/6pack.c ++++ b/drivers/net/hamradio/6pack.c +@@ -308,9 +308,9 @@ static int sp_set_mac_address(struct net + { + struct sockaddr_ax25 *sa = addr; + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &sa->sax25_call, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + return 0; + } +@@ -767,9 +767,9 @@ static int sixpack_ioctl(struct tty_stru + break; + } + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &addr, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + err = 0; + break; +diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c +index dc5e9d5..5c66f5a 100644 +--- a/drivers/net/hamradio/mkiss.c ++++ b/drivers/net/hamradio/mkiss.c +@@ -357,9 +357,9 @@ static int ax_set_mac_address(struct net + { + struct sockaddr_ax25 *sa = addr; + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &sa->sax25_call, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + return 0; + } +@@ -886,9 +886,9 @@ static int mkiss_ioctl(struct tty_struct + break; + } + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, addr, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + err = 0; + break; +diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c +index 31fb2d7..2e222ef 100644 +--- a/drivers/net/ifb.c ++++ b/drivers/net/ifb.c +@@ -76,13 +76,13 @@ static void ri_tasklet(unsigned long dev + dp->st_task_enter++; + if ((skb = skb_peek(&dp->tq)) == NULL) { + dp->st_txq_refl_try++; +- if (spin_trylock(&_dev->xmit_lock)) { ++ if (netif_tx_trylock(_dev)) { + dp->st_rxq_enter++; + while ((skb = skb_dequeue(&dp->rq)) != NULL) { + skb_queue_tail(&dp->tq, skb); + dp->st_rx2tx_tran++; + } +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + } else { + /* reschedule */ + dp->st_rxq_notenter++; +@@ -110,7 +110,7 @@ static void ri_tasklet(unsigned long dev + } + } + +- if (spin_trylock(&_dev->xmit_lock)) { ++ if (netif_tx_trylock(_dev)) { + dp->st_rxq_check++; + if ((skb = skb_peek(&dp->rq)) == NULL) { + dp->tasklet_pending = 0; +@@ -118,10 +118,10 @@ static void ri_tasklet(unsigned long dev + netif_wake_queue(_dev); + } else { + dp->st_rxq_rsch++; +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + goto resched; + } +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + } else { + resched: + dp->tasklet_pending = 1; +diff --git a/drivers/net/irda/vlsi_ir.c b/drivers/net/irda/vlsi_ir.c +index a9f49f0..339d4a7 100644 +--- a/drivers/net/irda/vlsi_ir.c ++++ b/drivers/net/irda/vlsi_ir.c +@@ -959,7 +959,7 @@ static int vlsi_hard_start_xmit(struct s + || (now.tv_sec==ready.tv_sec && now.tv_usec>=ready.tv_usec)) + break; + udelay(100); +- /* must not sleep here - we are called under xmit_lock! */ ++ /* must not sleep here - called under netif_tx_lock! 
*/ + } + } + +diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c +index f9f77e4..bdab369 100644 +--- a/drivers/net/ixgb/ixgb_main.c ++++ b/drivers/net/ixgb/ixgb_main.c +@@ -1163,7 +1163,7 @@ #ifdef NETIF_F_TSO + uint16_t ipcse, tucse, mss; + int err; + +- if(likely(skb_shinfo(skb)->tso_size)) { ++ if(likely(skb_shinfo(skb)->gso_size)) { + if (skb_header_cloned(skb)) { + err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); + if (err) +@@ -1171,7 +1171,7 @@ #ifdef NETIF_F_TSO + } + + hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2)); +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + skb->nh.iph->tot_len = 0; + skb->nh.iph->check = 0; + skb->h.th->check = ~csum_tcpudp_magic(skb->nh.iph->saddr, +diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c +index 690a1aa..9bcaa80 100644 +--- a/drivers/net/loopback.c ++++ b/drivers/net/loopback.c +@@ -74,7 +74,7 @@ static void emulate_large_send_offload(s + struct iphdr *iph = skb->nh.iph; + struct tcphdr *th = (struct tcphdr*)(skb->nh.raw + (iph->ihl * 4)); + unsigned int doffset = (iph->ihl + th->doff) * 4; +- unsigned int mtu = skb_shinfo(skb)->tso_size + doffset; ++ unsigned int mtu = skb_shinfo(skb)->gso_size + doffset; + unsigned int offset = 0; + u32 seq = ntohl(th->seq); + u16 id = ntohs(iph->id); +@@ -139,7 +139,7 @@ #ifndef LOOPBACK_MUST_CHECKSUM + #endif + + #ifdef LOOPBACK_TSO +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + BUG_ON(skb->protocol != htons(ETH_P_IP)); + BUG_ON(skb->nh.iph->protocol != IPPROTO_TCP); + +diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c +index c0998ef..0fac9d5 100644 +--- a/drivers/net/mv643xx_eth.c ++++ b/drivers/net/mv643xx_eth.c +@@ -1107,7 +1107,7 @@ static int mv643xx_eth_start_xmit(struct + + #ifdef MV643XX_CHECKSUM_OFFLOAD_TX + if (has_tiny_unaligned_frags(skb)) { +- if ((skb_linearize(skb, GFP_ATOMIC) != 0)) { ++ if (__skb_linearize(skb)) { + stats->tx_dropped++; + printk(KERN_DEBUG "%s: failed to linearize tiny " + "unaligned fragment\n", dev->name); +diff --git a/drivers/net/natsemi.c b/drivers/net/natsemi.c +index 9d6d254..c9ed624 100644 +--- a/drivers/net/natsemi.c ++++ b/drivers/net/natsemi.c +@@ -323,12 +323,12 @@ performance critical codepaths: + The rx process only runs in the interrupt handler. Access from outside + the interrupt handler is only permitted after disable_irq(). + +-The rx process usually runs under the dev->xmit_lock. If np->intr_tx_reap ++The rx process usually runs under the netif_tx_lock. If np->intr_tx_reap + is set, then access is permitted under spin_lock_irq(&np->lock). + + Thus configuration functions that want to access everything must call + disable_irq(dev->irq); +- spin_lock_bh(dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock_irq(&np->lock); + + IV. 
Notes +diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c +index 8cc0d0b..e53b313 100644 +--- a/drivers/net/r8169.c ++++ b/drivers/net/r8169.c +@@ -2171,7 +2171,7 @@ static int rtl8169_xmit_frags(struct rtl + static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev) + { + if (dev->features & NETIF_F_TSO) { +- u32 mss = skb_shinfo(skb)->tso_size; ++ u32 mss = skb_shinfo(skb)->gso_size; + + if (mss) + return LargeSend | ((mss & MSSMask) << MSSShift); +diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c +index b7f00d6..439f45f 100644 +--- a/drivers/net/s2io.c ++++ b/drivers/net/s2io.c +@@ -3522,8 +3522,8 @@ #endif + txdp->Control_1 = 0; + txdp->Control_2 = 0; + #ifdef NETIF_F_TSO +- mss = skb_shinfo(skb)->tso_size; +- if (mss) { ++ mss = skb_shinfo(skb)->gso_size; ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV4) { + txdp->Control_1 |= TXD_TCP_LSO_EN; + txdp->Control_1 |= TXD_TCP_LSO_MSS(mss); + } +@@ -3543,10 +3543,10 @@ #endif + } + + frg_len = skb->len - skb->data_len; +- if (skb_shinfo(skb)->ufo_size) { ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) { + int ufo_size; + +- ufo_size = skb_shinfo(skb)->ufo_size; ++ ufo_size = skb_shinfo(skb)->gso_size; + ufo_size &= ~7; + txdp->Control_1 |= TXD_UFO_EN; + txdp->Control_1 |= TXD_UFO_MSS(ufo_size); +@@ -3572,7 +3572,7 @@ #endif + txdp->Host_Control = (unsigned long) skb; + txdp->Control_1 |= TXD_BUFFER0_SIZE(frg_len); + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + txdp->Control_1 |= TXD_UFO_EN; + + frg_cnt = skb_shinfo(skb)->nr_frags; +@@ -3587,12 +3587,12 @@ #endif + (sp->pdev, frag->page, frag->page_offset, + frag->size, PCI_DMA_TODEVICE); + txdp->Control_1 = TXD_BUFFER0_SIZE(frag->size); +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + txdp->Control_1 |= TXD_UFO_EN; + } + txdp->Control_1 |= TXD_GATHER_CODE_LAST; + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + frg_cnt++; /* as Txd0 was used for inband header */ + + tx_fifo = mac_control->tx_FIFO_start[queue]; +@@ -3606,7 +3606,7 @@ #ifdef NETIF_F_TSO + if (mss) + val64 |= TX_FIFO_SPECIAL_FUNC; + #endif +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + val64 |= TX_FIFO_SPECIAL_FUNC; + writeq(val64, &tx_fifo->List_Control); + +diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c +index 0618cd5..2a55eb3 100644 +--- a/drivers/net/sky2.c ++++ b/drivers/net/sky2.c +@@ -1125,7 +1125,7 @@ static unsigned tx_le_req(const struct s + count = sizeof(dma_addr_t) / sizeof(u32); + count += skb_shinfo(skb)->nr_frags * count; + +- if (skb_shinfo(skb)->tso_size) ++ if (skb_shinfo(skb)->gso_size) + ++count; + + if (skb->ip_summed == CHECKSUM_HW) +@@ -1197,7 +1197,7 @@ static int sky2_xmit_frame(struct sk_buf + } + + /* Check for TCP Segmentation Offload */ +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + if (mss != 0) { + /* just drop the packet if non-linear expansion fails */ + if (skb_header_cloned(skb) && +diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c +index caf4102..fc9164a 100644 +--- a/drivers/net/tg3.c ++++ b/drivers/net/tg3.c +@@ -3664,7 +3664,7 @@ static int tg3_start_xmit(struct sk_buff + #if TG3_TSO_SUPPORT != 0 + mss = 0; + if (skb->len > (tp->dev->mtu + ETH_HLEN) && +- (mss = skb_shinfo(skb)->tso_size) != 0) { ++ (mss = skb_shinfo(skb)->gso_size) != 0) { + int tcp_opt_len, ip_tcp_len; + + if (skb_header_cloned(skb) && +diff --git a/drivers/net/tulip/winbond-840.c 
b/drivers/net/tulip/winbond-840.c +index 5b1af39..11de5af 100644 +--- a/drivers/net/tulip/winbond-840.c ++++ b/drivers/net/tulip/winbond-840.c +@@ -1605,11 +1605,11 @@ #ifdef CONFIG_PM + * - get_stats: + * spin_lock_irq(np->lock), doesn''t touch hw if not present + * - hard_start_xmit: +- * netif_stop_queue + spin_unlock_wait(&dev->xmit_lock); ++ * synchronize_irq + netif_tx_disable; + * - tx_timeout: +- * netif_device_detach + spin_unlock_wait(&dev->xmit_lock); ++ * netif_device_detach + netif_tx_disable; + * - set_multicast_list +- * netif_device_detach + spin_unlock_wait(&dev->xmit_lock); ++ * netif_device_detach + netif_tx_disable; + * - interrupt handler + * doesn''t touch hw if not present, synchronize_irq waits for + * running instances of the interrupt handler. +@@ -1635,11 +1635,10 @@ static int w840_suspend (struct pci_dev + netif_device_detach(dev); + update_csr6(dev, 0); + iowrite32(0, ioaddr + IntrEnable); +- netif_stop_queue(dev); + spin_unlock_irq(&np->lock); + +- spin_unlock_wait(&dev->xmit_lock); + synchronize_irq(dev->irq); ++ netif_tx_disable(dev); + + np->stats.rx_missed_errors += ioread32(ioaddr + RxMissed) & 0xffff; + +diff --git a/drivers/net/typhoon.c b/drivers/net/typhoon.c +index 4c76cb7..30c48c9 100644 +--- a/drivers/net/typhoon.c ++++ b/drivers/net/typhoon.c +@@ -340,7 +340,7 @@ #define typhoon_synchronize_irq(x) synch + #endif + + #if defined(NETIF_F_TSO) +-#define skb_tso_size(x) (skb_shinfo(x)->tso_size) ++#define skb_tso_size(x) (skb_shinfo(x)->gso_size) + #define TSO_NUM_DESCRIPTORS 2 + #define TSO_OFFLOAD_ON TYPHOON_OFFLOAD_TCP_SEGMENT + #else +diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c +index ed1f837..2eb6b5f 100644 +--- a/drivers/net/via-velocity.c ++++ b/drivers/net/via-velocity.c +@@ -1899,6 +1899,13 @@ static int velocity_xmit(struct sk_buff + + int pktlen = skb->len; + ++#ifdef VELOCITY_ZERO_COPY_SUPPORT ++ if (skb_shinfo(skb)->nr_frags > 6 && __skb_linearize(skb)) { ++ kfree_skb(skb); ++ return 0; ++ } ++#endif ++ + spin_lock_irqsave(&vptr->lock, flags); + + index = vptr->td_curr[qnum]; +@@ -1914,8 +1921,6 @@ static int velocity_xmit(struct sk_buff + */ + if (pktlen < ETH_ZLEN) { + /* Cannot occur until ZC support */ +- if(skb_linearize(skb, GFP_ATOMIC)) +- return 0; + pktlen = ETH_ZLEN; + memcpy(tdinfo->buf, skb->data, skb->len); + memset(tdinfo->buf + skb->len, 0, ETH_ZLEN - skb->len); +@@ -1933,7 +1938,6 @@ #ifdef VELOCITY_ZERO_COPY_SUPPORT + int nfrags = skb_shinfo(skb)->nr_frags; + tdinfo->skb = skb; + if (nfrags > 6) { +- skb_linearize(skb, GFP_ATOMIC); + memcpy(tdinfo->buf, skb->data, skb->len); + tdinfo->skb_dma[0] = tdinfo->buf_dma; + td_ptr->tdesc0.pktsize = +diff --git a/drivers/net/wireless/orinoco.c b/drivers/net/wireless/orinoco.c +index 6fd0bf7..75237c1 100644 +--- a/drivers/net/wireless/orinoco.c ++++ b/drivers/net/wireless/orinoco.c +@@ -1835,7 +1835,9 @@ static int __orinoco_program_rids(struct + /* Set promiscuity / multicast*/ + priv->promiscuous = 0; + priv->mc_count = 0; +- __orinoco_set_multicast_list(dev); /* FIXME: what about the xmit_lock */ ++ ++ /* FIXME: what about netif_tx_lock */ ++ __orinoco_set_multicast_list(dev); + + return 0; + } +diff --git a/drivers/s390/net/qeth_eddp.c b/drivers/s390/net/qeth_eddp.c +index 82cb4af..57cec40 100644 +--- a/drivers/s390/net/qeth_eddp.c ++++ b/drivers/s390/net/qeth_eddp.c +@@ -421,7 +421,7 @@ #endif /* CONFIG_QETH_VLAN */ + } + tcph = eddp->skb->h.th; + while (eddp->skb_offset < eddp->skb->len) { +- data_len = min((int)skb_shinfo(eddp->skb)->tso_size, ++ 
data_len = min((int)skb_shinfo(eddp->skb)->gso_size, + (int)(eddp->skb->len - eddp->skb_offset)); + /* prepare qdio hdr */ + if (eddp->qh.hdr.l2.id == QETH_HEADER_TYPE_LAYER2){ +@@ -516,20 +516,20 @@ qeth_eddp_calc_num_pages(struct qeth_edd + + QETH_DBF_TEXT(trace, 5, "eddpcanp"); + /* can we put multiple skbs in one page? */ +- skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->tso_size + hdr_len); ++ skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->gso_size + hdr_len); + if (skbs_per_page > 1){ +- ctx->num_pages = (skb_shinfo(skb)->tso_segs + 1) / ++ ctx->num_pages = (skb_shinfo(skb)->gso_segs + 1) / + skbs_per_page + 1; + ctx->elements_per_skb = 1; + } else { + /* no -> how many elements per skb? */ +- ctx->elements_per_skb = (skb_shinfo(skb)->tso_size + hdr_len + ++ ctx->elements_per_skb = (skb_shinfo(skb)->gso_size + hdr_len + + PAGE_SIZE) >> PAGE_SHIFT; + ctx->num_pages = ctx->elements_per_skb * +- (skb_shinfo(skb)->tso_segs + 1); ++ (skb_shinfo(skb)->gso_segs + 1); + } + ctx->num_elements = ctx->elements_per_skb * +- (skb_shinfo(skb)->tso_segs + 1); ++ (skb_shinfo(skb)->gso_segs + 1); + } + + static inline struct qeth_eddp_context * +diff --git a/drivers/s390/net/qeth_main.c b/drivers/s390/net/qeth_main.c +index dba7f7f..d9cc997 100644 +--- a/drivers/s390/net/qeth_main.c ++++ b/drivers/s390/net/qeth_main.c +@@ -4454,7 +4454,7 @@ qeth_send_packet(struct qeth_card *card, + queue = card->qdio.out_qs + [qeth_get_priority_queue(card, skb, ipv, cast_type)]; + +- if (skb_shinfo(skb)->tso_size) ++ if (skb_shinfo(skb)->gso_size) + large_send = card->options.large_send; + + /*are we able to do TSO ? If so ,prepare and send it from here */ +@@ -4501,7 +4501,7 @@ qeth_send_packet(struct qeth_card *card, + card->stats.tx_packets++; + card->stats.tx_bytes += skb->len; + #ifdef CONFIG_QETH_PERF_STATS +- if (skb_shinfo(skb)->tso_size && ++ if (skb_shinfo(skb)->gso_size && + !(large_send == QETH_LARGE_SEND_NO)) { + card->perf_stats.large_send_bytes += skb->len; + card->perf_stats.large_send_cnt++; +diff --git a/drivers/s390/net/qeth_tso.h b/drivers/s390/net/qeth_tso.h +index 1286dde..89cbf34 100644 +--- a/drivers/s390/net/qeth_tso.h ++++ b/drivers/s390/net/qeth_tso.h +@@ -51,7 +51,7 @@ qeth_tso_fill_header(struct qeth_card *c + hdr->ext.hdr_version = 1; + hdr->ext.hdr_len = 28; + /*insert non-fix values */ +- hdr->ext.mss = skb_shinfo(skb)->tso_size; ++ hdr->ext.mss = skb_shinfo(skb)->gso_size; + hdr->ext.dg_hdr_len = (__u16)(iph->ihl*4 + tcph->doff*4); + hdr->ext.payload_len = (__u16)(skb->len - hdr->ext.dg_hdr_len - + sizeof(struct qeth_hdr_tso)); +diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h +index 93535f0..9269df7 100644 +--- a/include/linux/ethtool.h ++++ b/include/linux/ethtool.h +@@ -408,6 +408,8 @@ #define ETHTOOL_STSO 0x0000001f /* Set + #define ETHTOOL_GPERMADDR 0x00000020 /* Get permanent hardware address */ + #define ETHTOOL_GUFO 0x00000021 /* Get UFO enable (ethtool_value) */ + #define ETHTOOL_SUFO 0x00000022 /* Set UFO enable (ethtool_value) */ ++#define ETHTOOL_GGSO 0x00000023 /* Get GSO enable (ethtool_value) */ ++#define ETHTOOL_SGSO 0x00000024 /* Set GSO enable (ethtool_value) */ + + /* compatibility with older code */ + #define SPARC_ETH_GSET ETHTOOL_GSET +diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h +index 7fda03d..458b278 100644 +--- a/include/linux/netdevice.h ++++ b/include/linux/netdevice.h +@@ -230,7 +230,8 @@ enum netdev_state_t + __LINK_STATE_SCHED, + __LINK_STATE_NOCARRIER, + __LINK_STATE_RX_SCHED, +- __LINK_STATE_LINKWATCH_PENDING ++ 
__LINK_STATE_LINKWATCH_PENDING, ++ __LINK_STATE_QDISC_RUNNING, + }; + + +@@ -306,9 +307,16 @@ #define NETIF_F_HW_VLAN_TX 128 /* Transm + #define NETIF_F_HW_VLAN_RX 256 /* Receive VLAN hw acceleration */ + #define NETIF_F_HW_VLAN_FILTER 512 /* Receive filtering on VLAN */ + #define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */ +-#define NETIF_F_TSO 2048 /* Can offload TCP/IP segmentation */ ++#define NETIF_F_GSO 2048 /* Enable software GSO. */ + #define NETIF_F_LLTX 4096 /* LockLess TX */ +-#define NETIF_F_UFO 8192 /* Can offload UDP Large Send*/ ++ ++ /* Segmentation offload features */ ++#define NETIF_F_GSO_SHIFT 16 ++#define NETIF_F_TSO (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT) ++#define NETIF_F_UFO (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT) ++ ++#define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) ++#define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM) + + struct net_device *next_sched; + +@@ -394,6 +402,9 @@ #define NETIF_F_UFO 8192 + struct list_head qdisc_list; + unsigned long tx_queue_len; /* Max frames per queue allowed */ + ++ /* Partially transmitted GSO packet. */ ++ struct sk_buff *gso_skb; ++ + /* ingress path synchronizer */ + spinlock_t ingress_lock; + struct Qdisc *qdisc_ingress; +@@ -402,7 +413,7 @@ #define NETIF_F_UFO 8192 + * One part is mostly used on xmit path (device) + */ + /* hard_start_xmit synchronizer */ +- spinlock_t xmit_lock ____cacheline_aligned_in_smp; ++ spinlock_t _xmit_lock ____cacheline_aligned_in_smp; + /* cpu id of processor entered to hard_start_xmit or -1, + if nobody entered there. + */ +@@ -527,6 +538,7 @@ struct packet_type { + struct net_device *, + struct packet_type *, + struct net_device *); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg); + void *af_packet_priv; + struct list_head list; + }; +@@ -693,7 +705,8 @@ extern int dev_change_name(struct net_d + extern int dev_set_mtu(struct net_device *, int); + extern int dev_set_mac_address(struct net_device *, + struct sockaddr *); +-extern void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev); ++extern int dev_hard_start_xmit(struct sk_buff *skb, ++ struct net_device *dev); + + extern void dev_init(void); + +@@ -900,11 +913,43 @@ static inline void __netif_rx_complete(s + clear_bit(__LINK_STATE_RX_SCHED, &dev->state); + } + ++static inline void netif_tx_lock(struct net_device *dev) ++{ ++ spin_lock(&dev->_xmit_lock); ++ dev->xmit_lock_owner = smp_processor_id(); ++} ++ ++static inline void netif_tx_lock_bh(struct net_device *dev) ++{ ++ spin_lock_bh(&dev->_xmit_lock); ++ dev->xmit_lock_owner = smp_processor_id(); ++} ++ ++static inline int netif_tx_trylock(struct net_device *dev) ++{ ++ int err = spin_trylock(&dev->_xmit_lock); ++ if (!err) ++ dev->xmit_lock_owner = smp_processor_id(); ++ return err; ++} ++ ++static inline void netif_tx_unlock(struct net_device *dev) ++{ ++ dev->xmit_lock_owner = -1; ++ spin_unlock(&dev->_xmit_lock); ++} ++ ++static inline void netif_tx_unlock_bh(struct net_device *dev) ++{ ++ dev->xmit_lock_owner = -1; ++ spin_unlock_bh(&dev->_xmit_lock); ++} ++ + static inline void netif_tx_disable(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + netif_stop_queue(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + /* These functions live elsewhere (drivers/net/net_init.c, but related) */ +@@ -932,6 +977,7 @@ extern int netdev_max_backlog; + extern int weight_p; + extern int netdev_set_master(struct net_device *dev, struct net_device *master); + extern int 
skb_checksum_help(struct sk_buff *skb, int inward); ++extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg); + #ifdef CONFIG_BUG + extern void netdev_rx_csum_fault(struct net_device *dev); + #else +@@ -951,6 +997,13 @@ #endif + + extern void linkwatch_run_queue(void); + ++static inline int netif_needs_gso(struct net_device *dev, struct sk_buff *skb) ++{ ++ int feature = skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT; ++ return skb_shinfo(skb)->gso_size && ++ (dev->features & feature) != feature; ++} ++ + #endif /* __KERNEL__ */ + + #endif /* _LINUX_DEV_H */ +diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h +index b94d1ad..75b5b93 100644 +--- a/include/net/pkt_sched.h ++++ b/include/net/pkt_sched.h +@@ -218,12 +218,13 @@ extern struct qdisc_rate_table *qdisc_ge + struct rtattr *tab); + extern void qdisc_put_rtab(struct qdisc_rate_table *tab); + +-extern int qdisc_restart(struct net_device *dev); ++extern void __qdisc_run(struct net_device *dev); + + static inline void qdisc_run(struct net_device *dev) + { +- while (!netif_queue_stopped(dev) && qdisc_restart(dev) < 0) +- /* NOTHING */; ++ if (!netif_queue_stopped(dev) && ++ !test_and_set_bit(__LINK_STATE_QDISC_RUNNING, &dev->state)) ++ __qdisc_run(dev); + } + + extern int tc_classify(struct sk_buff *skb, struct tcf_proto *tp, +diff --git a/include/net/protocol.h b/include/net/protocol.h +index 6dc5970..650911d 100644 +--- a/include/net/protocol.h ++++ b/include/net/protocol.h +@@ -37,6 +37,7 @@ #define MAX_INET_PROTOS 256 /* Must be + struct net_protocol { + int (*handler)(struct sk_buff *skb); + void (*err_handler)(struct sk_buff *skb, u32 info); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg); + int no_policy; + }; + +diff --git a/include/net/sock.h b/include/net/sock.h +index f63d0d5..a8e8d21 100644 +--- a/include/net/sock.h ++++ b/include/net/sock.h +@@ -1064,9 +1064,13 @@ static inline void sk_setup_caps(struct + { + __sk_dst_set(sk, dst); + sk->sk_route_caps = dst->dev->features; ++ if (sk->sk_route_caps & NETIF_F_GSO) ++ sk->sk_route_caps |= NETIF_F_TSO; + if (sk->sk_route_caps & NETIF_F_TSO) { + if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len) + sk->sk_route_caps &= ~NETIF_F_TSO; ++ else ++ sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM; + } + } + +diff --git a/include/net/tcp.h b/include/net/tcp.h +index 77f21c6..40e337c 100644 +--- a/include/net/tcp.h ++++ b/include/net/tcp.h +@@ -552,13 +552,13 @@ #include <net/tcp_ecn.h> + */ + static inline int tcp_skb_pcount(const struct sk_buff *skb) + { +- return skb_shinfo(skb)->tso_segs; ++ return skb_shinfo(skb)->gso_segs; + } + + /* This is valid iff tcp_skb_pcount() > 1. 
*/ + static inline int tcp_skb_mss(const struct sk_buff *skb) + { +- return skb_shinfo(skb)->tso_size; ++ return skb_shinfo(skb)->gso_size; + } + + static inline void tcp_dec_pcount_approx(__u32 *count, +@@ -1063,6 +1063,8 @@ extern struct request_sock_ops tcp_reque + + extern int tcp_v4_destroy_sock(struct sock *sk); + ++extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg); ++ + #ifdef CONFIG_PROC_FS + extern int tcp4_proc_init(void); + extern void tcp4_proc_exit(void); +diff --git a/net/atm/clip.c b/net/atm/clip.c +index 1842a4e..6dc21a7 100644 +--- a/net/atm/clip.c ++++ b/net/atm/clip.c +@@ -101,7 +101,7 @@ static void unlink_clip_vcc(struct clip_ + printk(KERN_CRIT "!clip_vcc->entry (clip_vcc %p)\n",clip_vcc); + return; + } +- spin_lock_bh(&entry->neigh->dev->xmit_lock); /* block clip_start_xmit() */ ++ netif_tx_lock_bh(entry->neigh->dev); /* block clip_start_xmit() */ + entry->neigh->used = jiffies; + for (walk = &entry->vccs; *walk; walk = &(*walk)->next) + if (*walk == clip_vcc) { +@@ -125,7 +125,7 @@ static void unlink_clip_vcc(struct clip_ + printk(KERN_CRIT "ATMARP: unlink_clip_vcc failed (entry %p, vcc " + "0x%p)\n",entry,clip_vcc); + out: +- spin_unlock_bh(&entry->neigh->dev->xmit_lock); ++ netif_tx_unlock_bh(entry->neigh->dev); + } + + /* The neighbour entry n->lock is held. */ +diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c +index 0b33a7b..b9e32b6 100644 +--- a/net/bridge/br_device.c ++++ b/net/bridge/br_device.c +@@ -146,9 +146,9 @@ static int br_set_tx_csum(struct net_dev + struct net_bridge *br = netdev_priv(dev); + + if (data) +- br->feature_mask |= NETIF_F_IP_CSUM; ++ br->feature_mask |= NETIF_F_NO_CSUM; + else +- br->feature_mask &= ~NETIF_F_IP_CSUM; ++ br->feature_mask &= ~NETIF_F_ALL_CSUM; + + br_features_recompute(br); + return 0; +@@ -186,5 +186,5 @@ void br_dev_setup(struct net_device *dev + dev->priv_flags = IFF_EBRIDGE; + + dev->features = NETIF_F_SG | NETIF_F_FRAGLIST +- | NETIF_F_HIGHDMA | NETIF_F_TSO | NETIF_F_IP_CSUM; ++ | NETIF_F_HIGHDMA | NETIF_F_TSO | NETIF_F_NO_CSUM; + } +diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c +index 2d24fb4..00b1128 100644 +--- a/net/bridge/br_forward.c ++++ b/net/bridge/br_forward.c +@@ -32,7 +32,7 @@ static inline int should_deliver(const s + int br_dev_queue_push_xmit(struct sk_buff *skb) + { + /* drop mtu oversized packets except tso */ +- if (skb->len > skb->dev->mtu && !skb_shinfo(skb)->tso_size) ++ if (skb->len > skb->dev->mtu && !skb_shinfo(skb)->gso_size) + kfree_skb(skb); + else { + #ifdef CONFIG_BRIDGE_NETFILTER +diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c +index f36b35e..4e9743d 100644 +--- a/net/bridge/br_if.c ++++ b/net/bridge/br_if.c +@@ -385,14 +385,24 @@ void br_features_recompute(struct net_br + struct net_bridge_port *p; + unsigned long features, checksum; + +- features = br->feature_mask &~ NETIF_F_IP_CSUM; +- checksum = br->feature_mask & NETIF_F_IP_CSUM; ++ checksum = br->feature_mask & NETIF_F_ALL_CSUM ? 
NETIF_F_NO_CSUM : 0; ++ features = br->feature_mask & ~NETIF_F_ALL_CSUM; + + list_for_each_entry(p, &br->port_list, list) { +- if (!(p->dev->features +- & (NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM))) ++ unsigned long feature = p->dev->features; ++ ++ if (checksum & NETIF_F_NO_CSUM && !(feature & NETIF_F_NO_CSUM)) ++ checksum ^= NETIF_F_NO_CSUM | NETIF_F_HW_CSUM; ++ if (checksum & NETIF_F_HW_CSUM && !(feature & NETIF_F_HW_CSUM)) ++ checksum ^= NETIF_F_HW_CSUM | NETIF_F_IP_CSUM; ++ if (!(feature & NETIF_F_IP_CSUM)) + checksum = 0; +- features &= p->dev->features; ++ ++ if (feature & NETIF_F_GSO) ++ feature |= NETIF_F_TSO; ++ feature |= NETIF_F_GSO; ++ ++ features &= feature; + } + + br->dev->features = features | checksum | NETIF_F_LLTX; +diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c +index 9e27373..588207f 100644 +--- a/net/bridge/br_netfilter.c ++++ b/net/bridge/br_netfilter.c +@@ -743,7 +743,7 @@ static int br_nf_dev_queue_xmit(struct s + { + if (skb->protocol == htons(ETH_P_IP) && + skb->len > skb->dev->mtu && +- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size)) ++ !skb_shinfo(skb)->gso_size) + return ip_fragment(skb, br_dev_queue_push_xmit); + else + return br_dev_queue_push_xmit(skb); +diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c +index 05d6085..c57d887 100644 +--- a/net/core/dev_mcast.c ++++ b/net/core/dev_mcast.c +@@ -62,7 +62,7 @@ #include <net/arp.h> + * Device mc lists are changed by bh at least if IPv6 is enabled, + * so that it must be bh protected. + * +- * We block accesses to device mc filters with dev->xmit_lock. ++ * We block accesses to device mc filters with netif_tx_lock. + */ + + /* +@@ -93,9 +93,9 @@ static void __dev_mc_upload(struct net_d + + void dev_mc_upload(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + __dev_mc_upload(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + /* +@@ -107,7 +107,7 @@ int dev_mc_delete(struct net_device *dev + int err = 0; + struct dev_mc_list *dmi, **dmip; + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + for (dmip = &dev->mc_list; (dmi = *dmip) != NULL; dmip = &dmi->next) { + /* +@@ -139,13 +139,13 @@ int dev_mc_delete(struct net_device *dev + */ + __dev_mc_upload(dev); + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + } + } + err = -ENOENT; + done: +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return err; + } + +@@ -160,7 +160,7 @@ int dev_mc_add(struct net_device *dev, v + + dmi1 = kmalloc(sizeof(*dmi), GFP_ATOMIC); + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + for (dmi = dev->mc_list; dmi != NULL; dmi = dmi->next) { + if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 && + dmi->dmi_addrlen == alen) { +@@ -176,7 +176,7 @@ int dev_mc_add(struct net_device *dev, v + } + + if ((dmi = dmi1) == NULL) { +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return -ENOMEM; + } + memcpy(dmi->dmi_addr, addr, alen); +@@ -189,11 +189,11 @@ int dev_mc_add(struct net_device *dev, v + + __dev_mc_upload(dev); + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + + done: +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + kfree(dmi1); + return err; + } +@@ -204,7 +204,7 @@ done: + + void dev_mc_discard(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + while (dev->mc_list != NULL) { + struct dev_mc_list *tmp = dev->mc_list; +@@ -215,7 +215,7 @@ void 
dev_mc_discard(struct net_device *d + } + dev->mc_count = 0; + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + #ifdef CONFIG_PROC_FS +@@ -250,7 +250,7 @@ static int dev_mc_seq_show(struct seq_fi + struct dev_mc_list *m; + struct net_device *dev = v; + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + for (m = dev->mc_list; m; m = m->next) { + int i; + +@@ -262,7 +262,7 @@ static int dev_mc_seq_show(struct seq_fi + + seq_putc(seq, ''\n''); + } +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + } + +diff --git a/net/core/ethtool.c b/net/core/ethtool.c +index e6f7610..27ce168 100644 +--- a/net/core/ethtool.c ++++ b/net/core/ethtool.c +@@ -30,7 +30,7 @@ u32 ethtool_op_get_link(struct net_devic + + u32 ethtool_op_get_tx_csum(struct net_device *dev) + { +- return (dev->features & (NETIF_F_IP_CSUM | NETIF_F_HW_CSUM)) != 0; ++ return (dev->features & NETIF_F_ALL_CSUM) != 0; + } + + int ethtool_op_set_tx_csum(struct net_device *dev, u32 data) +@@ -551,9 +551,7 @@ static int ethtool_set_sg(struct net_dev + return -EFAULT; + + if (edata.data && +- !(dev->features & (NETIF_F_IP_CSUM | +- NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM))) ++ !(dev->features & NETIF_F_ALL_CSUM)) + return -EINVAL; + + return __ethtool_set_sg(dev, edata.data); +@@ -591,7 +589,7 @@ static int ethtool_set_tso(struct net_de + + static int ethtool_get_ufo(struct net_device *dev, char __user *useraddr) + { +- struct ethtool_value edata = { ETHTOOL_GTSO }; ++ struct ethtool_value edata = { ETHTOOL_GUFO }; + + if (!dev->ethtool_ops->get_ufo) + return -EOPNOTSUPP; +@@ -600,6 +598,7 @@ static int ethtool_get_ufo(struct net_de + return -EFAULT; + return 0; + } ++ + static int ethtool_set_ufo(struct net_device *dev, char __user *useraddr) + { + struct ethtool_value edata; +@@ -615,6 +614,29 @@ static int ethtool_set_ufo(struct net_de + return dev->ethtool_ops->set_ufo(dev, edata.data); + } + ++static int ethtool_get_gso(struct net_device *dev, char __user *useraddr) ++{ ++ struct ethtool_value edata = { ETHTOOL_GGSO }; ++ ++ edata.data = dev->features & NETIF_F_GSO; ++ if (copy_to_user(useraddr, &edata, sizeof(edata))) ++ return -EFAULT; ++ return 0; ++} ++ ++static int ethtool_set_gso(struct net_device *dev, char __user *useraddr) ++{ ++ struct ethtool_value edata; ++ ++ if (copy_from_user(&edata, useraddr, sizeof(edata))) ++ return -EFAULT; ++ if (edata.data) ++ dev->features |= NETIF_F_GSO; ++ else ++ dev->features &= ~NETIF_F_GSO; ++ return 0; ++} ++ + static int ethtool_self_test(struct net_device *dev, char __user *useraddr) + { + struct ethtool_test test; +@@ -906,6 +928,12 @@ int dev_ethtool(struct ifreq *ifr) + case ETHTOOL_SUFO: + rc = ethtool_set_ufo(dev, useraddr); + break; ++ case ETHTOOL_GGSO: ++ rc = ethtool_get_gso(dev, useraddr); ++ break; ++ case ETHTOOL_SGSO: ++ rc = ethtool_set_gso(dev, useraddr); ++ break; + default: + rc = -EOPNOTSUPP; + } +diff --git a/net/core/netpoll.c b/net/core/netpoll.c +index ea51f8d..ec28d3b 100644 +--- a/net/core/netpoll.c ++++ b/net/core/netpoll.c +@@ -273,24 +273,21 @@ static void netpoll_send_skb(struct netp + + do { + npinfo->tries--; +- spin_lock(&np->dev->xmit_lock); +- np->dev->xmit_lock_owner = smp_processor_id(); ++ netif_tx_lock(np->dev); + + /* + * network drivers do not expect to be called if the queue is + * stopped. 
+ */ + if (netif_queue_stopped(np->dev)) { +- np->dev->xmit_lock_owner = -1; +- spin_unlock(&np->dev->xmit_lock); ++ netif_tx_unlock(np->dev); + netpoll_poll(np); + udelay(50); + continue; + } + + status = np->dev->hard_start_xmit(skb, np->dev); +- np->dev->xmit_lock_owner = -1; +- spin_unlock(&np->dev->xmit_lock); ++ netif_tx_unlock(np->dev); + + /* success */ + if(!status) { +diff --git a/net/core/pktgen.c b/net/core/pktgen.c +index da16f8f..2380347 100644 +--- a/net/core/pktgen.c ++++ b/net/core/pktgen.c +@@ -2582,7 +2582,7 @@ static __inline__ void pktgen_xmit(struc + } + } + +- spin_lock_bh(&odev->xmit_lock); ++ netif_tx_lock_bh(odev); + if (!netif_queue_stopped(odev)) { + + atomic_inc(&(pkt_dev->skb->users)); +@@ -2627,7 +2627,7 @@ retry_now: + pkt_dev->next_tx_ns = 0; + } + +- spin_unlock_bh(&odev->xmit_lock); ++ netif_tx_unlock_bh(odev); + + /* If pkt_dev->count is zero, then run forever */ + if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) { +diff --git a/net/decnet/dn_nsp_in.c b/net/decnet/dn_nsp_in.c +index 44bda85..2e3323a 100644 +--- a/net/decnet/dn_nsp_in.c ++++ b/net/decnet/dn_nsp_in.c +@@ -801,8 +801,7 @@ got_it: + * We linearize everything except data segments here. + */ + if (cb->nsp_flags & ~0x60) { +- if (unlikely(skb_is_nonlinear(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) ++ if (unlikely(skb_linearize(skb))) + goto free_out; + } + +diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c +index 3407f19..a0a25e0 100644 +--- a/net/decnet/dn_route.c ++++ b/net/decnet/dn_route.c +@@ -629,8 +629,7 @@ int dn_route_rcv(struct sk_buff *skb, st + padlen); + + if (flags & DN_RT_PKT_CNTL) { +- if (unlikely(skb_is_nonlinear(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) ++ if (unlikely(skb_linearize(skb))) + goto dump_it; + + switch(flags & DN_RT_CNTL_MSK) { +diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c +index 97c276f..3692db5 100644 +--- a/net/ipv4/af_inet.c ++++ b/net/ipv4/af_inet.c +@@ -68,6 +68,7 @@ + */ + + #include <linux/config.h> ++#include <linux/err.h> + #include <linux/errno.h> + #include <linux/types.h> + #include <linux/socket.h> +@@ -1084,6 +1085,54 @@ int inet_sk_rebuild_header(struct sock * + + EXPORT_SYMBOL(inet_sk_rebuild_header); + ++static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int sg) ++{ ++ struct sk_buff *segs = ERR_PTR(-EINVAL); ++ struct iphdr *iph; ++ struct net_protocol *ops; ++ int proto; ++ int ihl; ++ int id; ++ ++ if (!pskb_may_pull(skb, sizeof(*iph))) ++ goto out; ++ ++ iph = skb->nh.iph; ++ ihl = iph->ihl * 4; ++ if (ihl < sizeof(*iph)) ++ goto out; ++ ++ if (!pskb_may_pull(skb, ihl)) ++ goto out; ++ ++ skb->h.raw = __skb_pull(skb, ihl); ++ iph = skb->nh.iph; ++ id = ntohs(iph->id); ++ proto = iph->protocol & (MAX_INET_PROTOS - 1); ++ segs = ERR_PTR(-EPROTONOSUPPORT); ++ ++ rcu_read_lock(); ++ ops = rcu_dereference(inet_protos[proto]); ++ if (ops && ops->gso_segment) ++ segs = ops->gso_segment(skb, sg); ++ rcu_read_unlock(); ++ ++ if (IS_ERR(segs)) ++ goto out; ++ ++ skb = segs; ++ do { ++ iph = skb->nh.iph; ++ iph->id = htons(id++); ++ iph->tot_len = htons(skb->len - skb->mac_len); ++ iph->check = 0; ++ iph->check = ip_fast_csum(skb->nh.raw, iph->ihl); ++ } while ((skb = skb->next)); ++ ++out: ++ return segs; ++} ++ + #ifdef CONFIG_IP_MULTICAST + static struct net_protocol igmp_protocol = { + .handler = igmp_rcv, +@@ -1093,6 +1142,7 @@ #endif + static struct net_protocol tcp_protocol = { + .handler = tcp_v4_rcv, + .err_handler = tcp_v4_err, ++ .gso_segment = tcp_tso_segment, + .no_policy = 1, + }; + 
+@@ -1138,6 +1188,7 @@ static int ipv4_proc_init(void); + static struct packet_type ip_packet_type = { + .type = __constant_htons(ETH_P_IP), + .func = ip_rcv, ++ .gso_segment = inet_gso_segment, + }; + + static int __init inet_init(void) +diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c +index 8dcba38..19c3c73 100644 +--- a/net/ipv4/ip_output.c ++++ b/net/ipv4/ip_output.c +@@ -210,8 +210,7 @@ #if defined(CONFIG_NETFILTER) && defined + return dst_output(skb); + } + #endif +- if (skb->len > dst_mtu(skb->dst) && +- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size)) ++ if (skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size) + return ip_fragment(skb, ip_finish_output2); + else + return ip_finish_output2(skb); +@@ -362,7 +361,7 @@ packet_routed: + } + + ip_select_ident_more(iph, &rt->u.dst, sk, +- (skb_shinfo(skb)->tso_segs ?: 1) - 1); ++ (skb_shinfo(skb)->gso_segs ?: 1) - 1); + + /* Add an IP checksum. */ + ip_send_check(iph); +@@ -743,7 +742,8 @@ static inline int ip_ufo_append_data(str + (length - transhdrlen)); + if (!err) { + /* specify the length of each IP datagram fragment*/ +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen); ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen; ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; + __skb_queue_tail(&sk->sk_write_queue, skb); + + return 0; +@@ -839,7 +839,7 @@ int ip_append_data(struct sock *sk, + */ + if (transhdrlen && + length + fragheaderlen <= mtu && +- rt->u.dst.dev->features&(NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM) && ++ rt->u.dst.dev->features & NETIF_F_ALL_CSUM && + !exthdrlen) + csummode = CHECKSUM_HW; + +@@ -1086,14 +1086,16 @@ ssize_t ip_append_page(struct sock *sk, + + inet->cork.length += size; + if ((sk->sk_protocol == IPPROTO_UDP) && +- (rt->u.dst.dev->features & NETIF_F_UFO)) +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen); ++ (rt->u.dst.dev->features & NETIF_F_UFO)) { ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen; ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; ++ } + + + while (size > 0) { + int i; + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_size) + len = size; + else { + +diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c +index d64e2ec..7494823 100644 +--- a/net/ipv4/ipcomp.c ++++ b/net/ipv4/ipcomp.c +@@ -84,7 +84,7 @@ static int ipcomp_input(struct xfrm_stat + struct xfrm_decap_state *decap, struct sk_buff *skb) + { + u8 nexthdr; +- int err = 0; ++ int err = -ENOMEM; + struct iphdr *iph; + union { + struct iphdr iph; +@@ -92,11 +92,8 @@ static int ipcomp_input(struct xfrm_stat + } tmp_iph; + + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { +- err = -ENOMEM; ++ if (skb_linearize_cow(skb)) + goto out; +- } + + skb->ip_summed = CHECKSUM_NONE; + +@@ -171,10 +168,8 @@ static int ipcomp_output(struct xfrm_sta + goto out_ok; + } + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { ++ if (skb_linearize_cow(skb)) + goto out_ok; +- } + + err = ipcomp_compress(x, skb); + iph = skb->nh.iph; +diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c +index 00aa80e..de0753f 100644 +--- a/net/ipv4/tcp.c ++++ b/net/ipv4/tcp.c +@@ -257,6 +257,7 @@ #include <linux/smp_lock.h> + #include <linux/fs.h> + #include <linux/random.h> + #include <linux/bootmem.h> ++#include <linux/err.h> + + #include <net/icmp.h> + #include <net/tcp.h> +@@ -570,7 +571,7 @@ new_segment: + skb->ip_summed = CHECKSUM_HW; + tp->write_seq += copy; + TCP_SKB_CB(skb)->end_seq += copy; +- skb_shinfo(skb)->tso_segs = 0; ++ 
skb_shinfo(skb)->gso_segs = 0; + + if (!copied) + TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_PSH; +@@ -621,14 +622,10 @@ ssize_t tcp_sendpage(struct socket *sock + ssize_t res; + struct sock *sk = sock->sk; + +-#define TCP_ZC_CSUM_FLAGS (NETIF_F_IP_CSUM | NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) +- + if (!(sk->sk_route_caps & NETIF_F_SG) || +- !(sk->sk_route_caps & TCP_ZC_CSUM_FLAGS)) ++ !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) + return sock_no_sendpage(sock, page, offset, size, flags); + +-#undef TCP_ZC_CSUM_FLAGS +- + lock_sock(sk); + TCP_CHECK_TIMER(sk); + res = do_tcp_sendpages(sk, &page, offset, size, flags); +@@ -725,9 +722,7 @@ new_segment: + /* + * Check whether we can use HW checksum. + */ +- if (sk->sk_route_caps & +- (NETIF_F_IP_CSUM | NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM)) ++ if (sk->sk_route_caps & NETIF_F_ALL_CSUM) + skb->ip_summed = CHECKSUM_HW; + + skb_entail(sk, tp, skb); +@@ -823,7 +818,7 @@ new_segment: + + tp->write_seq += copy; + TCP_SKB_CB(skb)->end_seq += copy; +- skb_shinfo(skb)->tso_segs = 0; ++ skb_shinfo(skb)->gso_segs = 0; + + from += copy; + copied += copy; +@@ -2026,6 +2021,67 @@ int tcp_getsockopt(struct sock *sk, int + } + + ++struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg) ++{ ++ struct sk_buff *segs = ERR_PTR(-EINVAL); ++ struct tcphdr *th; ++ unsigned thlen; ++ unsigned int seq; ++ unsigned int delta; ++ unsigned int oldlen; ++ unsigned int len; ++ ++ if (!pskb_may_pull(skb, sizeof(*th))) ++ goto out; ++ ++ th = skb->h.th; ++ thlen = th->doff * 4; ++ if (thlen < sizeof(*th)) ++ goto out; ++ ++ if (!pskb_may_pull(skb, thlen)) ++ goto out; ++ ++ oldlen = (u16)~skb->len; ++ __skb_pull(skb, thlen); ++ ++ segs = skb_segment(skb, sg); ++ if (IS_ERR(segs)) ++ goto out; ++ ++ len = skb_shinfo(skb)->gso_size; ++ delta = htonl(oldlen + (thlen + len)); ++ ++ skb = segs; ++ th = skb->h.th; ++ seq = ntohl(th->seq); ++ ++ do { ++ th->fin = th->psh = 0; ++ ++ th->check = ~csum_fold(th->check + delta); ++ if (skb->ip_summed != CHECKSUM_HW) ++ th->check = csum_fold(csum_partial(skb->h.raw, thlen, ++ skb->csum)); ++ ++ seq += len; ++ skb = skb->next; ++ th = skb->h.th; ++ ++ th->seq = htonl(seq); ++ th->cwr = 0; ++ } while (skb->next); ++ ++ delta = htonl(oldlen + (skb->tail - skb->h.raw) + skb->data_len); ++ th->check = ~csum_fold(th->check + delta); ++ if (skb->ip_summed != CHECKSUM_HW) ++ th->check = csum_fold(csum_partial(skb->h.raw, thlen, ++ skb->csum)); ++ ++out: ++ return segs; ++} ++ + extern void __skb_cb_too_small_for_tcp(int, int); + extern struct tcp_congestion_ops tcp_reno; + +diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c +index e9a54ae..defe77a 100644 +--- a/net/ipv4/tcp_input.c ++++ b/net/ipv4/tcp_input.c +@@ -1072,7 +1072,7 @@ tcp_sacktag_write_queue(struct sock *sk, + else + pkt_len = (end_seq - + TCP_SKB_CB(skb)->seq); +- if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->tso_size)) ++ if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->gso_size)) + break; + pcount = tcp_skb_pcount(skb); + } +diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c +index 310f2e6..ee01f69 100644 +--- a/net/ipv4/tcp_output.c ++++ b/net/ipv4/tcp_output.c +@@ -497,15 +497,17 @@ static void tcp_set_skb_tso_segs(struct + /* Avoid the costly divide in the normal + * non-TSO case. 
+ */ +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + } else { + unsigned int factor; + + factor = skb->len + (mss_now - 1); + factor /= mss_now; +- skb_shinfo(skb)->tso_segs = factor; +- skb_shinfo(skb)->tso_size = mss_now; ++ skb_shinfo(skb)->gso_segs = factor; ++ skb_shinfo(skb)->gso_size = mss_now; ++ skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + } + } + +@@ -850,7 +852,7 @@ static int tcp_init_tso_segs(struct sock + + if (!tso_segs || + (tso_segs > 1 && +- skb_shinfo(skb)->tso_size != mss_now)) { ++ tcp_skb_mss(skb) != mss_now)) { + tcp_set_skb_tso_segs(sk, skb, mss_now); + tso_segs = tcp_skb_pcount(skb); + } +@@ -1510,8 +1512,9 @@ int tcp_retransmit_skb(struct sock *sk, + tp->snd_una == (TCP_SKB_CB(skb)->end_seq - 1)) { + if (!pskb_trim(skb, 0)) { + TCP_SKB_CB(skb)->seq = TCP_SKB_CB(skb)->end_seq - 1; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + skb->ip_summed = CHECKSUM_NONE; + skb->csum = 0; + } +@@ -1716,8 +1719,9 @@ void tcp_send_fin(struct sock *sk) + skb->csum = 0; + TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_FIN); + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* FIN eats a sequence byte, write_seq advanced by tcp_queue_skb(). */ + TCP_SKB_CB(skb)->seq = tp->write_seq; +@@ -1749,8 +1753,9 @@ void tcp_send_active_reset(struct sock * + skb->csum = 0; + TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_RST); + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* Send it off. */ + TCP_SKB_CB(skb)->seq = tcp_acceptable_seq(sk, tp); +@@ -1833,8 +1838,9 @@ struct sk_buff * tcp_make_synack(struct + TCP_SKB_CB(skb)->seq = tcp_rsk(req)->snt_isn; + TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + 1; + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + th->seq = htonl(TCP_SKB_CB(skb)->seq); + th->ack_seq = htonl(tcp_rsk(req)->rcv_isn + 1); + if (req->rcv_wnd == 0) { /* ignored for retransmitted syns */ +@@ -1937,8 +1943,9 @@ int tcp_connect(struct sock *sk) + TCP_SKB_CB(buff)->flags = TCPCB_FLAG_SYN; + TCP_ECN_send_syn(sk, tp, buff); + TCP_SKB_CB(buff)->sacked = 0; +- skb_shinfo(buff)->tso_segs = 1; +- skb_shinfo(buff)->tso_size = 0; ++ skb_shinfo(buff)->gso_segs = 1; ++ skb_shinfo(buff)->gso_size = 0; ++ skb_shinfo(buff)->gso_type = 0; + buff->csum = 0; + TCP_SKB_CB(buff)->seq = tp->write_seq++; + TCP_SKB_CB(buff)->end_seq = tp->write_seq; +@@ -2042,8 +2049,9 @@ void tcp_send_ack(struct sock *sk) + buff->csum = 0; + TCP_SKB_CB(buff)->flags = TCPCB_FLAG_ACK; + TCP_SKB_CB(buff)->sacked = 0; +- skb_shinfo(buff)->tso_segs = 1; +- skb_shinfo(buff)->tso_size = 0; ++ skb_shinfo(buff)->gso_segs = 1; ++ skb_shinfo(buff)->gso_size = 0; ++ skb_shinfo(buff)->gso_type = 0; + + /* Send it off, this clears delayed acks for us. 
*/ + TCP_SKB_CB(buff)->seq = TCP_SKB_CB(buff)->end_seq = tcp_acceptable_seq(sk, tp); +@@ -2078,8 +2086,9 @@ static int tcp_xmit_probe_skb(struct soc + skb->csum = 0; + TCP_SKB_CB(skb)->flags = TCPCB_FLAG_ACK; + TCP_SKB_CB(skb)->sacked = urgent; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* Use a previous sequence. This should cause the other + * end to send an ack. Don''t queue or clone SKB, just +diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c +index 32ad229..737c1db 100644 +--- a/net/ipv4/xfrm4_output.c ++++ b/net/ipv4/xfrm4_output.c +@@ -9,6 +9,8 @@ + */ + + #include <linux/compiler.h> ++#include <linux/if_ether.h> ++#include <linux/kernel.h> + #include <linux/skbuff.h> + #include <linux/spinlock.h> + #include <linux/netfilter_ipv4.h> +@@ -152,16 +154,10 @@ error_nolock: + goto out_exit; + } + +-static int xfrm4_output_finish(struct sk_buff *skb) ++static int xfrm4_output_finish2(struct sk_buff *skb) + { + int err; + +-#ifdef CONFIG_NETFILTER +- if (!skb->dst->xfrm) { +- IPCB(skb)->flags |= IPSKB_REROUTED; +- return dst_output(skb); +- } +-#endif + while (likely((err = xfrm4_output_one(skb)) == 0)) { + nf_reset(skb); + +@@ -174,7 +170,7 @@ #endif + return dst_output(skb); + + err = nf_hook(PF_INET, NF_IP_POST_ROUTING, &skb, NULL, +- skb->dst->dev, xfrm4_output_finish); ++ skb->dst->dev, xfrm4_output_finish2); + if (unlikely(err != 1)) + break; + } +@@ -182,6 +178,48 @@ #endif + return err; + } + ++static int xfrm4_output_finish(struct sk_buff *skb) ++{ ++ struct sk_buff *segs; ++ ++#ifdef CONFIG_NETFILTER ++ if (!skb->dst->xfrm) { ++ IPCB(skb)->flags |= IPSKB_REROUTED; ++ return dst_output(skb); ++ } ++#endif ++ ++ if (!skb_shinfo(skb)->gso_size) ++ return xfrm4_output_finish2(skb); ++ ++ skb->protocol = htons(ETH_P_IP); ++ segs = skb_gso_segment(skb, 0); ++ kfree_skb(skb); ++ if (unlikely(IS_ERR(segs))) ++ return PTR_ERR(segs); ++ ++ do { ++ struct sk_buff *nskb = segs->next; ++ int err; ++ ++ segs->next = NULL; ++ err = xfrm4_output_finish2(segs); ++ ++ if (unlikely(err)) { ++ while ((segs = nskb)) { ++ nskb = segs->next; ++ segs->next = NULL; ++ kfree_skb(segs); ++ } ++ return err; ++ } ++ ++ segs = nskb; ++ } while (segs); ++ ++ return 0; ++} ++ + int xfrm4_output(struct sk_buff *skb) + { + return NF_HOOK_COND(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev, +diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c +index 5bf70b1..cf5d17e 100644 +--- a/net/ipv6/ip6_output.c ++++ b/net/ipv6/ip6_output.c +@@ -147,7 +147,7 @@ static int ip6_output2(struct sk_buff *s + + int ip6_output(struct sk_buff *skb) + { +- if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->ufo_size) || ++ if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size) || + dst_allfrag(skb->dst)) + return ip6_fragment(skb, ip6_output2); + else +@@ -829,8 +829,9 @@ static inline int ip6_ufo_append_data(st + struct frag_hdr fhdr; + + /* specify the length of each IP datagram fragment*/ +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen) - +- sizeof(struct frag_hdr); ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen - ++ sizeof(struct frag_hdr); ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; + ipv6_select_ident(skb, &fhdr); + skb_shinfo(skb)->ip6_frag_id = fhdr.identification; + __skb_queue_tail(&sk->sk_write_queue, skb); +diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c +index d511a88..ef56d5d 100644 +--- a/net/ipv6/ipcomp6.c ++++ b/net/ipv6/ipcomp6.c +@@ 
-64,7 +64,7 @@ static LIST_HEAD(ipcomp6_tfms_list); + + static int ipcomp6_input(struct xfrm_state *x, struct xfrm_decap_state *decap, struct sk_buff *skb) + { +- int err = 0; ++ int err = -ENOMEM; + u8 nexthdr = 0; + int hdr_len = skb->h.raw - skb->nh.raw; + unsigned char *tmp_hdr = NULL; +@@ -75,11 +75,8 @@ static int ipcomp6_input(struct xfrm_sta + struct crypto_tfm *tfm; + int cpu; + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { +- err = -ENOMEM; ++ if (skb_linearize_cow(skb)) + goto out; +- } + + skb->ip_summed = CHECKSUM_NONE; + +@@ -158,10 +155,8 @@ static int ipcomp6_output(struct xfrm_st + goto out_ok; + } + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { ++ if (skb_linearize_cow(skb)) + goto out_ok; +- } + + /* compression */ + plen = skb->len - hdr_len; +diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c +index 8024217..39bdeec 100644 +--- a/net/ipv6/xfrm6_output.c ++++ b/net/ipv6/xfrm6_output.c +@@ -151,7 +151,7 @@ error_nolock: + goto out_exit; + } + +-static int xfrm6_output_finish(struct sk_buff *skb) ++static int xfrm6_output_finish2(struct sk_buff *skb) + { + int err; + +@@ -167,7 +167,7 @@ static int xfrm6_output_finish(struct sk + return dst_output(skb); + + err = nf_hook(PF_INET6, NF_IP6_POST_ROUTING, &skb, NULL, +- skb->dst->dev, xfrm6_output_finish); ++ skb->dst->dev, xfrm6_output_finish2); + if (unlikely(err != 1)) + break; + } +@@ -175,6 +175,41 @@ static int xfrm6_output_finish(struct sk + return err; + } + ++static int xfrm6_output_finish(struct sk_buff *skb) ++{ ++ struct sk_buff *segs; ++ ++ if (!skb_shinfo(skb)->gso_size) ++ return xfrm6_output_finish2(skb); ++ ++ skb->protocol = htons(ETH_P_IP); ++ segs = skb_gso_segment(skb, 0); ++ kfree_skb(skb); ++ if (unlikely(IS_ERR(segs))) ++ return PTR_ERR(segs); ++ ++ do { ++ struct sk_buff *nskb = segs->next; ++ int err; ++ ++ segs->next = NULL; ++ err = xfrm6_output_finish2(segs); ++ ++ if (unlikely(err)) { ++ while ((segs = nskb)) { ++ nskb = segs->next; ++ segs->next = NULL; ++ kfree_skb(segs); ++ } ++ return err; ++ } ++ ++ segs = nskb; ++ } while (segs); ++ ++ return 0; ++} ++ + int xfrm6_output(struct sk_buff *skb) + { + return NF_HOOK(PF_INET6, NF_IP6_POST_ROUTING, skb, NULL, skb->dst->dev, +diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c +index 99ceb91..28c9efd 100644 +--- a/net/sched/sch_generic.c ++++ b/net/sched/sch_generic.c +@@ -72,9 +72,9 @@ void qdisc_unlock_tree(struct net_device + dev->queue_lock serializes queue accesses for this device + AND dev->qdisc pointer itself. + +- dev->xmit_lock serializes accesses to device driver. ++ netif_tx_lock serializes accesses to device driver. + +- dev->queue_lock and dev->xmit_lock are mutually exclusive, ++ dev->queue_lock and netif_tx_lock are mutually exclusive, + if one is grabbed, another must be free. + */ + +@@ -90,14 +90,17 @@ void qdisc_unlock_tree(struct net_device + NOTE: Called under dev->queue_lock with locally disabled BH. + */ + +-int qdisc_restart(struct net_device *dev) ++static inline int qdisc_restart(struct net_device *dev) + { + struct Qdisc *q = dev->qdisc; + struct sk_buff *skb; + + /* Dequeue packet */ +- if ((skb = q->dequeue(q)) != NULL) { ++ if (((skb = dev->gso_skb)) || ((skb = q->dequeue(q)))) { + unsigned nolock = (dev->features & NETIF_F_LLTX); ++ ++ dev->gso_skb = NULL; ++ + /* + * When the driver has LLTX set it does its own locking + * in start_xmit. 
No need to add additional overhead by +@@ -108,7 +111,7 @@ int qdisc_restart(struct net_device *dev + * will be requeued. + */ + if (!nolock) { +- if (!spin_trylock(&dev->xmit_lock)) { ++ if (!netif_tx_trylock(dev)) { + collision: + /* So, someone grabbed the driver. */ + +@@ -126,8 +129,6 @@ int qdisc_restart(struct net_device *dev + __get_cpu_var(netdev_rx_stat).cpu_collision++; + goto requeue; + } +- /* Remember that the driver is grabbed by us. */ +- dev->xmit_lock_owner = smp_processor_id(); + } + + { +@@ -136,14 +137,11 @@ int qdisc_restart(struct net_device *dev + + if (!netif_queue_stopped(dev)) { + int ret; +- if (netdev_nit) +- dev_queue_xmit_nit(skb, dev); + +- ret = dev->hard_start_xmit(skb, dev); ++ ret = dev_hard_start_xmit(skb, dev); + if (ret == NETDEV_TX_OK) { + if (!nolock) { +- dev->xmit_lock_owner = -1; +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + } + spin_lock(&dev->queue_lock); + return -1; +@@ -157,8 +155,7 @@ int qdisc_restart(struct net_device *dev + /* NETDEV_TX_BUSY - we need to requeue */ + /* Release the driver */ + if (!nolock) { +- dev->xmit_lock_owner = -1; +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + } + spin_lock(&dev->queue_lock); + q = dev->qdisc; +@@ -175,7 +172,10 @@ int qdisc_restart(struct net_device *dev + */ + + requeue: +- q->ops->requeue(skb, q); ++ if (skb->next) ++ dev->gso_skb = skb; ++ else ++ q->ops->requeue(skb, q); + netif_schedule(dev); + return 1; + } +@@ -183,11 +183,23 @@ requeue: + return q->q.qlen; + } + ++void __qdisc_run(struct net_device *dev) ++{ ++ if (unlikely(dev->qdisc == &noop_qdisc)) ++ goto out; ++ ++ while (qdisc_restart(dev) < 0 && !netif_queue_stopped(dev)) ++ /* NOTHING */; ++ ++out: ++ clear_bit(__LINK_STATE_QDISC_RUNNING, &dev->state); ++} ++ + static void dev_watchdog(unsigned long arg) + { + struct net_device *dev = (struct net_device *)arg; + +- spin_lock(&dev->xmit_lock); ++ netif_tx_lock(dev); + if (dev->qdisc != &noop_qdisc) { + if (netif_device_present(dev) && + netif_running(dev) && +@@ -201,7 +213,7 @@ static void dev_watchdog(unsigned long a + dev_hold(dev); + } + } +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + + dev_put(dev); + } +@@ -225,17 +237,17 @@ void __netdev_watchdog_up(struct net_dev + + static void dev_watchdog_up(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + __netdev_watchdog_up(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + static void dev_watchdog_down(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + if (del_timer(&dev->watchdog_timer)) + __dev_put(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + void netif_carrier_on(struct net_device *dev) +@@ -577,10 +589,17 @@ void dev_deactivate(struct net_device *d + + dev_watchdog_down(dev); + +- while (test_bit(__LINK_STATE_SCHED, &dev->state)) ++ /* Wait for outstanding dev_queue_xmit calls. */ ++ synchronize_rcu(); ++ ++ /* Wait for outstanding qdisc_run calls. 
*/ ++ while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state)) + yield(); + +- spin_unlock_wait(&dev->xmit_lock); ++ if (dev->gso_skb) { ++ kfree_skb(dev->gso_skb); ++ dev->gso_skb = NULL; ++ } + } + + void dev_init_scheduler(struct net_device *dev) +@@ -622,6 +641,5 @@ EXPORT_SYMBOL(qdisc_create_dflt); + EXPORT_SYMBOL(qdisc_alloc); + EXPORT_SYMBOL(qdisc_destroy); + EXPORT_SYMBOL(qdisc_reset); +-EXPORT_SYMBOL(qdisc_restart); + EXPORT_SYMBOL(qdisc_lock_tree); + EXPORT_SYMBOL(qdisc_unlock_tree); +diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c +index 79b8ef3..4c16ad5 100644 +--- a/net/sched/sch_teql.c ++++ b/net/sched/sch_teql.c +@@ -302,20 +302,17 @@ restart: + + switch (teql_resolve(skb, skb_res, slave)) { + case 0: +- if (spin_trylock(&slave->xmit_lock)) { +- slave->xmit_lock_owner = smp_processor_id(); ++ if (netif_tx_trylock(slave)) { + if (!netif_queue_stopped(slave) && + slave->hard_start_xmit(skb, slave) == 0) { +- slave->xmit_lock_owner = -1; +- spin_unlock(&slave->xmit_lock); ++ netif_tx_unlock(slave); + master->slaves = NEXT_SLAVE(q); + netif_wake_queue(dev); + master->stats.tx_packets++; + master->stats.tx_bytes += len; + return 0; + } +- slave->xmit_lock_owner = -1; +- spin_unlock(&slave->xmit_lock); ++ netif_tx_unlock(slave); + } + if (netif_queue_stopped(dev)) + busy = 1; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: This patch is yet to be merged upstream. [NET]: Added GSO header verification When GSO packets come from an untrusted source (e.g., a Xen guest domain), we need to verify the header integrity before passing it to the hardware. Since the first step in GSO is to verify the header, we can reuse that code by adding a new bit to gso_type: SKB_GSO_DODGY. Packets with this bit set can only be fed directly to devices with the corresponding bit NETIF_F_GSO_ROBUST. If the device doesn''t have that bit, then the skb is fed to the GSO engine which will allow the packet to be sent to the hardware if it passes the header check. This patch changes the sg flag to a full features flag. The same method can be used to implement TSO ECN support. We simply have to mark packets with CWR set with SKB_GSO_ECN so that only hardware with a corresponding NETIF_F_TSO_ECN can accept them. The GSO engine can either fully segment the packet, or segment the first MTU and pass the rest to the hardware for further segmentation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r b255efd0df72 -r f328329eb465 linux-2.6-xen-sparse/include/linux/skbuff.h --- a/linux-2.6-xen-sparse/include/linux/skbuff.h Tue Jun 27 21:27:13 2006 +1000 +++ b/linux-2.6-xen-sparse/include/linux/skbuff.h Tue Jun 27 21:29:42 2006 +1000 @@ -172,6 +172,9 @@ enum { enum { SKB_GSO_TCPV4 = 1 << 0, SKB_GSO_UDPV4 = 1 << 1, + + /* This indicates the skb is from an untrusted source. */ + SKB_GSO_DODGY = 1 << 2, }; /** @@ -1285,7 +1288,7 @@ extern void skb_split(struct sk_b struct sk_buff *skb1, const u32 len); extern void skb_release_data(struct sk_buff *skb); -extern struct sk_buff *skb_segment(struct sk_buff *skb, int sg); +extern struct sk_buff *skb_segment(struct sk_buff *skb, int features); static inline void *skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer) diff -r b255efd0df72 -r f328329eb465 linux-2.6-xen-sparse/net/core/dev.c --- a/linux-2.6-xen-sparse/net/core/dev.c Tue Jun 27 21:27:13 2006 +1000 +++ b/linux-2.6-xen-sparse/net/core/dev.c Tue Jun 27 21:29:42 2006 +1000 @@ -1116,11 +1116,14 @@ out: /** * skb_gso_segment - Perform segmentation on skb. * @skb: buffer to segment - * @sg: whether scatter-gather is supported on the target. + * @features: features for the output path (see dev->features) * * This function segments the given skb and returns a list of segments. - */ -struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg) + * + * It may return NULL if the skb requires no segmentation. This is + * only possible when GSO is used for verifying header integrity. 
+ */ +struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features) { struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT); struct packet_type *ptype; @@ -1136,11 +1139,13 @@ struct sk_buff *skb_gso_segment(struct s rcu_read_lock(); list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) { if (ptype->type == type && !ptype->dev && ptype->gso_segment) { - segs = ptype->gso_segment(skb, sg); + segs = ptype->gso_segment(skb, features); break; } } rcu_read_unlock(); + + __skb_push(skb, skb->data - skb->mac.raw); return segs; } @@ -1217,9 +1222,15 @@ static int dev_gso_segment(struct sk_buf { struct net_device *dev = skb->dev; struct sk_buff *segs; - - segs = skb_gso_segment(skb, dev->features & NETIF_F_SG && - !illegal_highdma(dev, skb)); + int features = dev->features & ~(illegal_highdma(dev, skb) ? + NETIF_F_SG : 0); + + segs = skb_gso_segment(skb, features); + + /* Verifying header integrity only. */ + if (!segs) + return 0; + if (unlikely(IS_ERR(segs))) return PTR_ERR(segs); @@ -1236,13 +1247,17 @@ int dev_hard_start_xmit(struct sk_buff * if (netdev_nit) dev_queue_xmit_nit(skb, dev); - if (!netif_needs_gso(dev, skb)) - return dev->hard_start_xmit(skb, dev); - - if (unlikely(dev_gso_segment(skb))) - goto out_kfree_skb; - } - + if (netif_needs_gso(dev, skb)) { + if (unlikely(dev_gso_segment(skb))) + goto out_kfree_skb; + if (skb->next) + goto gso; + } + + return dev->hard_start_xmit(skb, dev); + } + +gso: do { struct sk_buff *nskb = skb->next; int rc; diff -r b255efd0df72 -r f328329eb465 linux-2.6-xen-sparse/net/core/skbuff.c --- a/linux-2.6-xen-sparse/net/core/skbuff.c Tue Jun 27 21:27:13 2006 +1000 +++ b/linux-2.6-xen-sparse/net/core/skbuff.c Tue Jun 27 21:29:42 2006 +1000 @@ -1804,13 +1804,13 @@ int skb_append_datato_frags(struct sock /** * skb_segment - Perform protocol segmentation on skb. * @skb: buffer to segment - * @sg: whether scatter-gather can be used for generated segments + * @features: features for the output path (see dev->features) * * This function performs segmentation on the given skb. It returns * the segment at the given position. It returns NULL if there are * no more segments to generate, or when an error is encountered. 
*/ -struct sk_buff *skb_segment(struct sk_buff *skb, int sg) +struct sk_buff *skb_segment(struct sk_buff *skb, int features) { struct sk_buff *segs = NULL; struct sk_buff *tail = NULL; @@ -1819,6 +1819,7 @@ struct sk_buff *skb_segment(struct sk_bu unsigned int offset = doffset; unsigned int headroom; unsigned int len; + int sg = features & NETIF_F_SG; int nfrags = skb_shinfo(skb)->nr_frags; int err = -ENOMEM; int i = 0; diff -r b255efd0df72 -r f328329eb465 patches/linux-2.6.16.13/net-gso.patch --- a/patches/linux-2.6.16.13/net-gso.patch Tue Jun 27 21:27:13 2006 +1000 +++ b/patches/linux-2.6.16.13/net-gso.patch Tue Jun 27 21:29:42 2006 +1000 @@ -2283,3 +2283,163 @@ index 79b8ef3..4c16ad5 100644 } if (netif_queue_stopped(dev)) busy = 1; +diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h +index 458b278..58e5bae 100644 +--- a/include/linux/netdevice.h ++++ b/include/linux/netdevice.h +@@ -314,6 +314,7 @@ #define NETIF_F_LLTX 4096 /* LockLess T + #define NETIF_F_GSO_SHIFT 16 + #define NETIF_F_TSO (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT) + #define NETIF_F_UFO (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT) ++#define NETIF_F_GSO_ROBUST (SKB_GSO_DODGY << NETIF_F_GSO_SHIFT) + + #define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) + #define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM) +@@ -538,7 +539,8 @@ struct packet_type { + struct net_device *, + struct packet_type *, + struct net_device *); +- struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, ++ int features); + void *af_packet_priv; + struct list_head list; + }; +@@ -977,7 +979,7 @@ extern int netdev_max_backlog; + extern int weight_p; + extern int netdev_set_master(struct net_device *dev, struct net_device *master); + extern int skb_checksum_help(struct sk_buff *skb, int inward); +-extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg); ++extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features); + #ifdef CONFIG_BUG + extern void netdev_rx_csum_fault(struct net_device *dev); + #else +@@ -997,11 +999,16 @@ #endif + + extern void linkwatch_run_queue(void); + ++static inline int skb_gso_ok(struct sk_buff *skb, int features) ++{ ++ int feature = skb_shinfo(skb)->gso_size ? 
++ skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT : 0; ++ return (features & feature) != feature; ++} ++ + static inline int netif_needs_gso(struct net_device *dev, struct sk_buff *skb) + { +- int feature = skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT; +- return skb_shinfo(skb)->gso_size && +- (dev->features & feature) != feature; ++ return skb_gso_ok(skb, dev->features); + } + + #endif /* __KERNEL__ */ +diff --git a/include/net/protocol.h b/include/net/protocol.h +index 650911d..0d2dcdb 100644 +--- a/include/net/protocol.h ++++ b/include/net/protocol.h +@@ -37,7 +37,8 @@ #define MAX_INET_PROTOS 256 /* Must be + struct net_protocol { + int (*handler)(struct sk_buff *skb); + void (*err_handler)(struct sk_buff *skb, u32 info); +- struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, ++ int features); + int no_policy; + }; + +diff --git a/include/net/tcp.h b/include/net/tcp.h +index 40e337c..70e1d5f 100644 +--- a/include/net/tcp.h ++++ b/include/net/tcp.h +@@ -1063,7 +1063,7 @@ extern struct request_sock_ops tcp_reque + + extern int tcp_v4_destroy_sock(struct sock *sk); + +-extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg); ++extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features); + + #ifdef CONFIG_PROC_FS + extern int tcp4_proc_init(void); +diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c +index b9e32b6..180e79b 100644 +--- a/net/bridge/br_device.c ++++ b/net/bridge/br_device.c +@@ -185,6 +185,6 @@ void br_dev_setup(struct net_device *dev + dev->set_mac_address = br_set_mac_address; + dev->priv_flags = IFF_EBRIDGE; + +- dev->features = NETIF_F_SG | NETIF_F_FRAGLIST +- | NETIF_F_HIGHDMA | NETIF_F_TSO | NETIF_F_NO_CSUM; ++ dev->features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA | ++ NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_GSO_ROBUST; + } +diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c +index 4e9743d..0617146 100644 +--- a/net/bridge/br_if.c ++++ b/net/bridge/br_if.c +@@ -405,7 +405,8 @@ void br_features_recompute(struct net_br + features &= feature; + } + +- br->dev->features = features | checksum | NETIF_F_LLTX; ++ br->dev->features = features | checksum | NETIF_F_LLTX | ++ NETIF_F_GSO_ROBUST; + } + + /* called with RTNL */ +diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c +index 3692db5..5ba719e 100644 +--- a/net/ipv4/af_inet.c ++++ b/net/ipv4/af_inet.c +@@ -1085,7 +1085,7 @@ int inet_sk_rebuild_header(struct sock * + + EXPORT_SYMBOL(inet_sk_rebuild_header); + +-static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int sg) ++static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int features) + { + struct sk_buff *segs = ERR_PTR(-EINVAL); + struct iphdr *iph; +@@ -1114,10 +1114,10 @@ static struct sk_buff *inet_gso_segment( + rcu_read_lock(); + ops = rcu_dereference(inet_protos[proto]); + if (ops && ops->gso_segment) +- segs = ops->gso_segment(skb, sg); ++ segs = ops->gso_segment(skb, features); + rcu_read_unlock(); + +- if (IS_ERR(segs)) ++ if (!segs || unlikely(IS_ERR(segs))) + goto out; + + skb = segs; +diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c +index de0753f..84130c9 100644 +--- a/net/ipv4/tcp.c ++++ b/net/ipv4/tcp.c +@@ -2021,7 +2021,7 @@ int tcp_getsockopt(struct sock *sk, int + } + + +-struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg) ++struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features) + { + struct sk_buff *segs = ERR_PTR(-EINVAL); + struct tcphdr *th; +@@ -2042,10 +2042,14 @@ struct sk_buff 
*tcp_tso_segment(struct s + if (!pskb_may_pull(skb, thlen)) + goto out; + ++ segs = NULL; ++ if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) ++ goto out; ++ + oldlen = (u16)~skb->len; + __skb_pull(skb, thlen); + +- segs = skb_segment(skb, sg); ++ segs = skb_segment(skb, features); + if (IS_ERR(segs)) + goto out; + _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
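A note on the feature-bit arithmetic the header-verification patch above relies on: gso_type bits, shifted left by NETIF_F_GSO_SHIFT, occupy the corresponding NETIF_F_* positions in dev->features, which is why a packet carrying SKB_GSO_DODGY can only bypass the software GSO engine on a device that advertises NETIF_F_GSO_ROBUST. The standalone sketch below demonstrates that check; it is illustrative only and not part of the patch series. The constants are copied from the patch, while main() and the two example feature words (a plain TSO NIC versus the bridge device, which the patch marks NETIF_F_GSO_ROBUST) are made up for demonstration.

/* Illustrative sketch only -- not part of the patch above. */
#include <stdio.h>

#define SKB_GSO_TCPV4       (1 << 0)
#define SKB_GSO_UDPV4       (1 << 1)
#define SKB_GSO_DODGY       (1 << 2)

#define NETIF_F_GSO_SHIFT   16
#define NETIF_F_TSO         (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_UFO         (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_GSO_ROBUST  (SKB_GSO_DODGY << NETIF_F_GSO_SHIFT)

int main(void)
{
	/* A TSO packet from an untrusted guest domain. */
	unsigned int gso_type = SKB_GSO_TCPV4 | SKB_GSO_DODGY;
	unsigned int wanted   = gso_type << NETIF_F_GSO_SHIFT;

	/* A TSO-capable NIC that is not marked robust ... */
	unsigned int nic    = NETIF_F_TSO;
	/* ... and the bridge device, which the patch marks robust. */
	unsigned int bridge = NETIF_F_TSO | NETIF_F_GSO_ROBUST;

	printf("NIC:    %s\n", (nic & wanted) == wanted ?
	       "takes the packet directly" : "packet is fed to the GSO engine");
	printf("bridge: %s\n", (bridge & wanted) == wanted ?
	       "takes the packet directly" : "packet is fed to the GSO engine");
	return 0;
}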
Hi: [NET] loopback: Added support for TSO Just like SG, TSO support here is innate. So all we need to do is mark it as such. This patch also adds the ethtool control functions for SG. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r f328329eb465 -r c88ee5ecc39c linux-2.6-xen-sparse/drivers/xen/netback/loopback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/loopback.c Tue Jun 27 21:29:42 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/loopback.c Tue Jun 27 21:30:08 2006 +1000 @@ -125,6 +125,10 @@ static struct ethtool_ops network_ethtoo { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, + .get_sg = ethtool_op_get_sg, + .set_sg = ethtool_op_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = ethtool_op_set_tso, }; /* @@ -152,6 +156,7 @@ static void loopback_construct(struct ne dev->features = (NETIF_F_HIGHDMA | NETIF_F_LLTX | + NETIF_F_TSO | NETIF_F_SG | NETIF_F_IP_CSUM); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: [NET] back: Add TSO support This patch adds TCP Segmentation Offload (TSO) support to the backend. It also advertises this fact through xenbus so that the frontend can detect this and send through TSO requests only if it is supported. This is done using an extra request slot which is indicated by a flag in the first slot. In future checksum offload can be done in the same way. Even though only TSO is supported for now the code actually supports GSO so it can be applied to any other protocol. The only missing bit is the detection of host support for a specific GSO protocol. Once that is added we can advertise all supported protocols to the guest. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r c88ee5ecc39c -r f6d1f558bf1c linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Tue Jun 27 21:30:08 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Tue Jun 27 21:30:11 2006 +1000 @@ -43,7 +43,7 @@ static void netif_idx_release(u16 pendin static void netif_idx_release(u16 pending_idx); static void netif_page_release(struct page *page); static void make_tx_response(netif_t *netif, - u16 id, + netif_tx_request_t *txp, s8 st); static int make_rx_response(netif_t *netif, u16 id, @@ -481,7 +481,7 @@ inline static void net_tx_action_dealloc netif = pending_tx_info[pending_idx].netif; - make_tx_response(netif, pending_tx_info[pending_idx].req.id, + make_tx_response(netif, &pending_tx_info[pending_idx].req, NETIF_RSP_OKAY); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; @@ -490,14 +490,16 @@ inline static void net_tx_action_dealloc } } -static void netbk_tx_err(netif_t *netif, RING_IDX end) +static void netbk_tx_err(netif_t *netif, netif_tx_request_t *txp, RING_IDX end) { RING_IDX cons = netif->tx.req_cons; do { - netif_tx_request_t *txp = RING_GET_REQUEST(&netif->tx, cons); - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); - } while (++cons < end); + make_tx_response(netif, txp, NETIF_RSP_ERROR); + if (++cons >= end) + break; + txp = RING_GET_REQUEST(&netif->tx, cons); + } while (1); netif->tx.req_cons = cons; netif_schedule_work(netif); netif_put(netif); @@ -508,7 +510,7 @@ static int netbk_count_requests(netif_t { netif_tx_request_t *first = txp; RING_IDX cons = netif->tx.req_cons; - int frags = 1; + int frags = 0; while (txp->flags & NETTXF_more_data) { if (frags >= work_to_do) { @@ -543,7 +545,7 @@ static gnttab_map_grant_ref_t *netbk_get skb_frag_t *frags = shinfo->frags; netif_tx_request_t *txp; unsigned long pending_idx = *((u16 *)skb->data); - RING_IDX cons = netif->tx.req_cons + 1; + RING_IDX cons = netif->tx.req_cons; int i, start; /* Skip first skb fragment if it is on same page as header fragment. */ @@ -581,7 +583,7 @@ static int netbk_tx_check_mop(struct sk_ err = mop->status; if (unlikely(err)) { txp = &pending_tx_info[pending_idx].req; - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); + make_tx_response(netif, txp, NETIF_RSP_ERROR); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; netif_put(netif); } else { @@ -614,7 +616,7 @@ static int netbk_tx_check_mop(struct sk_ /* Error on this fragment: respond to client with an error. 
*/ txp = &pending_tx_info[pending_idx].req; - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); + make_tx_response(netif, txp, NETIF_RSP_ERROR); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; netif_put(netif); @@ -668,6 +670,7 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; + struct netif_tx_extra txtra; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -726,22 +729,37 @@ static void net_tx_action(unsigned long } netif->remaining_credit -= txreq.size; + work_to_do--; + netif->tx.req_cons = ++i; + + if (txreq.flags & NETTXF_extra_info) { + if (work_to_do-- <= 0) { + DPRINTK("Missing extra info\n"); + netbk_tx_err(netif, &txreq, i); + continue; + } + + memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), + sizeof(txtra)); + netif->tx.req_cons = ++i; + } + ret = netbk_count_requests(netif, &txreq, work_to_do); if (unlikely(ret < 0)) { - netbk_tx_err(netif, i - ret); + netbk_tx_err(netif, &txreq, i - ret); continue; } i += ret; if (unlikely(ret > MAX_SKB_FRAGS + 1)) { DPRINTK("Too many frags\n"); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } if (unlikely(txreq.size < ETH_HLEN)) { DPRINTK("Bad packet size: %d\n", txreq.size); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } @@ -750,25 +768,31 @@ static void net_tx_action(unsigned long DPRINTK("txreq.offset: %x, size: %u, end: %lu\n", txreq.offset, txreq.size, (txreq.offset &~PAGE_MASK) + txreq.size); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } pending_idx = pending_ring[MASK_PEND_IDX(pending_cons)]; data_len = (txreq.size > PKT_PROT_LEN && - ret < MAX_SKB_FRAGS + 1) ? + ret < MAX_SKB_FRAGS) ? PKT_PROT_LEN : txreq.size; skb = alloc_skb(data_len+16, GFP_ATOMIC); if (unlikely(skb == NULL)) { DPRINTK("Can''t allocate a skb in start_xmit.\n"); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); break; } /* Packets passed to netif_rx() must have some headroom. 
*/ skb_reserve(skb, 16); + + if (txreq.flags & NETTXF_gso) { + skb_shinfo(skb)->gso_size = txtra.gso_size; + skb_shinfo(skb)->gso_segs = txtra.gso_segs; + skb_shinfo(skb)->gso_type = txtra.gso_type; + } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), GNTMAP_host_map | GNTMAP_readonly, @@ -782,7 +806,7 @@ static void net_tx_action(unsigned long __skb_put(skb, data_len); - skb_shinfo(skb)->nr_frags = ret - 1; + skb_shinfo(skb)->nr_frags = ret; if (data_len < txreq.size) { skb_shinfo(skb)->nr_frags++; skb_shinfo(skb)->frags[0].page @@ -898,7 +922,7 @@ irqreturn_t netif_be_int(int irq, void * } static void make_tx_response(netif_t *netif, - u16 id, + netif_tx_request_t *txp, s8 st) { RING_IDX i = netif->tx.rsp_prod_pvt; @@ -906,8 +930,11 @@ static void make_tx_response(netif_t *ne int notify; resp = RING_GET_RESPONSE(&netif->tx, i); - resp->id = id; + resp->id = txp->id; resp->status = st; + + if (txp->flags & NETTXF_extra_info) + RING_GET_RESPONSE(&netif->tx, ++i)->status = NETIF_RSP_NULL; netif->tx.rsp_prod_pvt = ++i; RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->tx, notify); diff -r c88ee5ecc39c -r f6d1f558bf1c linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Tue Jun 27 21:30:08 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Tue Jun 27 21:30:11 2006 +1000 @@ -101,6 +101,12 @@ static int netback_probe(struct xenbus_d goto abort_transaction; } + err = xenbus_printf(xbt, dev->nodename, "feature-tso", "%d", 1); + if (err) { + message = "writing feature-tso"; + goto abort_transaction; + } + err = xenbus_transaction_end(xbt, 0); } while (err == -EAGAIN); diff -r c88ee5ecc39c -r f6d1f558bf1c xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Tue Jun 27 21:30:08 2006 +1000 +++ b/xen/include/public/io/netif.h Tue Jun 27 21:30:11 2006 +1000 @@ -31,6 +31,13 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) +/* Packet has GSO fields. */ +#define _NETTXF_gso (3) +#define NETTXF_gso (1U<<_NETTXF_gso) + +/* Packet to be folloed by extra descritptor. */ +#define NETTXF_extra_info (NETTXF_gso) + struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ uint16_t offset; /* Offset within buffer page */ @@ -39,6 +46,13 @@ struct netif_tx_request { uint16_t size; /* Packet size in bytes. */ }; typedef struct netif_tx_request netif_tx_request_t; + +/* This structure needs to fit within netif_tx_request for compatibility. */ +struct netif_tx_extra { + uint16_t gso_size; /* GSO MSS. */ + uint16_t gso_segs; /* GSO segment count. */ + uint16_t gso_type; /* GSO type. */ +}; struct netif_tx_response { uint16_t id; @@ -78,6 +92,7 @@ DEFINE_RING_TYPES(netif_rx, struct netif #define NETIF_RSP_DROPPED -2 #define NETIF_RSP_ERROR -1 #define NETIF_RSP_OKAY 0 +#define NETIF_RSP_NULL 1 #endif _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
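To make the extra-slot protocol described in the backend patch above concrete, the sketch below lays out the first two ring slots of a guest TSO transmit. It is illustrative only: the structure layouts and flag values are copied from the netif.h hunks in this patch (with grant_ref_t simplified to uint32_t so the example compiles on its own), while the MSS and segment-count numbers are hypothetical; the real producer side is the frontend patch that follows. The backend answers the metadata slot with a NETIF_RSP_NULL response so that request and response counts stay balanced, and the frontend skips such responses when it garbage-collects transmitted buffers.

/* Illustrative sketch only -- not part of the patch above. */
#include <stdio.h>
#include <stdint.h>

#define _NETTXF_csum_blank     (0)
#define NETTXF_csum_blank      (1U << _NETTXF_csum_blank)
#define _NETTXF_data_validated (1)
#define NETTXF_data_validated  (1U << _NETTXF_data_validated)
#define _NETTXF_more_data      (2)
#define NETTXF_more_data       (1U << _NETTXF_more_data)
#define _NETTXF_gso            (3)
#define NETTXF_gso             (1U << _NETTXF_gso)
#define NETTXF_extra_info      (NETTXF_gso)

#define SKB_GSO_TCPV4          (1 << 0)

struct netif_tx_request {      /* slot i: the ordinary first request */
	uint32_t gref;         /* grant_ref_t in the real header */
	uint16_t offset;
	uint16_t flags;
	uint16_t id;
	uint16_t size;
};

struct netif_tx_extra {        /* slot i+1: reinterpreted as GSO metadata */
	uint16_t gso_size;
	uint16_t gso_segs;
	uint16_t gso_type;
};

int main(void)
{
	struct netif_tx_request first = {
		.flags = NETTXF_csum_blank | NETTXF_data_validated |
		         NETTXF_more_data | NETTXF_gso,
	};
	struct netif_tx_extra extra = {
		.gso_size = 1448,          /* MSS to segment to (example value) */
		.gso_segs = 44,            /* segments in the super-packet (example) */
		.gso_type = SKB_GSO_TCPV4,
	};

	printf("slot i  : flags=%#x, extra_info follows: %s\n",
	       first.flags, first.flags & NETTXF_extra_info ? "yes" : "no");
	printf("slot i+1: gso_size=%u gso_segs=%u gso_type=%#x\n",
	       extra.gso_size, extra.gso_segs, extra.gso_type);
	return 0;
}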
Herbert Xu
2006-Jun-27 12:09 UTC
[Xen-devel] [5/5] [NET] front: Transmit TSO packets if supported
Hi: [NET] front: Transmit TSO packets if supported This patch adds TSO transmission support to the frontend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r f6d1f558bf1c -r f41f42b29fae linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Tue Jun 27 21:30:11 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Tue Jun 27 21:47:07 2006 +1000 @@ -463,7 +463,7 @@ static int network_open(struct net_devic static inline int netfront_tx_slot_available(struct netfront_info *np) { - return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 1; + return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 2; } static inline void network_maybe_wake_tx(struct net_device *dev) @@ -491,7 +491,13 @@ static void network_tx_buf_gc(struct net rmb(); /* Ensure we see responses up to ''rp''. */ for (cons = np->tx.rsp_cons; cons != prod; cons++) { - id = RING_GET_RESPONSE(&np->tx, cons)->id; + struct netif_tx_response *txrsp; + + txrsp = RING_GET_RESPONSE(&np->tx, cons); + if (txrsp->status == NETIF_RSP_NULL) + continue; + + id = txrsp->id; skb = np->tx_skbs[id]; if (unlikely(gnttab_query_foreign_access( np->grant_tx_ref[id]) != 0)) { @@ -739,7 +745,8 @@ static int network_start_xmit(struct sk_ spin_lock_irq(&np->tx_lock); if (unlikely(!netif_carrier_ok(dev) || - (frags > 1 && !xennet_can_sg(dev)))) { + (frags > 1 && !xennet_can_sg(dev)) || + netif_needs_gso(dev, skb))) { spin_unlock_irq(&np->tx_lock); goto drop; } @@ -762,9 +769,21 @@ static int network_start_xmit(struct sk_ tx->size = len; tx->flags = 0; - if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ + if (skb->ip_summed == CHECKSUM_HW) { + /* local packet? */ tx->flags |= NETTXF_csum_blank | NETTXF_data_validated; - if (skb->proto_data_valid) /* remote but checksummed? */ + + if (skb_shinfo(skb)->gso_size) { + struct netif_tx_extra *txtra + (struct netif_tx_extra *) + RING_GET_REQUEST(&np->tx, ++i); + + tx->flags |= NETTXF_gso; + txtra->gso_size = skb_shinfo(skb)->gso_size; + txtra->gso_segs = skb_shinfo(skb)->gso_segs; + txtra->gso_type = skb_shinfo(skb)->gso_type; + } + } else if (skb->proto_data_valid) /* remote but checksummed? */ tx->flags |= NETTXF_data_validated; np->tx.req_prod_pvt = i + 1; @@ -1065,9 +1084,26 @@ static int xennet_set_sg(struct net_devi return ethtool_op_set_sg(dev, data); } +static int xennet_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + struct netfront_info *np = netdev_priv(dev); + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-tso", + "%d", &val) < 0) + val = 0; + if (!val) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static void xennet_set_features(struct net_device *dev) { - xennet_set_sg(dev, 1); + if (!xennet_set_sg(dev, 1)) + xennet_set_tso(dev, 1); } static void network_connect(struct net_device *dev) @@ -1148,6 +1184,8 @@ static struct ethtool_ops network_ethtoo .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = xennet_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = xennet_set_tso, }; #ifdef CONFIG_SYSFS _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 27 Jun 2006, at 13:02, Herbert Xu wrote:

> Comparison with base line and jumbo MTU:
>
> baseline:   1228.08Mb/s
> mtu=16436:  3097.49Mb/s
> TSO:        3208.41Mb/s
> mtu=60040:  3869.36Mb/s
>
> lo(16436):  5543.91Mb/s
> lo(60040):  8544.08Mb/s

Do you have any numbers for transmit to an external host? It'd be particularly interesting to see how you improve CPU overheads of domain0 and domainU (since throughput isn't usually much less than native anyway, but CPU overhead can be significantly higher).

Comments on the patches:

1. Can you make two separate patches in the patches/ directory -- one for the stuff already accepted upstream and a separate one for the stuff that is not yet accepted. Prefix the patch files with numbers to make sure they get applied in the right order.

2. As well as the patches you provide, some generic Linux files are modified in the sparse tree because we already modify them there. Can you please also place a copy of your changes inside your patches/ files. This will make it easier for us to forward port to 2.6.18 as we can see which changes in the sparse tree actually correspond to the generic GSO changes.

3. Can you give brief documentation, or even just send an email explaining, how you change the interdomain protocol to add GSO?

4. The change to the make_tx_response() prototype in netback you could send us as a separate patch. It would make the real GSO changes to netback smaller and clearer.

Thanks!
Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Tue, Jun 27, 2006 at 02:27:20PM +0100, Keir Fraser wrote:
>
> Do you have any numbers for transmit to an external host? It'd be
> particularly interesting to see how you improve CPU overheads of
> domain0 and domainU (since throughput isn't usually much less than
> native anyway, but CPU overhead can be significantly higher).

Sure. Here are figures from xm top on the same laptop. It only has an e100, so there would be a lot more savings with a TSO-capable NIC:

            dom0     domU
idle:        3.2%     0%
TSO:        17.7%     5.9%
baseline:   18.6%     6.9%

So overall it's 22.3% vs. 20.4% (baseline vs. TSO, i.e. the dom0 + domU totals less the idle overhead). In both cases the actual throughput is around 92Mb/s.

BTW, the difference here is much smaller than for inter-domain traffic because this is purely send-only, while with inter-domain traffic we also get the corresponding benefit on the receive side, with the super-packet passing through the receive path as is.

> Comments on the patches:

I will repost tomorrow with your comments taken into consideration.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Hi Keir:

OK, here they are again with your comments addressed.

On Tue, Jun 27, 2006 at 02:27:20PM +0100, Keir Fraser wrote:
>
> Comments on the patches:
> 1. Can you make two separate patches in the patches/ directory -- one
> for the stuff already accepted upstream and a separate one for the
> stuff that is not yet accepted. Prefix the patch files with numbers to
> make sure they get applied in the right order.

This is no longer necessary as the header verification patch has now been merged.

> 2. As well as the patches you provide, some generic Linux files are
> modified in the sparse tree because we already modify them there. Can
> you please also place a copy of your changes inside your patches/
> files. This will make it easier for us to forward port to 2.6.18 as we
> can see which changes in the sparse tree actually correspond to the
> generic GSO changes.

Done.

> 3. Can you give brief documentation, or even just send an email
> explaining, how you change the interdomain protocol to add GSO?

I've included it in the TSO backend patch's description.

> 4. The change to the make_tx_response() prototype in netback you could
> send us as a separate patch. It would make the real GSO changes to
> netback smaller and clearer.

Done.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Hi: [NET]: Added GSO support Imported GSO patch. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 1da8f53ce65b -r 6913e0756b81 linux-2.6-xen-sparse/include/linux/skbuff.h --- a/linux-2.6-xen-sparse/include/linux/skbuff.h Tue Jun 27 18:24:08 2006 +0100 +++ b/linux-2.6-xen-sparse/include/linux/skbuff.h Wed Jun 28 13:44:18 2006 +1000 @@ -134,9 +134,10 @@ struct skb_shared_info { struct skb_shared_info { atomic_t dataref; unsigned short nr_frags; - unsigned short tso_size; - unsigned short tso_segs; - unsigned short ufo_size; + unsigned short gso_size; + /* Warning: this field is not always filled in (UFO)! */ + unsigned short gso_segs; + unsigned short gso_type; unsigned int ip6_frag_id; struct sk_buff *frag_list; skb_frag_t frags[MAX_SKB_FRAGS]; @@ -166,6 +167,14 @@ enum { SKB_FCLONE_UNAVAILABLE, SKB_FCLONE_ORIG, SKB_FCLONE_CLONE, +}; + +enum { + SKB_GSO_TCPV4 = 1 << 0, + SKB_GSO_UDPV4 = 1 << 1, + + /* This indicates the skb is from an untrusted source. */ + SKB_GSO_DODGY = 1 << 2, }; /** @@ -1157,18 +1166,34 @@ static inline int skb_can_coalesce(struc return 0; } +static inline int __skb_linearize(struct sk_buff *skb) +{ + return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM; +} + /** * skb_linearize - convert paged skb to linear one * @skb: buffer to linarize - * @gfp: allocation mode * * If there is no free memory -ENOMEM is returned, otherwise zero * is returned and the old skb data released. */ -extern int __skb_linearize(struct sk_buff *skb, gfp_t gfp); -static inline int skb_linearize(struct sk_buff *skb, gfp_t gfp) -{ - return __skb_linearize(skb, gfp); +static inline int skb_linearize(struct sk_buff *skb) +{ + return skb_is_nonlinear(skb) ? __skb_linearize(skb) : 0; +} + +/** + * skb_linearize_cow - make sure skb is linear and writable + * @skb: buffer to process + * + * If there is no free memory -ENOMEM is returned, otherwise zero + * is returned and the old skb data released. + */ +static inline int skb_linearize_cow(struct sk_buff *skb) +{ + return skb_is_nonlinear(skb) || skb_cloned(skb) ? + __skb_linearize(skb) : 0; } /** @@ -1263,6 +1288,7 @@ extern void skb_split(struct sk_b struct sk_buff *skb1, const u32 len); extern void skb_release_data(struct sk_buff *skb); +extern struct sk_buff *skb_segment(struct sk_buff *skb, int features); static inline void *skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer) diff -r 1da8f53ce65b -r 6913e0756b81 linux-2.6-xen-sparse/net/core/dev.c --- a/linux-2.6-xen-sparse/net/core/dev.c Tue Jun 27 18:24:08 2006 +0100 +++ b/linux-2.6-xen-sparse/net/core/dev.c Wed Jun 28 13:44:18 2006 +1000 @@ -115,6 +115,7 @@ #include <net/iw_handler.h> #endif /* CONFIG_NET_RADIO */ #include <asm/current.h> +#include <linux/err.h> #ifdef CONFIG_XEN #include <net/ip.h> @@ -1038,7 +1039,7 @@ static inline void net_timestamp(struct * taps currently in use. */ -void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) +static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) { struct packet_type *ptype; @@ -1112,6 +1113,45 @@ out: return ret; } +/** + * skb_gso_segment - Perform segmentation on skb. + * @skb: buffer to segment + * @features: features for the output path (see dev->features) + * + * This function segments the given skb and returns a list of segments. 
+ * + * It may return NULL if the skb requires no segmentation. This is + * only possible when GSO is used for verifying header integrity. + */ +struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features) +{ + struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT); + struct packet_type *ptype; + int type = skb->protocol; + + BUG_ON(skb_shinfo(skb)->frag_list); + BUG_ON(skb->ip_summed != CHECKSUM_HW); + + skb->mac.raw = skb->data; + skb->mac_len = skb->nh.raw - skb->data; + __skb_pull(skb, skb->mac_len); + + rcu_read_lock(); + list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) { + if (ptype->type == type && !ptype->dev && ptype->gso_segment) { + segs = ptype->gso_segment(skb, features); + break; + } + } + rcu_read_unlock(); + + __skb_push(skb, skb->data - skb->mac.raw); + + return segs; +} + +EXPORT_SYMBOL(skb_gso_segment); + /* Take action when hardware reception checksum errors are detected. */ #ifdef CONFIG_BUG void netdev_rx_csum_fault(struct net_device *dev) @@ -1148,75 +1188,108 @@ static inline int illegal_highdma(struct #define illegal_highdma(dev, skb) (0) #endif -/* Keep head the same: replace data */ -int __skb_linearize(struct sk_buff *skb, gfp_t gfp_mask) -{ - unsigned int size; - u8 *data; - long offset; - struct skb_shared_info *ninfo; - int headerlen = skb->data - skb->head; - int expand = (skb->tail + skb->data_len) - skb->end; - - if (skb_shared(skb)) - BUG(); - - if (expand <= 0) - expand = 0; - - size = skb->end - skb->head + expand; - size = SKB_DATA_ALIGN(size); - data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask); - if (!data) - return -ENOMEM; - - /* Copy entire thing */ - if (skb_copy_bits(skb, -headerlen, data, headerlen + skb->len)) - BUG(); - - /* Set up shinfo */ - ninfo = (struct skb_shared_info*)(data + size); - atomic_set(&ninfo->dataref, 1); - ninfo->tso_size = skb_shinfo(skb)->tso_size; - ninfo->tso_segs = skb_shinfo(skb)->tso_segs; - ninfo->nr_frags = 0; - ninfo->frag_list = NULL; - - /* Offset between the two in bytes */ - offset = data - skb->head; - - /* Free old data. */ - skb_release_data(skb); - - skb->head = data; - skb->end = data + size; - - /* Set up new pointers */ - skb->h.raw += offset; - skb->nh.raw += offset; - skb->mac.raw += offset; - skb->tail += offset; - skb->data += offset; - - /* We are no longer a clone, even if we were. */ - skb->cloned = 0; - - skb->tail += skb->data_len; - skb->data_len = 0; +struct dev_gso_cb { + void (*destructor)(struct sk_buff *skb); +}; + +#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb) + +static void dev_gso_skb_destructor(struct sk_buff *skb) +{ + struct dev_gso_cb *cb; + + do { + struct sk_buff *nskb = skb->next; + + skb->next = nskb->next; + nskb->next = NULL; + kfree_skb(nskb); + } while (skb->next); + + cb = DEV_GSO_CB(skb); + if (cb->destructor) + cb->destructor(skb); +} + +/** + * dev_gso_segment - Perform emulated hardware segmentation on skb. + * @skb: buffer to segment + * + * This function segments the given skb and stores the list of segments + * in skb->next. + */ +static int dev_gso_segment(struct sk_buff *skb) +{ + struct net_device *dev = skb->dev; + struct sk_buff *segs; + int features = dev->features & ~(illegal_highdma(dev, skb) ? + NETIF_F_SG : 0); + + segs = skb_gso_segment(skb, features); + + /* Verifying header integrity only. 
*/ + if (!segs) + return 0; + + if (unlikely(IS_ERR(segs))) + return PTR_ERR(segs); + + skb->next = segs; + DEV_GSO_CB(skb)->destructor = skb->destructor; + skb->destructor = dev_gso_skb_destructor; + + return 0; +} + +int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) +{ + if (likely(!skb->next)) { + if (netdev_nit) + dev_queue_xmit_nit(skb, dev); + + if (netif_needs_gso(dev, skb)) { + if (unlikely(dev_gso_segment(skb))) + goto out_kfree_skb; + if (skb->next) + goto gso; + } + + return dev->hard_start_xmit(skb, dev); + } + +gso: + do { + struct sk_buff *nskb = skb->next; + int rc; + + skb->next = nskb->next; + nskb->next = NULL; + rc = dev->hard_start_xmit(nskb, dev); + if (unlikely(rc)) { + nskb->next = skb->next; + skb->next = nskb; + return rc; + } + if (unlikely(netif_queue_stopped(dev) && skb->next)) + return NETDEV_TX_BUSY; + } while (skb->next); + + skb->destructor = DEV_GSO_CB(skb)->destructor; + +out_kfree_skb: + kfree_skb(skb); return 0; } #define HARD_TX_LOCK(dev, cpu) { \ if ((dev->features & NETIF_F_LLTX) == 0) { \ - spin_lock(&dev->xmit_lock); \ - dev->xmit_lock_owner = cpu; \ + netif_tx_lock(dev); \ } \ } #define HARD_TX_UNLOCK(dev) { \ if ((dev->features & NETIF_F_LLTX) == 0) { \ - dev->xmit_lock_owner = -1; \ - spin_unlock(&dev->xmit_lock); \ + netif_tx_unlock(dev); \ } \ } @@ -1289,9 +1362,19 @@ int dev_queue_xmit(struct sk_buff *skb) struct Qdisc *q; int rc = -ENOMEM; + /* If a checksum-deferred packet is forwarded to a device that needs a + * checksum, correct the pointers and force checksumming. + */ + if (skb_checksum_setup(skb)) + goto out_kfree_skb; + + /* GSO will handle the following emulations directly. */ + if (netif_needs_gso(dev, skb)) + goto gso; + if (skb_shinfo(skb)->frag_list && !(dev->features & NETIF_F_FRAGLIST) && - __skb_linearize(skb, GFP_ATOMIC)) + __skb_linearize(skb)) goto out_kfree_skb; /* Fragmented skb is linearized if device does not support SG, @@ -1300,31 +1383,26 @@ int dev_queue_xmit(struct sk_buff *skb) */ if (skb_shinfo(skb)->nr_frags && (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) && - __skb_linearize(skb, GFP_ATOMIC)) + __skb_linearize(skb)) goto out_kfree_skb; - /* If a checksum-deferred packet is forwarded to a device that needs a - * checksum, correct the pointers and force checksumming. - */ - if(skb_checksum_setup(skb)) - goto out_kfree_skb; - /* If packet is not checksummed and device does not support * checksumming for this protocol, complete checksumming here. */ if (skb->ip_summed == CHECKSUM_HW && - (!(dev->features & (NETIF_F_HW_CSUM | NETIF_F_NO_CSUM)) && + (!(dev->features & NETIF_F_GEN_CSUM) && (!(dev->features & NETIF_F_IP_CSUM) || skb->protocol != htons(ETH_P_IP)))) if (skb_checksum_help(skb, 0)) goto out_kfree_skb; +gso: spin_lock_prefetch(&dev->queue_lock); /* Disable soft irqs for various locks below. Also * stops preemption for RCU. */ - local_bh_disable(); + rcu_read_lock_bh(); /* Updates of qdisc are serialized by queue_lock. * The struct Qdisc which is pointed to by qdisc is now a @@ -1358,8 +1436,8 @@ int dev_queue_xmit(struct sk_buff *skb) /* The device has no queue. Common case for software devices: loopback, all the sorts of tunnels... - Really, it is unlikely that xmit_lock protection is necessary here. - (f.e. loopback and IP tunnels are clean ignoring statistics + Really, it is unlikely that netif_tx_lock protection is necessary + here. (f.e. loopback and IP tunnels are clean ignoring statistics counters.) 
However, it is possible, that they rely on protection made by us here. @@ -1375,11 +1453,8 @@ int dev_queue_xmit(struct sk_buff *skb) HARD_TX_LOCK(dev, cpu); if (!netif_queue_stopped(dev)) { - if (netdev_nit) - dev_queue_xmit_nit(skb, dev); - rc = 0; - if (!dev->hard_start_xmit(skb, dev)) { + if (!dev_hard_start_xmit(skb, dev)) { HARD_TX_UNLOCK(dev); goto out; } @@ -1398,13 +1473,13 @@ int dev_queue_xmit(struct sk_buff *skb) } rc = -ENETDOWN; - local_bh_enable(); + rcu_read_unlock_bh(); out_kfree_skb: kfree_skb(skb); return rc; out: - local_bh_enable(); + rcu_read_unlock_bh(); return rc; } @@ -2732,7 +2807,7 @@ int register_netdevice(struct net_device BUG_ON(dev->reg_state != NETREG_UNINITIALIZED); spin_lock_init(&dev->queue_lock); - spin_lock_init(&dev->xmit_lock); + spin_lock_init(&dev->_xmit_lock); dev->xmit_lock_owner = -1; #ifdef CONFIG_NET_CLS_ACT spin_lock_init(&dev->ingress_lock); @@ -2776,9 +2851,7 @@ int register_netdevice(struct net_device /* Fix illegal SG+CSUM combinations. */ if ((dev->features & NETIF_F_SG) && - !(dev->features & (NETIF_F_IP_CSUM | - NETIF_F_NO_CSUM | - NETIF_F_HW_CSUM))) { + !(dev->features & NETIF_F_ALL_CSUM)) { printk("%s: Dropping NETIF_F_SG since no checksum feature.\n", dev->name); dev->features &= ~NETIF_F_SG; @@ -3330,7 +3403,6 @@ EXPORT_SYMBOL(__dev_get_by_index); EXPORT_SYMBOL(__dev_get_by_index); EXPORT_SYMBOL(__dev_get_by_name); EXPORT_SYMBOL(__dev_remove_pack); -EXPORT_SYMBOL(__skb_linearize); EXPORT_SYMBOL(dev_valid_name); EXPORT_SYMBOL(dev_add_pack); EXPORT_SYMBOL(dev_alloc_name); diff -r 1da8f53ce65b -r 6913e0756b81 linux-2.6-xen-sparse/net/core/skbuff.c --- a/linux-2.6-xen-sparse/net/core/skbuff.c Tue Jun 27 18:24:08 2006 +0100 +++ b/linux-2.6-xen-sparse/net/core/skbuff.c Wed Jun 28 13:44:18 2006 +1000 @@ -165,9 +165,9 @@ struct sk_buff *__alloc_skb(unsigned int shinfo = skb_shinfo(skb); atomic_set(&shinfo->dataref, 1); shinfo->nr_frags = 0; - shinfo->tso_size = 0; - shinfo->tso_segs = 0; - shinfo->ufo_size = 0; + shinfo->gso_size = 0; + shinfo->gso_segs = 0; + shinfo->gso_type = 0; shinfo->ip6_frag_id = 0; shinfo->frag_list = NULL; @@ -237,9 +237,9 @@ struct sk_buff *alloc_skb_from_cache(kme shinfo = skb_shinfo(skb); atomic_set(&shinfo->dataref, 1); shinfo->nr_frags = 0; - shinfo->tso_size = 0; - shinfo->tso_segs = 0; - shinfo->ufo_size = 0; + shinfo->gso_size = 0; + shinfo->gso_segs = 0; + shinfo->gso_type = 0; shinfo->ip6_frag_id = 0; shinfo->frag_list = NULL; @@ -524,8 +524,9 @@ static void copy_skb_header(struct sk_bu new->tc_index = old->tc_index; #endif atomic_set(&new->users, 1); - skb_shinfo(new)->tso_size = skb_shinfo(old)->tso_size; - skb_shinfo(new)->tso_segs = skb_shinfo(old)->tso_segs; + skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size; + skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs; + skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type; } /** @@ -1799,6 +1800,133 @@ int skb_append_datato_frags(struct sock return 0; } + +/** + * skb_segment - Perform protocol segmentation on skb. + * @skb: buffer to segment + * @features: features for the output path (see dev->features) + * + * This function performs segmentation on the given skb. It returns + * the segment at the given position. It returns NULL if there are + * no more segments to generate, or when an error is encountered. 
+ */ +struct sk_buff *skb_segment(struct sk_buff *skb, int features) +{ + struct sk_buff *segs = NULL; + struct sk_buff *tail = NULL; + unsigned int mss = skb_shinfo(skb)->gso_size; + unsigned int doffset = skb->data - skb->mac.raw; + unsigned int offset = doffset; + unsigned int headroom; + unsigned int len; + int sg = features & NETIF_F_SG; + int nfrags = skb_shinfo(skb)->nr_frags; + int err = -ENOMEM; + int i = 0; + int pos; + + __skb_push(skb, doffset); + headroom = skb_headroom(skb); + pos = skb_headlen(skb); + + do { + struct sk_buff *nskb; + skb_frag_t *frag; + int hsize, nsize; + int k; + int size; + + len = skb->len - offset; + if (len > mss) + len = mss; + + hsize = skb_headlen(skb) - offset; + if (hsize < 0) + hsize = 0; + nsize = hsize + doffset; + if (nsize > len + doffset || !sg) + nsize = len + doffset; + + nskb = alloc_skb(nsize + headroom, GFP_ATOMIC); + if (unlikely(!nskb)) + goto err; + + if (segs) + tail->next = nskb; + else + segs = nskb; + tail = nskb; + + nskb->dev = skb->dev; + nskb->priority = skb->priority; + nskb->protocol = skb->protocol; + nskb->dst = dst_clone(skb->dst); + memcpy(nskb->cb, skb->cb, sizeof(skb->cb)); + nskb->pkt_type = skb->pkt_type; + nskb->mac_len = skb->mac_len; + + skb_reserve(nskb, headroom); + nskb->mac.raw = nskb->data; + nskb->nh.raw = nskb->data + skb->mac_len; + nskb->h.raw = nskb->nh.raw + (skb->h.raw - skb->nh.raw); + memcpy(skb_put(nskb, doffset), skb->data, doffset); + + if (!sg) { + nskb->csum = skb_copy_and_csum_bits(skb, offset, + skb_put(nskb, len), + len, 0); + continue; + } + + frag = skb_shinfo(nskb)->frags; + k = 0; + + nskb->ip_summed = CHECKSUM_HW; + nskb->csum = skb->csum; + memcpy(skb_put(nskb, hsize), skb->data + offset, hsize); + + while (pos < offset + len) { + BUG_ON(i >= nfrags); + + *frag = skb_shinfo(skb)->frags[i]; + get_page(frag->page); + size = frag->size; + + if (pos < offset) { + frag->page_offset += offset - pos; + frag->size -= offset - pos; + } + + k++; + + if (pos + size <= offset + len) { + i++; + pos += size; + } else { + frag->size -= pos + size - (offset + len); + break; + } + + frag++; + } + + skb_shinfo(nskb)->nr_frags = k; + nskb->data_len = len - hsize; + nskb->len += nskb->data_len; + nskb->truesize += nskb->data_len; + } while ((offset += len) < skb->len); + + return segs; + +err: + while ((skb = segs)) { + segs = skb->next; + kfree(skb); + } + return ERR_PTR(err); +} + +EXPORT_SYMBOL_GPL(skb_segment); void __init skb_init(void) { diff -r 1da8f53ce65b -r 6913e0756b81 patches/linux-2.6.16.13/net-gso.patch --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/patches/linux-2.6.16.13/net-gso.patch Wed Jun 28 13:44:18 2006 +1000 @@ -0,0 +1,2907 @@ +diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt +index 3c0a5ba..847cedb 100644 +--- a/Documentation/networking/netdevices.txt ++++ b/Documentation/networking/netdevices.txt +@@ -42,9 +42,9 @@ dev->get_stats: + Context: nominally process, but don''t sleep inside an rwlock + + dev->hard_start_xmit: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + When the driver sets NETIF_F_LLTX in dev->features this will be +- called without holding xmit_lock. In this case the driver ++ called without holding netif_tx_lock. In this case the driver + has to lock by itself when needed. It is recommended to use a try lock + for this and return -1 when the spin lock fails. 
+ The locking there should also properly protect against +@@ -62,12 +62,12 @@ dev->hard_start_xmit: + Only valid when NETIF_F_LLTX is set. + + dev->tx_timeout: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + Context: BHs disabled + Notes: netif_queue_stopped() is guaranteed true + + dev->set_multicast_list: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + Context: BHs disabled + + dev->poll: +diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c +index 4be9769..2e7cac7 100644 +--- a/drivers/block/aoe/aoenet.c ++++ b/drivers/block/aoe/aoenet.c +@@ -95,9 +95,8 @@ mac_addr(char addr[6]) + static struct sk_buff * + skb_check(struct sk_buff *skb) + { +- if (skb_is_nonlinear(skb)) + if ((skb = skb_share_check(skb, GFP_ATOMIC))) +- if (skb_linearize(skb, GFP_ATOMIC) < 0) { ++ if (skb_linearize(skb)) { + dev_kfree_skb(skb); + return NULL; + } +diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +index a2408d7..c90e620 100644 +--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c ++++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +@@ -821,7 +821,8 @@ void ipoib_mcast_restart_task(void *dev_ + + ipoib_mcast_stop_thread(dev, 0); + +- spin_lock_irqsave(&dev->xmit_lock, flags); ++ local_irq_save(flags); ++ netif_tx_lock(dev); + spin_lock(&priv->lock); + + /* +@@ -896,7 +897,8 @@ void ipoib_mcast_restart_task(void *dev_ + } + + spin_unlock(&priv->lock); +- spin_unlock_irqrestore(&dev->xmit_lock, flags); ++ netif_tx_unlock(dev); ++ local_irq_restore(flags); + + /* We have to cancel outside of the spinlock */ + list_for_each_entry_safe(mcast, tmcast, &remove_list, list) { +diff --git a/drivers/media/dvb/dvb-core/dvb_net.c b/drivers/media/dvb/dvb-core/dvb_net.c +index 6711eb6..8d2351f 100644 +--- a/drivers/media/dvb/dvb-core/dvb_net.c ++++ b/drivers/media/dvb/dvb-core/dvb_net.c +@@ -1052,7 +1052,7 @@ static void wq_set_multicast_list (void + + dvb_net_feed_stop(dev); + priv->rx_mode = RX_MODE_UNI; +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + if (dev->flags & IFF_PROMISC) { + dprintk("%s: promiscuous mode\n", dev->name); +@@ -1077,7 +1077,7 @@ static void wq_set_multicast_list (void + } + } + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + dvb_net_feed_start(dev); + } + +diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c +index dd41049..6615583 100644 +--- a/drivers/net/8139cp.c ++++ b/drivers/net/8139cp.c +@@ -794,7 +794,7 @@ #endif + entry = cp->tx_head; + eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0; + if (dev->features & NETIF_F_TSO) +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + + if (skb_shinfo(skb)->nr_frags == 0) { + struct cp_desc *txd = &cp->tx_ring[entry]; +diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c +index a24200d..b5e39a1 100644 +--- a/drivers/net/bnx2.c ++++ b/drivers/net/bnx2.c +@@ -1593,7 +1593,7 @@ bnx2_tx_int(struct bnx2 *bp) + skb = tx_buf->skb; + #ifdef BCM_TSO + /* partial BD completions possible with TSO packets */ +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + u16 last_idx, last_ring_idx; + + last_idx = sw_cons + +@@ -1948,7 +1948,7 @@ bnx2_poll(struct net_device *dev, int *b + return 1; + } + +-/* Called with rtnl_lock from vlan functions and also dev->xmit_lock ++/* Called with rtnl_lock from vlan functions and also netif_tx_lock + * from set_multicast. 
+ */ + static void +@@ -4403,7 +4403,7 @@ bnx2_vlan_rx_kill_vid(struct net_device + } + #endif + +-/* Called with dev->xmit_lock. ++/* Called with netif_tx_lock. + * hard_start_xmit is pseudo-lockless - a lock is only required when + * the tx queue is full. This way, we get the benefit of lockless + * operations most of the time without the complexities to handle +@@ -4441,7 +4441,7 @@ bnx2_start_xmit(struct sk_buff *skb, str + (TX_BD_FLAGS_VLAN_TAG | (vlan_tx_tag_get(skb) << 16)); + } + #ifdef BCM_TSO +- if ((mss = skb_shinfo(skb)->tso_size) && ++ if ((mss = skb_shinfo(skb)->gso_size) && + (skb->len > (bp->dev->mtu + ETH_HLEN))) { + u32 tcp_opt_len, ip_tcp_len; + +diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c +index bcf9f17..e970921 100644 +--- a/drivers/net/bonding/bond_main.c ++++ b/drivers/net/bonding/bond_main.c +@@ -1145,8 +1145,7 @@ int bond_sethwaddr(struct net_device *bo + } + + #define BOND_INTERSECT_FEATURES \ +- (NETIF_F_SG|NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM|\ +- NETIF_F_TSO|NETIF_F_UFO) ++ (NETIF_F_SG | NETIF_F_ALL_CSUM | NETIF_F_TSO | NETIF_F_UFO) + + /* + * Compute the common dev->feature set available to all slaves. Some +@@ -1164,9 +1163,7 @@ static int bond_compute_features(struct + features &= (slave->dev->features & BOND_INTERSECT_FEATURES); + + if ((features & NETIF_F_SG) && +- !(features & (NETIF_F_IP_CSUM | +- NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM))) ++ !(features & NETIF_F_ALL_CSUM)) + features &= ~NETIF_F_SG; + + /* +@@ -4147,7 +4144,7 @@ static int bond_init(struct net_device * + */ + bond_dev->features |= NETIF_F_VLAN_CHALLENGED; + +- /* don''t acquire bond device''s xmit_lock when ++ /* don''t acquire bond device''s netif_tx_lock when + * transmitting */ + bond_dev->features |= NETIF_F_LLTX; + +diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c +index 30ff8ea..7b7d360 100644 +--- a/drivers/net/chelsio/sge.c ++++ b/drivers/net/chelsio/sge.c +@@ -1419,7 +1419,7 @@ int t1_start_xmit(struct sk_buff *skb, s + struct cpl_tx_pkt *cpl; + + #ifdef NETIF_F_TSO +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + int eth_type; + struct cpl_tx_pkt_lso *hdr; + +@@ -1434,7 +1434,7 @@ #ifdef NETIF_F_TSO + hdr->ip_hdr_words = skb->nh.iph->ihl; + hdr->tcp_hdr_words = skb->h.th->doff; + hdr->eth_type_mss = htons(MK_ETH_TYPE_MSS(eth_type, +- skb_shinfo(skb)->tso_size)); ++ skb_shinfo(skb)->gso_size)); + hdr->len = htonl(skb->len - sizeof(*hdr)); + cpl = (struct cpl_tx_pkt *)hdr; + sge->stats.tx_lso_pkts++; +diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c +index fa29402..681d284 100644 +--- a/drivers/net/e1000/e1000_main.c ++++ b/drivers/net/e1000/e1000_main.c +@@ -2526,7 +2526,7 @@ #ifdef NETIF_F_TSO + uint8_t ipcss, ipcso, tucss, tucso, hdr_len; + int err; + +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + if (skb_header_cloned(skb)) { + err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); + if (err) +@@ -2534,7 +2534,7 @@ #ifdef NETIF_F_TSO + } + + hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2)); +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + if (skb->protocol == ntohs(ETH_P_IP)) { + skb->nh.iph->tot_len = 0; + skb->nh.iph->check = 0; +@@ -2651,7 +2651,7 @@ #ifdef NETIF_F_TSO + * tso gets written back prematurely before the data is fully + * DMAd to the controller */ + if (!skb->data_len && tx_ring->last_tx_tso && +- !skb_shinfo(skb)->tso_size) { ++ !skb_shinfo(skb)->gso_size) { + tx_ring->last_tx_tso = 0; + 
size -= 4; + } +@@ -2893,7 +2893,7 @@ #endif + } + + #ifdef NETIF_F_TSO +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + /* The controller does a simple calculation to + * make sure there is enough room in the FIFO before + * initiating the DMA for each buffer. The calc is: +@@ -2935,7 +2935,7 @@ #endif + #ifdef NETIF_F_TSO + /* Controller Erratum workaround */ + if (!skb->data_len && tx_ring->last_tx_tso && +- !skb_shinfo(skb)->tso_size) ++ !skb_shinfo(skb)->gso_size) + count++; + #endif + +diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c +index 3682ec6..c35f16e 100644 +--- a/drivers/net/forcedeth.c ++++ b/drivers/net/forcedeth.c +@@ -482,9 +482,9 @@ #define LPA_1000HALF 0x0400 + * critical parts: + * - rx is (pseudo-) lockless: it relies on the single-threading provided + * by the arch code for interrupts. +- * - tx setup is lockless: it relies on dev->xmit_lock. Actual submission ++ * - tx setup is lockless: it relies on netif_tx_lock. Actual submission + * needs dev->priv->lock :-( +- * - set_multicast_list: preparation lockless, relies on dev->xmit_lock. ++ * - set_multicast_list: preparation lockless, relies on netif_tx_lock. + */ + + /* in dev: base, irq */ +@@ -1016,7 +1016,7 @@ static void drain_ring(struct net_device + + /* + * nv_start_xmit: dev->hard_start_xmit function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. + */ + static int nv_start_xmit(struct sk_buff *skb, struct net_device *dev) + { +@@ -1105,8 +1105,8 @@ static int nv_start_xmit(struct sk_buff + np->tx_skbuff[nr] = skb; + + #ifdef NETIF_F_TSO +- if (skb_shinfo(skb)->tso_size) +- tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->tso_size << NV_TX2_TSO_SHIFT); ++ if (skb_shinfo(skb)->gso_size) ++ tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->gso_size << NV_TX2_TSO_SHIFT); + else + #endif + tx_flags_extra = (skb->ip_summed == CHECKSUM_HW ? (NV_TX2_CHECKSUM_L3|NV_TX2_CHECKSUM_L4) : 0); +@@ -1203,7 +1203,7 @@ static void nv_tx_done(struct net_device + + /* + * nv_tx_timeout: dev->tx_timeout function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. + */ + static void nv_tx_timeout(struct net_device *dev) + { +@@ -1524,7 +1524,7 @@ static int nv_change_mtu(struct net_devi + * Changing the MTU is a rare event, it shouldn''t matter. + */ + disable_irq(dev->irq); +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock(&np->lock); + /* stop engines */ + nv_stop_rx(dev); +@@ -1559,7 +1559,7 @@ static int nv_change_mtu(struct net_devi + nv_start_rx(dev); + nv_start_tx(dev); + spin_unlock(&np->lock); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + enable_irq(dev->irq); + } + return 0; +@@ -1594,7 +1594,7 @@ static int nv_set_mac_address(struct net + memcpy(dev->dev_addr, macaddr->sa_data, ETH_ALEN); + + if (netif_running(dev)) { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock_irq(&np->lock); + + /* stop rx engine */ +@@ -1606,7 +1606,7 @@ static int nv_set_mac_address(struct net + /* restart rx engine */ + nv_start_rx(dev); + spin_unlock_irq(&np->lock); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } else { + nv_copy_mac_to_hw(dev); + } +@@ -1615,7 +1615,7 @@ static int nv_set_mac_address(struct net + + /* + * nv_set_multicast: dev->set_multicast function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. 
+ */ + static void nv_set_multicast(struct net_device *dev) + { +diff --git a/drivers/net/hamradio/6pack.c b/drivers/net/hamradio/6pack.c +index 102c1f0..d12605f 100644 +--- a/drivers/net/hamradio/6pack.c ++++ b/drivers/net/hamradio/6pack.c +@@ -308,9 +308,9 @@ static int sp_set_mac_address(struct net + { + struct sockaddr_ax25 *sa = addr; + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &sa->sax25_call, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + return 0; + } +@@ -767,9 +767,9 @@ static int sixpack_ioctl(struct tty_stru + break; + } + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &addr, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + err = 0; + break; +diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c +index dc5e9d5..5c66f5a 100644 +--- a/drivers/net/hamradio/mkiss.c ++++ b/drivers/net/hamradio/mkiss.c +@@ -357,9 +357,9 @@ static int ax_set_mac_address(struct net + { + struct sockaddr_ax25 *sa = addr; + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &sa->sax25_call, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + return 0; + } +@@ -886,9 +886,9 @@ static int mkiss_ioctl(struct tty_struct + break; + } + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, addr, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + err = 0; + break; +diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c +index 31fb2d7..2e222ef 100644 +--- a/drivers/net/ifb.c ++++ b/drivers/net/ifb.c +@@ -76,13 +76,13 @@ static void ri_tasklet(unsigned long dev + dp->st_task_enter++; + if ((skb = skb_peek(&dp->tq)) == NULL) { + dp->st_txq_refl_try++; +- if (spin_trylock(&_dev->xmit_lock)) { ++ if (netif_tx_trylock(_dev)) { + dp->st_rxq_enter++; + while ((skb = skb_dequeue(&dp->rq)) != NULL) { + skb_queue_tail(&dp->tq, skb); + dp->st_rx2tx_tran++; + } +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + } else { + /* reschedule */ + dp->st_rxq_notenter++; +@@ -110,7 +110,7 @@ static void ri_tasklet(unsigned long dev + } + } + +- if (spin_trylock(&_dev->xmit_lock)) { ++ if (netif_tx_trylock(_dev)) { + dp->st_rxq_check++; + if ((skb = skb_peek(&dp->rq)) == NULL) { + dp->tasklet_pending = 0; +@@ -118,10 +118,10 @@ static void ri_tasklet(unsigned long dev + netif_wake_queue(_dev); + } else { + dp->st_rxq_rsch++; +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + goto resched; + } +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + } else { + resched: + dp->tasklet_pending = 1; +diff --git a/drivers/net/irda/vlsi_ir.c b/drivers/net/irda/vlsi_ir.c +index a9f49f0..339d4a7 100644 +--- a/drivers/net/irda/vlsi_ir.c ++++ b/drivers/net/irda/vlsi_ir.c +@@ -959,7 +959,7 @@ static int vlsi_hard_start_xmit(struct s + || (now.tv_sec==ready.tv_sec && now.tv_usec>=ready.tv_usec)) + break; + udelay(100); +- /* must not sleep here - we are called under xmit_lock! */ ++ /* must not sleep here - called under netif_tx_lock! 
*/ + } + } + +diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c +index f9f77e4..bdab369 100644 +--- a/drivers/net/ixgb/ixgb_main.c ++++ b/drivers/net/ixgb/ixgb_main.c +@@ -1163,7 +1163,7 @@ #ifdef NETIF_F_TSO + uint16_t ipcse, tucse, mss; + int err; + +- if(likely(skb_shinfo(skb)->tso_size)) { ++ if(likely(skb_shinfo(skb)->gso_size)) { + if (skb_header_cloned(skb)) { + err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); + if (err) +@@ -1171,7 +1171,7 @@ #ifdef NETIF_F_TSO + } + + hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2)); +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + skb->nh.iph->tot_len = 0; + skb->nh.iph->check = 0; + skb->h.th->check = ~csum_tcpudp_magic(skb->nh.iph->saddr, +diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c +index 690a1aa..9bcaa80 100644 +--- a/drivers/net/loopback.c ++++ b/drivers/net/loopback.c +@@ -74,7 +74,7 @@ static void emulate_large_send_offload(s + struct iphdr *iph = skb->nh.iph; + struct tcphdr *th = (struct tcphdr*)(skb->nh.raw + (iph->ihl * 4)); + unsigned int doffset = (iph->ihl + th->doff) * 4; +- unsigned int mtu = skb_shinfo(skb)->tso_size + doffset; ++ unsigned int mtu = skb_shinfo(skb)->gso_size + doffset; + unsigned int offset = 0; + u32 seq = ntohl(th->seq); + u16 id = ntohs(iph->id); +@@ -139,7 +139,7 @@ #ifndef LOOPBACK_MUST_CHECKSUM + #endif + + #ifdef LOOPBACK_TSO +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + BUG_ON(skb->protocol != htons(ETH_P_IP)); + BUG_ON(skb->nh.iph->protocol != IPPROTO_TCP); + +diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c +index c0998ef..0fac9d5 100644 +--- a/drivers/net/mv643xx_eth.c ++++ b/drivers/net/mv643xx_eth.c +@@ -1107,7 +1107,7 @@ static int mv643xx_eth_start_xmit(struct + + #ifdef MV643XX_CHECKSUM_OFFLOAD_TX + if (has_tiny_unaligned_frags(skb)) { +- if ((skb_linearize(skb, GFP_ATOMIC) != 0)) { ++ if (__skb_linearize(skb)) { + stats->tx_dropped++; + printk(KERN_DEBUG "%s: failed to linearize tiny " + "unaligned fragment\n", dev->name); +diff --git a/drivers/net/natsemi.c b/drivers/net/natsemi.c +index 9d6d254..c9ed624 100644 +--- a/drivers/net/natsemi.c ++++ b/drivers/net/natsemi.c +@@ -323,12 +323,12 @@ performance critical codepaths: + The rx process only runs in the interrupt handler. Access from outside + the interrupt handler is only permitted after disable_irq(). + +-The rx process usually runs under the dev->xmit_lock. If np->intr_tx_reap ++The rx process usually runs under the netif_tx_lock. If np->intr_tx_reap + is set, then access is permitted under spin_lock_irq(&np->lock). + + Thus configuration functions that want to access everything must call + disable_irq(dev->irq); +- spin_lock_bh(dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock_irq(&np->lock); + + IV. 
Notes +diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c +index 8cc0d0b..e53b313 100644 +--- a/drivers/net/r8169.c ++++ b/drivers/net/r8169.c +@@ -2171,7 +2171,7 @@ static int rtl8169_xmit_frags(struct rtl + static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev) + { + if (dev->features & NETIF_F_TSO) { +- u32 mss = skb_shinfo(skb)->tso_size; ++ u32 mss = skb_shinfo(skb)->gso_size; + + if (mss) + return LargeSend | ((mss & MSSMask) << MSSShift); +diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c +index b7f00d6..439f45f 100644 +--- a/drivers/net/s2io.c ++++ b/drivers/net/s2io.c +@@ -3522,8 +3522,8 @@ #endif + txdp->Control_1 = 0; + txdp->Control_2 = 0; + #ifdef NETIF_F_TSO +- mss = skb_shinfo(skb)->tso_size; +- if (mss) { ++ mss = skb_shinfo(skb)->gso_size; ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV4) { + txdp->Control_1 |= TXD_TCP_LSO_EN; + txdp->Control_1 |= TXD_TCP_LSO_MSS(mss); + } +@@ -3543,10 +3543,10 @@ #endif + } + + frg_len = skb->len - skb->data_len; +- if (skb_shinfo(skb)->ufo_size) { ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) { + int ufo_size; + +- ufo_size = skb_shinfo(skb)->ufo_size; ++ ufo_size = skb_shinfo(skb)->gso_size; + ufo_size &= ~7; + txdp->Control_1 |= TXD_UFO_EN; + txdp->Control_1 |= TXD_UFO_MSS(ufo_size); +@@ -3572,7 +3572,7 @@ #endif + txdp->Host_Control = (unsigned long) skb; + txdp->Control_1 |= TXD_BUFFER0_SIZE(frg_len); + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + txdp->Control_1 |= TXD_UFO_EN; + + frg_cnt = skb_shinfo(skb)->nr_frags; +@@ -3587,12 +3587,12 @@ #endif + (sp->pdev, frag->page, frag->page_offset, + frag->size, PCI_DMA_TODEVICE); + txdp->Control_1 = TXD_BUFFER0_SIZE(frag->size); +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + txdp->Control_1 |= TXD_UFO_EN; + } + txdp->Control_1 |= TXD_GATHER_CODE_LAST; + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + frg_cnt++; /* as Txd0 was used for inband header */ + + tx_fifo = mac_control->tx_FIFO_start[queue]; +@@ -3606,7 +3606,7 @@ #ifdef NETIF_F_TSO + if (mss) + val64 |= TX_FIFO_SPECIAL_FUNC; + #endif +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + val64 |= TX_FIFO_SPECIAL_FUNC; + writeq(val64, &tx_fifo->List_Control); + +diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c +index 0618cd5..2a55eb3 100644 +--- a/drivers/net/sky2.c ++++ b/drivers/net/sky2.c +@@ -1125,7 +1125,7 @@ static unsigned tx_le_req(const struct s + count = sizeof(dma_addr_t) / sizeof(u32); + count += skb_shinfo(skb)->nr_frags * count; + +- if (skb_shinfo(skb)->tso_size) ++ if (skb_shinfo(skb)->gso_size) + ++count; + + if (skb->ip_summed == CHECKSUM_HW) +@@ -1197,7 +1197,7 @@ static int sky2_xmit_frame(struct sk_buf + } + + /* Check for TCP Segmentation Offload */ +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + if (mss != 0) { + /* just drop the packet if non-linear expansion fails */ + if (skb_header_cloned(skb) && +diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c +index caf4102..fc9164a 100644 +--- a/drivers/net/tg3.c ++++ b/drivers/net/tg3.c +@@ -3664,7 +3664,7 @@ static int tg3_start_xmit(struct sk_buff + #if TG3_TSO_SUPPORT != 0 + mss = 0; + if (skb->len > (tp->dev->mtu + ETH_HLEN) && +- (mss = skb_shinfo(skb)->tso_size) != 0) { ++ (mss = skb_shinfo(skb)->gso_size) != 0) { + int tcp_opt_len, ip_tcp_len; + + if (skb_header_cloned(skb) && +diff --git a/drivers/net/tulip/winbond-840.c 
b/drivers/net/tulip/winbond-840.c +index 5b1af39..11de5af 100644 +--- a/drivers/net/tulip/winbond-840.c ++++ b/drivers/net/tulip/winbond-840.c +@@ -1605,11 +1605,11 @@ #ifdef CONFIG_PM + * - get_stats: + * spin_lock_irq(np->lock), doesn''t touch hw if not present + * - hard_start_xmit: +- * netif_stop_queue + spin_unlock_wait(&dev->xmit_lock); ++ * synchronize_irq + netif_tx_disable; + * - tx_timeout: +- * netif_device_detach + spin_unlock_wait(&dev->xmit_lock); ++ * netif_device_detach + netif_tx_disable; + * - set_multicast_list +- * netif_device_detach + spin_unlock_wait(&dev->xmit_lock); ++ * netif_device_detach + netif_tx_disable; + * - interrupt handler + * doesn''t touch hw if not present, synchronize_irq waits for + * running instances of the interrupt handler. +@@ -1635,11 +1635,10 @@ static int w840_suspend (struct pci_dev + netif_device_detach(dev); + update_csr6(dev, 0); + iowrite32(0, ioaddr + IntrEnable); +- netif_stop_queue(dev); + spin_unlock_irq(&np->lock); + +- spin_unlock_wait(&dev->xmit_lock); + synchronize_irq(dev->irq); ++ netif_tx_disable(dev); + + np->stats.rx_missed_errors += ioread32(ioaddr + RxMissed) & 0xffff; + +diff --git a/drivers/net/typhoon.c b/drivers/net/typhoon.c +index 4c76cb7..30c48c9 100644 +--- a/drivers/net/typhoon.c ++++ b/drivers/net/typhoon.c +@@ -340,7 +340,7 @@ #define typhoon_synchronize_irq(x) synch + #endif + + #if defined(NETIF_F_TSO) +-#define skb_tso_size(x) (skb_shinfo(x)->tso_size) ++#define skb_tso_size(x) (skb_shinfo(x)->gso_size) + #define TSO_NUM_DESCRIPTORS 2 + #define TSO_OFFLOAD_ON TYPHOON_OFFLOAD_TCP_SEGMENT + #else +diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c +index ed1f837..2eb6b5f 100644 +--- a/drivers/net/via-velocity.c ++++ b/drivers/net/via-velocity.c +@@ -1899,6 +1899,13 @@ static int velocity_xmit(struct sk_buff + + int pktlen = skb->len; + ++#ifdef VELOCITY_ZERO_COPY_SUPPORT ++ if (skb_shinfo(skb)->nr_frags > 6 && __skb_linearize(skb)) { ++ kfree_skb(skb); ++ return 0; ++ } ++#endif ++ + spin_lock_irqsave(&vptr->lock, flags); + + index = vptr->td_curr[qnum]; +@@ -1914,8 +1921,6 @@ static int velocity_xmit(struct sk_buff + */ + if (pktlen < ETH_ZLEN) { + /* Cannot occur until ZC support */ +- if(skb_linearize(skb, GFP_ATOMIC)) +- return 0; + pktlen = ETH_ZLEN; + memcpy(tdinfo->buf, skb->data, skb->len); + memset(tdinfo->buf + skb->len, 0, ETH_ZLEN - skb->len); +@@ -1933,7 +1938,6 @@ #ifdef VELOCITY_ZERO_COPY_SUPPORT + int nfrags = skb_shinfo(skb)->nr_frags; + tdinfo->skb = skb; + if (nfrags > 6) { +- skb_linearize(skb, GFP_ATOMIC); + memcpy(tdinfo->buf, skb->data, skb->len); + tdinfo->skb_dma[0] = tdinfo->buf_dma; + td_ptr->tdesc0.pktsize = +diff --git a/drivers/net/wireless/orinoco.c b/drivers/net/wireless/orinoco.c +index 6fd0bf7..75237c1 100644 +--- a/drivers/net/wireless/orinoco.c ++++ b/drivers/net/wireless/orinoco.c +@@ -1835,7 +1835,9 @@ static int __orinoco_program_rids(struct + /* Set promiscuity / multicast*/ + priv->promiscuous = 0; + priv->mc_count = 0; +- __orinoco_set_multicast_list(dev); /* FIXME: what about the xmit_lock */ ++ ++ /* FIXME: what about netif_tx_lock */ ++ __orinoco_set_multicast_list(dev); + + return 0; + } +diff --git a/drivers/s390/net/qeth_eddp.c b/drivers/s390/net/qeth_eddp.c +index 82cb4af..57cec40 100644 +--- a/drivers/s390/net/qeth_eddp.c ++++ b/drivers/s390/net/qeth_eddp.c +@@ -421,7 +421,7 @@ #endif /* CONFIG_QETH_VLAN */ + } + tcph = eddp->skb->h.th; + while (eddp->skb_offset < eddp->skb->len) { +- data_len = min((int)skb_shinfo(eddp->skb)->tso_size, ++ 
data_len = min((int)skb_shinfo(eddp->skb)->gso_size, + (int)(eddp->skb->len - eddp->skb_offset)); + /* prepare qdio hdr */ + if (eddp->qh.hdr.l2.id == QETH_HEADER_TYPE_LAYER2){ +@@ -516,20 +516,20 @@ qeth_eddp_calc_num_pages(struct qeth_edd + + QETH_DBF_TEXT(trace, 5, "eddpcanp"); + /* can we put multiple skbs in one page? */ +- skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->tso_size + hdr_len); ++ skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->gso_size + hdr_len); + if (skbs_per_page > 1){ +- ctx->num_pages = (skb_shinfo(skb)->tso_segs + 1) / ++ ctx->num_pages = (skb_shinfo(skb)->gso_segs + 1) / + skbs_per_page + 1; + ctx->elements_per_skb = 1; + } else { + /* no -> how many elements per skb? */ +- ctx->elements_per_skb = (skb_shinfo(skb)->tso_size + hdr_len + ++ ctx->elements_per_skb = (skb_shinfo(skb)->gso_size + hdr_len + + PAGE_SIZE) >> PAGE_SHIFT; + ctx->num_pages = ctx->elements_per_skb * +- (skb_shinfo(skb)->tso_segs + 1); ++ (skb_shinfo(skb)->gso_segs + 1); + } + ctx->num_elements = ctx->elements_per_skb * +- (skb_shinfo(skb)->tso_segs + 1); ++ (skb_shinfo(skb)->gso_segs + 1); + } + + static inline struct qeth_eddp_context * +diff --git a/drivers/s390/net/qeth_main.c b/drivers/s390/net/qeth_main.c +index dba7f7f..d9cc997 100644 +--- a/drivers/s390/net/qeth_main.c ++++ b/drivers/s390/net/qeth_main.c +@@ -4454,7 +4454,7 @@ qeth_send_packet(struct qeth_card *card, + queue = card->qdio.out_qs + [qeth_get_priority_queue(card, skb, ipv, cast_type)]; + +- if (skb_shinfo(skb)->tso_size) ++ if (skb_shinfo(skb)->gso_size) + large_send = card->options.large_send; + + /*are we able to do TSO ? If so ,prepare and send it from here */ +@@ -4501,7 +4501,7 @@ qeth_send_packet(struct qeth_card *card, + card->stats.tx_packets++; + card->stats.tx_bytes += skb->len; + #ifdef CONFIG_QETH_PERF_STATS +- if (skb_shinfo(skb)->tso_size && ++ if (skb_shinfo(skb)->gso_size && + !(large_send == QETH_LARGE_SEND_NO)) { + card->perf_stats.large_send_bytes += skb->len; + card->perf_stats.large_send_cnt++; +diff --git a/drivers/s390/net/qeth_tso.h b/drivers/s390/net/qeth_tso.h +index 1286dde..89cbf34 100644 +--- a/drivers/s390/net/qeth_tso.h ++++ b/drivers/s390/net/qeth_tso.h +@@ -51,7 +51,7 @@ qeth_tso_fill_header(struct qeth_card *c + hdr->ext.hdr_version = 1; + hdr->ext.hdr_len = 28; + /*insert non-fix values */ +- hdr->ext.mss = skb_shinfo(skb)->tso_size; ++ hdr->ext.mss = skb_shinfo(skb)->gso_size; + hdr->ext.dg_hdr_len = (__u16)(iph->ihl*4 + tcph->doff*4); + hdr->ext.payload_len = (__u16)(skb->len - hdr->ext.dg_hdr_len - + sizeof(struct qeth_hdr_tso)); +diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h +index 93535f0..9269df7 100644 +--- a/include/linux/ethtool.h ++++ b/include/linux/ethtool.h +@@ -408,6 +408,8 @@ #define ETHTOOL_STSO 0x0000001f /* Set + #define ETHTOOL_GPERMADDR 0x00000020 /* Get permanent hardware address */ + #define ETHTOOL_GUFO 0x00000021 /* Get UFO enable (ethtool_value) */ + #define ETHTOOL_SUFO 0x00000022 /* Set UFO enable (ethtool_value) */ ++#define ETHTOOL_GGSO 0x00000023 /* Get GSO enable (ethtool_value) */ ++#define ETHTOOL_SGSO 0x00000024 /* Set GSO enable (ethtool_value) */ + + /* compatibility with older code */ + #define SPARC_ETH_GSET ETHTOOL_GSET +diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h +index 7fda03d..47b0965 100644 +--- a/include/linux/netdevice.h ++++ b/include/linux/netdevice.h +@@ -230,7 +230,8 @@ enum netdev_state_t + __LINK_STATE_SCHED, + __LINK_STATE_NOCARRIER, + __LINK_STATE_RX_SCHED, +- __LINK_STATE_LINKWATCH_PENDING ++ 
__LINK_STATE_LINKWATCH_PENDING, ++ __LINK_STATE_QDISC_RUNNING, + }; + + +@@ -306,9 +307,17 @@ #define NETIF_F_HW_VLAN_TX 128 /* Transm + #define NETIF_F_HW_VLAN_RX 256 /* Receive VLAN hw acceleration */ + #define NETIF_F_HW_VLAN_FILTER 512 /* Receive filtering on VLAN */ + #define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */ +-#define NETIF_F_TSO 2048 /* Can offload TCP/IP segmentation */ ++#define NETIF_F_GSO 2048 /* Enable software GSO. */ + #define NETIF_F_LLTX 4096 /* LockLess TX */ +-#define NETIF_F_UFO 8192 /* Can offload UDP Large Send*/ ++ ++ /* Segmentation offload features */ ++#define NETIF_F_GSO_SHIFT 16 ++#define NETIF_F_TSO (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT) ++#define NETIF_F_UFO (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT) ++#define NETIF_F_GSO_ROBUST (SKB_GSO_DODGY << NETIF_F_GSO_SHIFT) ++ ++#define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) ++#define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM) + + struct net_device *next_sched; + +@@ -394,6 +403,9 @@ #define NETIF_F_UFO 8192 + struct list_head qdisc_list; + unsigned long tx_queue_len; /* Max frames per queue allowed */ + ++ /* Partially transmitted GSO packet. */ ++ struct sk_buff *gso_skb; ++ + /* ingress path synchronizer */ + spinlock_t ingress_lock; + struct Qdisc *qdisc_ingress; +@@ -402,7 +414,7 @@ #define NETIF_F_UFO 8192 + * One part is mostly used on xmit path (device) + */ + /* hard_start_xmit synchronizer */ +- spinlock_t xmit_lock ____cacheline_aligned_in_smp; ++ spinlock_t _xmit_lock ____cacheline_aligned_in_smp; + /* cpu id of processor entered to hard_start_xmit or -1, + if nobody entered there. + */ +@@ -527,6 +539,8 @@ struct packet_type { + struct net_device *, + struct packet_type *, + struct net_device *); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, ++ int features); + void *af_packet_priv; + struct list_head list; + }; +@@ -693,7 +707,8 @@ extern int dev_change_name(struct net_d + extern int dev_set_mtu(struct net_device *, int); + extern int dev_set_mac_address(struct net_device *, + struct sockaddr *); +-extern void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev); ++extern int dev_hard_start_xmit(struct sk_buff *skb, ++ struct net_device *dev); + + extern void dev_init(void); + +@@ -900,11 +915,43 @@ static inline void __netif_rx_complete(s + clear_bit(__LINK_STATE_RX_SCHED, &dev->state); + } + ++static inline void netif_tx_lock(struct net_device *dev) ++{ ++ spin_lock(&dev->_xmit_lock); ++ dev->xmit_lock_owner = smp_processor_id(); ++} ++ ++static inline void netif_tx_lock_bh(struct net_device *dev) ++{ ++ spin_lock_bh(&dev->_xmit_lock); ++ dev->xmit_lock_owner = smp_processor_id(); ++} ++ ++static inline int netif_tx_trylock(struct net_device *dev) ++{ ++ int err = spin_trylock(&dev->_xmit_lock); ++ if (!err) ++ dev->xmit_lock_owner = smp_processor_id(); ++ return err; ++} ++ ++static inline void netif_tx_unlock(struct net_device *dev) ++{ ++ dev->xmit_lock_owner = -1; ++ spin_unlock(&dev->_xmit_lock); ++} ++ ++static inline void netif_tx_unlock_bh(struct net_device *dev) ++{ ++ dev->xmit_lock_owner = -1; ++ spin_unlock_bh(&dev->_xmit_lock); ++} ++ + static inline void netif_tx_disable(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + netif_stop_queue(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + /* These functions live elsewhere (drivers/net/net_init.c, but related) */ +@@ -932,6 +979,7 @@ extern int netdev_max_backlog; + extern int weight_p; + extern int 
netdev_set_master(struct net_device *dev, struct net_device *master); + extern int skb_checksum_help(struct sk_buff *skb, int inward); ++extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features); + #ifdef CONFIG_BUG + extern void netdev_rx_csum_fault(struct net_device *dev); + #else +@@ -951,6 +999,18 @@ #endif + + extern void linkwatch_run_queue(void); + ++static inline int skb_gso_ok(struct sk_buff *skb, int features) ++{ ++ int feature = skb_shinfo(skb)->gso_size ? ++ skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT : 0; ++ return (features & feature) == feature; ++} ++ ++static inline int netif_needs_gso(struct net_device *dev, struct sk_buff *skb) ++{ ++ return !skb_gso_ok(skb, dev->features); ++} ++ + #endif /* __KERNEL__ */ + + #endif /* _LINUX_DEV_H */ +diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h +index ad7cc22..b19d45d 100644 +--- a/include/linux/skbuff.h ++++ b/include/linux/skbuff.h +@@ -134,9 +134,10 @@ struct skb_frag_struct { + struct skb_shared_info { + atomic_t dataref; + unsigned short nr_frags; +- unsigned short tso_size; +- unsigned short tso_segs; +- unsigned short ufo_size; ++ unsigned short gso_size; ++ /* Warning: this field is not always filled in (UFO)! */ ++ unsigned short gso_segs; ++ unsigned short gso_type; + unsigned int ip6_frag_id; + struct sk_buff *frag_list; + skb_frag_t frags[MAX_SKB_FRAGS]; +@@ -168,6 +169,14 @@ enum { + SKB_FCLONE_CLONE, + }; + ++enum { ++ SKB_GSO_TCPV4 = 1 << 0, ++ SKB_GSO_UDPV4 = 1 << 1, ++ ++ /* This indicates the skb is from an untrusted source. */ ++ SKB_GSO_DODGY = 1 << 2, ++}; ++ + /** + * struct sk_buff - socket buffer + * @next: Next buffer in list +@@ -1148,18 +1157,34 @@ static inline int skb_can_coalesce(struc + return 0; + } + ++static inline int __skb_linearize(struct sk_buff *skb) ++{ ++ return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM; ++} ++ + /** + * skb_linearize - convert paged skb to linear one + * @skb: buffer to linarize +- * @gfp: allocation mode + * + * If there is no free memory -ENOMEM is returned, otherwise zero + * is returned and the old skb data released. + */ +-extern int __skb_linearize(struct sk_buff *skb, gfp_t gfp); +-static inline int skb_linearize(struct sk_buff *skb, gfp_t gfp) ++static inline int skb_linearize(struct sk_buff *skb) ++{ ++ return skb_is_nonlinear(skb) ? __skb_linearize(skb) : 0; ++} ++ ++/** ++ * skb_linearize_cow - make sure skb is linear and writable ++ * @skb: buffer to process ++ * ++ * If there is no free memory -ENOMEM is returned, otherwise zero ++ * is returned and the old skb data released. ++ */ ++static inline int skb_linearize_cow(struct sk_buff *skb) + { +- return __skb_linearize(skb, gfp); ++ return skb_is_nonlinear(skb) || skb_cloned(skb) ? 
++ __skb_linearize(skb) : 0; + } + + /** +@@ -1254,6 +1279,7 @@ extern void skb_split(struct sk_b + struct sk_buff *skb1, const u32 len); + + extern void skb_release_data(struct sk_buff *skb); ++extern struct sk_buff *skb_segment(struct sk_buff *skb, int features); + + static inline void *skb_header_pointer(const struct sk_buff *skb, int offset, + int len, void *buffer) +diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h +index b94d1ad..75b5b93 100644 +--- a/include/net/pkt_sched.h ++++ b/include/net/pkt_sched.h +@@ -218,12 +218,13 @@ extern struct qdisc_rate_table *qdisc_ge + struct rtattr *tab); + extern void qdisc_put_rtab(struct qdisc_rate_table *tab); + +-extern int qdisc_restart(struct net_device *dev); ++extern void __qdisc_run(struct net_device *dev); + + static inline void qdisc_run(struct net_device *dev) + { +- while (!netif_queue_stopped(dev) && qdisc_restart(dev) < 0) +- /* NOTHING */; ++ if (!netif_queue_stopped(dev) && ++ !test_and_set_bit(__LINK_STATE_QDISC_RUNNING, &dev->state)) ++ __qdisc_run(dev); + } + + extern int tc_classify(struct sk_buff *skb, struct tcf_proto *tp, +diff --git a/include/net/protocol.h b/include/net/protocol.h +index 6dc5970..0d2dcdb 100644 +--- a/include/net/protocol.h ++++ b/include/net/protocol.h +@@ -37,6 +37,8 @@ #define MAX_INET_PROTOS 256 /* Must be + struct net_protocol { + int (*handler)(struct sk_buff *skb); + void (*err_handler)(struct sk_buff *skb, u32 info); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, ++ int features); + int no_policy; + }; + +diff --git a/include/net/sock.h b/include/net/sock.h +index f63d0d5..a8e8d21 100644 +--- a/include/net/sock.h ++++ b/include/net/sock.h +@@ -1064,9 +1064,13 @@ static inline void sk_setup_caps(struct + { + __sk_dst_set(sk, dst); + sk->sk_route_caps = dst->dev->features; ++ if (sk->sk_route_caps & NETIF_F_GSO) ++ sk->sk_route_caps |= NETIF_F_TSO; + if (sk->sk_route_caps & NETIF_F_TSO) { + if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len) + sk->sk_route_caps &= ~NETIF_F_TSO; ++ else ++ sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM; + } + } + +diff --git a/include/net/tcp.h b/include/net/tcp.h +index 77f21c6..70e1d5f 100644 +--- a/include/net/tcp.h ++++ b/include/net/tcp.h +@@ -552,13 +552,13 @@ #include <net/tcp_ecn.h> + */ + static inline int tcp_skb_pcount(const struct sk_buff *skb) + { +- return skb_shinfo(skb)->tso_segs; ++ return skb_shinfo(skb)->gso_segs; + } + + /* This is valid iff tcp_skb_pcount() > 1. 
*/ + static inline int tcp_skb_mss(const struct sk_buff *skb) + { +- return skb_shinfo(skb)->tso_size; ++ return skb_shinfo(skb)->gso_size; + } + + static inline void tcp_dec_pcount_approx(__u32 *count, +@@ -1063,6 +1063,8 @@ extern struct request_sock_ops tcp_reque + + extern int tcp_v4_destroy_sock(struct sock *sk); + ++extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features); ++ + #ifdef CONFIG_PROC_FS + extern int tcp4_proc_init(void); + extern void tcp4_proc_exit(void); +diff --git a/net/atm/clip.c b/net/atm/clip.c +index 1842a4e..6dc21a7 100644 +--- a/net/atm/clip.c ++++ b/net/atm/clip.c +@@ -101,7 +101,7 @@ static void unlink_clip_vcc(struct clip_ + printk(KERN_CRIT "!clip_vcc->entry (clip_vcc %p)\n",clip_vcc); + return; + } +- spin_lock_bh(&entry->neigh->dev->xmit_lock); /* block clip_start_xmit() */ ++ netif_tx_lock_bh(entry->neigh->dev); /* block clip_start_xmit() */ + entry->neigh->used = jiffies; + for (walk = &entry->vccs; *walk; walk = &(*walk)->next) + if (*walk == clip_vcc) { +@@ -125,7 +125,7 @@ static void unlink_clip_vcc(struct clip_ + printk(KERN_CRIT "ATMARP: unlink_clip_vcc failed (entry %p, vcc " + "0x%p)\n",entry,clip_vcc); + out: +- spin_unlock_bh(&entry->neigh->dev->xmit_lock); ++ netif_tx_unlock_bh(entry->neigh->dev); + } + + /* The neighbour entry n->lock is held. */ +diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c +index 0b33a7b..180e79b 100644 +--- a/net/bridge/br_device.c ++++ b/net/bridge/br_device.c +@@ -146,9 +146,9 @@ static int br_set_tx_csum(struct net_dev + struct net_bridge *br = netdev_priv(dev); + + if (data) +- br->feature_mask |= NETIF_F_IP_CSUM; ++ br->feature_mask |= NETIF_F_NO_CSUM; + else +- br->feature_mask &= ~NETIF_F_IP_CSUM; ++ br->feature_mask &= ~NETIF_F_ALL_CSUM; + + br_features_recompute(br); + return 0; +@@ -185,6 +185,6 @@ void br_dev_setup(struct net_device *dev + dev->set_mac_address = br_set_mac_address; + dev->priv_flags = IFF_EBRIDGE; + +- dev->features = NETIF_F_SG | NETIF_F_FRAGLIST +- | NETIF_F_HIGHDMA | NETIF_F_TSO | NETIF_F_IP_CSUM; ++ dev->features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA | ++ NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_GSO_ROBUST; + } +diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c +index 2d24fb4..00b1128 100644 +--- a/net/bridge/br_forward.c ++++ b/net/bridge/br_forward.c +@@ -32,7 +32,7 @@ static inline int should_deliver(const s + int br_dev_queue_push_xmit(struct sk_buff *skb) + { + /* drop mtu oversized packets except tso */ +- if (skb->len > skb->dev->mtu && !skb_shinfo(skb)->tso_size) ++ if (skb->len > skb->dev->mtu && !skb_shinfo(skb)->gso_size) + kfree_skb(skb); + else { + #ifdef CONFIG_BRIDGE_NETFILTER +diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c +index f36b35e..0617146 100644 +--- a/net/bridge/br_if.c ++++ b/net/bridge/br_if.c +@@ -385,17 +385,28 @@ void br_features_recompute(struct net_br + struct net_bridge_port *p; + unsigned long features, checksum; + +- features = br->feature_mask &~ NETIF_F_IP_CSUM; +- checksum = br->feature_mask & NETIF_F_IP_CSUM; ++ checksum = br->feature_mask & NETIF_F_ALL_CSUM ? 
NETIF_F_NO_CSUM : 0; ++ features = br->feature_mask & ~NETIF_F_ALL_CSUM; + + list_for_each_entry(p, &br->port_list, list) { +- if (!(p->dev->features +- & (NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM))) ++ unsigned long feature = p->dev->features; ++ ++ if (checksum & NETIF_F_NO_CSUM && !(feature & NETIF_F_NO_CSUM)) ++ checksum ^= NETIF_F_NO_CSUM | NETIF_F_HW_CSUM; ++ if (checksum & NETIF_F_HW_CSUM && !(feature & NETIF_F_HW_CSUM)) ++ checksum ^= NETIF_F_HW_CSUM | NETIF_F_IP_CSUM; ++ if (!(feature & NETIF_F_IP_CSUM)) + checksum = 0; +- features &= p->dev->features; ++ ++ if (feature & NETIF_F_GSO) ++ feature |= NETIF_F_TSO; ++ feature |= NETIF_F_GSO; ++ ++ features &= feature; + } + +- br->dev->features = features | checksum | NETIF_F_LLTX; ++ br->dev->features = features | checksum | NETIF_F_LLTX | ++ NETIF_F_GSO_ROBUST; + } + + /* called with RTNL */ +diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c +index 9e27373..588207f 100644 +--- a/net/bridge/br_netfilter.c ++++ b/net/bridge/br_netfilter.c +@@ -743,7 +743,7 @@ static int br_nf_dev_queue_xmit(struct s + { + if (skb->protocol == htons(ETH_P_IP) && + skb->len > skb->dev->mtu && +- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size)) ++ !skb_shinfo(skb)->gso_size) + return ip_fragment(skb, br_dev_queue_push_xmit); + else + return br_dev_queue_push_xmit(skb); +diff --git a/net/core/dev.c b/net/core/dev.c +index 12a214c..32e1056 100644 +--- a/net/core/dev.c ++++ b/net/core/dev.c +@@ -115,6 +115,7 @@ #include <linux/wireless.h> /* Note : w + #include <net/iw_handler.h> + #endif /* CONFIG_NET_RADIO */ + #include <asm/current.h> ++#include <linux/err.h> + + /* + * The list of packet types we will receive (as opposed to discard) +@@ -1032,7 +1033,7 @@ static inline void net_timestamp(struct + * taps currently in use. + */ + +-void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) ++static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) + { + struct packet_type *ptype; + +@@ -1106,6 +1107,45 @@ out: + return ret; + } + ++/** ++ * skb_gso_segment - Perform segmentation on skb. ++ * @skb: buffer to segment ++ * @features: features for the output path (see dev->features) ++ * ++ * This function segments the given skb and returns a list of segments. ++ * ++ * It may return NULL if the skb requires no segmentation. This is ++ * only possible when GSO is used for verifying header integrity. ++ */ ++struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features) ++{ ++ struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT); ++ struct packet_type *ptype; ++ int type = skb->protocol; ++ ++ BUG_ON(skb_shinfo(skb)->frag_list); ++ BUG_ON(skb->ip_summed != CHECKSUM_HW); ++ ++ skb->mac.raw = skb->data; ++ skb->mac_len = skb->nh.raw - skb->data; ++ __skb_pull(skb, skb->mac_len); ++ ++ rcu_read_lock(); ++ list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) { ++ if (ptype->type == type && !ptype->dev && ptype->gso_segment) { ++ segs = ptype->gso_segment(skb, features); ++ break; ++ } ++ } ++ rcu_read_unlock(); ++ ++ __skb_push(skb, skb->data - skb->mac.raw); ++ ++ return segs; ++} ++ ++EXPORT_SYMBOL(skb_gso_segment); ++ + /* Take action when hardware reception checksum errors are detected. 
*/ + #ifdef CONFIG_BUG + void netdev_rx_csum_fault(struct net_device *dev) +@@ -1142,75 +1182,108 @@ #else + #define illegal_highdma(dev, skb) (0) + #endif + +-/* Keep head the same: replace data */ +-int __skb_linearize(struct sk_buff *skb, gfp_t gfp_mask) +-{ +- unsigned int size; +- u8 *data; +- long offset; +- struct skb_shared_info *ninfo; +- int headerlen = skb->data - skb->head; +- int expand = (skb->tail + skb->data_len) - skb->end; +- +- if (skb_shared(skb)) +- BUG(); +- +- if (expand <= 0) +- expand = 0; +- +- size = skb->end - skb->head + expand; +- size = SKB_DATA_ALIGN(size); +- data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask); +- if (!data) +- return -ENOMEM; +- +- /* Copy entire thing */ +- if (skb_copy_bits(skb, -headerlen, data, headerlen + skb->len)) +- BUG(); +- +- /* Set up shinfo */ +- ninfo = (struct skb_shared_info*)(data + size); +- atomic_set(&ninfo->dataref, 1); +- ninfo->tso_size = skb_shinfo(skb)->tso_size; +- ninfo->tso_segs = skb_shinfo(skb)->tso_segs; +- ninfo->nr_frags = 0; +- ninfo->frag_list = NULL; +- +- /* Offset between the two in bytes */ +- offset = data - skb->head; +- +- /* Free old data. */ +- skb_release_data(skb); +- +- skb->head = data; +- skb->end = data + size; +- +- /* Set up new pointers */ +- skb->h.raw += offset; +- skb->nh.raw += offset; +- skb->mac.raw += offset; +- skb->tail += offset; +- skb->data += offset; +- +- /* We are no longer a clone, even if we were. */ +- skb->cloned = 0; +- +- skb->tail += skb->data_len; +- skb->data_len = 0; ++struct dev_gso_cb { ++ void (*destructor)(struct sk_buff *skb); ++}; ++ ++#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb) ++ ++static void dev_gso_skb_destructor(struct sk_buff *skb) ++{ ++ struct dev_gso_cb *cb; ++ ++ do { ++ struct sk_buff *nskb = skb->next; ++ ++ skb->next = nskb->next; ++ nskb->next = NULL; ++ kfree_skb(nskb); ++ } while (skb->next); ++ ++ cb = DEV_GSO_CB(skb); ++ if (cb->destructor) ++ cb->destructor(skb); ++} ++ ++/** ++ * dev_gso_segment - Perform emulated hardware segmentation on skb. ++ * @skb: buffer to segment ++ * ++ * This function segments the given skb and stores the list of segments ++ * in skb->next. ++ */ ++static int dev_gso_segment(struct sk_buff *skb) ++{ ++ struct net_device *dev = skb->dev; ++ struct sk_buff *segs; ++ int features = dev->features & ~(illegal_highdma(dev, skb) ? ++ NETIF_F_SG : 0); ++ ++ segs = skb_gso_segment(skb, features); ++ ++ /* Verifying header integrity only. 
*/ ++ if (!segs) ++ return 0; ++ ++ if (unlikely(IS_ERR(segs))) ++ return PTR_ERR(segs); ++ ++ skb->next = segs; ++ DEV_GSO_CB(skb)->destructor = skb->destructor; ++ skb->destructor = dev_gso_skb_destructor; ++ ++ return 0; ++} ++ ++int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) ++{ ++ if (likely(!skb->next)) { ++ if (netdev_nit) ++ dev_queue_xmit_nit(skb, dev); ++ ++ if (netif_needs_gso(dev, skb)) { ++ if (unlikely(dev_gso_segment(skb))) ++ goto out_kfree_skb; ++ if (skb->next) ++ goto gso; ++ } ++ ++ return dev->hard_start_xmit(skb, dev); ++ } ++ ++gso: ++ do { ++ struct sk_buff *nskb = skb->next; ++ int rc; ++ ++ skb->next = nskb->next; ++ nskb->next = NULL; ++ rc = dev->hard_start_xmit(nskb, dev); ++ if (unlikely(rc)) { ++ nskb->next = skb->next; ++ skb->next = nskb; ++ return rc; ++ } ++ if (unlikely(netif_queue_stopped(dev) && skb->next)) ++ return NETDEV_TX_BUSY; ++ } while (skb->next); ++ ++ skb->destructor = DEV_GSO_CB(skb)->destructor; ++ ++out_kfree_skb: ++ kfree_skb(skb); + return 0; + } + + #define HARD_TX_LOCK(dev, cpu) { \ + if ((dev->features & NETIF_F_LLTX) == 0) { \ +- spin_lock(&dev->xmit_lock); \ +- dev->xmit_lock_owner = cpu; \ ++ netif_tx_lock(dev); \ + } \ + } + + #define HARD_TX_UNLOCK(dev) { \ + if ((dev->features & NETIF_F_LLTX) == 0) { \ +- dev->xmit_lock_owner = -1; \ +- spin_unlock(&dev->xmit_lock); \ ++ netif_tx_unlock(dev); \ + } \ + } + +@@ -1246,9 +1319,13 @@ int dev_queue_xmit(struct sk_buff *skb) + struct Qdisc *q; + int rc = -ENOMEM; + ++ /* GSO will handle the following emulations directly. */ ++ if (netif_needs_gso(dev, skb)) ++ goto gso; ++ + if (skb_shinfo(skb)->frag_list && + !(dev->features & NETIF_F_FRAGLIST) && +- __skb_linearize(skb, GFP_ATOMIC)) ++ __skb_linearize(skb)) + goto out_kfree_skb; + + /* Fragmented skb is linearized if device does not support SG, +@@ -1257,25 +1334,26 @@ int dev_queue_xmit(struct sk_buff *skb) + */ + if (skb_shinfo(skb)->nr_frags && + (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) && +- __skb_linearize(skb, GFP_ATOMIC)) ++ __skb_linearize(skb)) + goto out_kfree_skb; + + /* If packet is not checksummed and device does not support + * checksumming for this protocol, complete checksumming here. + */ + if (skb->ip_summed == CHECKSUM_HW && +- (!(dev->features & (NETIF_F_HW_CSUM | NETIF_F_NO_CSUM)) && ++ (!(dev->features & NETIF_F_GEN_CSUM) && + (!(dev->features & NETIF_F_IP_CSUM) || + skb->protocol != htons(ETH_P_IP)))) + if (skb_checksum_help(skb, 0)) + goto out_kfree_skb; + ++gso: + spin_lock_prefetch(&dev->queue_lock); + + /* Disable soft irqs for various locks below. Also + * stops preemption for RCU. + */ +- local_bh_disable(); ++ rcu_read_lock_bh(); + + /* Updates of qdisc are serialized by queue_lock. + * The struct Qdisc which is pointed to by qdisc is now a +@@ -1309,8 +1387,8 @@ #endif + /* The device has no queue. Common case for software devices: + loopback, all the sorts of tunnels... + +- Really, it is unlikely that xmit_lock protection is necessary here. +- (f.e. loopback and IP tunnels are clean ignoring statistics ++ Really, it is unlikely that netif_tx_lock protection is necessary ++ here. (f.e. loopback and IP tunnels are clean ignoring statistics + counters.) + However, it is possible, that they rely on protection + made by us here. 
+@@ -1326,11 +1404,8 @@ #endif + HARD_TX_LOCK(dev, cpu); + + if (!netif_queue_stopped(dev)) { +- if (netdev_nit) +- dev_queue_xmit_nit(skb, dev); +- + rc = 0; +- if (!dev->hard_start_xmit(skb, dev)) { ++ if (!dev_hard_start_xmit(skb, dev)) { + HARD_TX_UNLOCK(dev); + goto out; + } +@@ -1349,13 +1424,13 @@ #endif + } + + rc = -ENETDOWN; +- local_bh_enable(); ++ rcu_read_unlock_bh(); + + out_kfree_skb: + kfree_skb(skb); + return rc; + out: +- local_bh_enable(); ++ rcu_read_unlock_bh(); + return rc; + } + +@@ -2670,7 +2745,7 @@ int register_netdevice(struct net_device + BUG_ON(dev->reg_state != NETREG_UNINITIALIZED); + + spin_lock_init(&dev->queue_lock); +- spin_lock_init(&dev->xmit_lock); ++ spin_lock_init(&dev->_xmit_lock); + dev->xmit_lock_owner = -1; + #ifdef CONFIG_NET_CLS_ACT + spin_lock_init(&dev->ingress_lock); +@@ -2714,9 +2789,7 @@ #endif + + /* Fix illegal SG+CSUM combinations. */ + if ((dev->features & NETIF_F_SG) && +- !(dev->features & (NETIF_F_IP_CSUM | +- NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM))) { ++ !(dev->features & NETIF_F_ALL_CSUM)) { + printk("%s: Dropping NETIF_F_SG since no checksum feature.\n", + dev->name); + dev->features &= ~NETIF_F_SG; +@@ -3268,7 +3341,6 @@ subsys_initcall(net_dev_init); + EXPORT_SYMBOL(__dev_get_by_index); + EXPORT_SYMBOL(__dev_get_by_name); + EXPORT_SYMBOL(__dev_remove_pack); +-EXPORT_SYMBOL(__skb_linearize); + EXPORT_SYMBOL(dev_valid_name); + EXPORT_SYMBOL(dev_add_pack); + EXPORT_SYMBOL(dev_alloc_name); +diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c +index 05d6085..c57d887 100644 +--- a/net/core/dev_mcast.c ++++ b/net/core/dev_mcast.c +@@ -62,7 +62,7 @@ #include <net/arp.h> + * Device mc lists are changed by bh at least if IPv6 is enabled, + * so that it must be bh protected. + * +- * We block accesses to device mc filters with dev->xmit_lock. ++ * We block accesses to device mc filters with netif_tx_lock. 
+ */ + + /* +@@ -93,9 +93,9 @@ static void __dev_mc_upload(struct net_d + + void dev_mc_upload(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + __dev_mc_upload(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + /* +@@ -107,7 +107,7 @@ int dev_mc_delete(struct net_device *dev + int err = 0; + struct dev_mc_list *dmi, **dmip; + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + for (dmip = &dev->mc_list; (dmi = *dmip) != NULL; dmip = &dmi->next) { + /* +@@ -139,13 +139,13 @@ int dev_mc_delete(struct net_device *dev + */ + __dev_mc_upload(dev); + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + } + } + err = -ENOENT; + done: +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return err; + } + +@@ -160,7 +160,7 @@ int dev_mc_add(struct net_device *dev, v + + dmi1 = kmalloc(sizeof(*dmi), GFP_ATOMIC); + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + for (dmi = dev->mc_list; dmi != NULL; dmi = dmi->next) { + if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 && + dmi->dmi_addrlen == alen) { +@@ -176,7 +176,7 @@ int dev_mc_add(struct net_device *dev, v + } + + if ((dmi = dmi1) == NULL) { +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return -ENOMEM; + } + memcpy(dmi->dmi_addr, addr, alen); +@@ -189,11 +189,11 @@ int dev_mc_add(struct net_device *dev, v + + __dev_mc_upload(dev); + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + + done: +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + kfree(dmi1); + return err; + } +@@ -204,7 +204,7 @@ done: + + void dev_mc_discard(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + while (dev->mc_list != NULL) { + struct dev_mc_list *tmp = dev->mc_list; +@@ -215,7 +215,7 @@ void dev_mc_discard(struct net_device *d + } + dev->mc_count = 0; + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + #ifdef CONFIG_PROC_FS +@@ -250,7 +250,7 @@ static int dev_mc_seq_show(struct seq_fi + struct dev_mc_list *m; + struct net_device *dev = v; + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + for (m = dev->mc_list; m; m = m->next) { + int i; + +@@ -262,7 +262,7 @@ static int dev_mc_seq_show(struct seq_fi + + seq_putc(seq, ''\n''); + } +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + } + +diff --git a/net/core/ethtool.c b/net/core/ethtool.c +index e6f7610..27ce168 100644 +--- a/net/core/ethtool.c ++++ b/net/core/ethtool.c +@@ -30,7 +30,7 @@ u32 ethtool_op_get_link(struct net_devic + + u32 ethtool_op_get_tx_csum(struct net_device *dev) + { +- return (dev->features & (NETIF_F_IP_CSUM | NETIF_F_HW_CSUM)) != 0; ++ return (dev->features & NETIF_F_ALL_CSUM) != 0; + } + + int ethtool_op_set_tx_csum(struct net_device *dev, u32 data) +@@ -551,9 +551,7 @@ static int ethtool_set_sg(struct net_dev + return -EFAULT; + + if (edata.data && +- !(dev->features & (NETIF_F_IP_CSUM | +- NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM))) ++ !(dev->features & NETIF_F_ALL_CSUM)) + return -EINVAL; + + return __ethtool_set_sg(dev, edata.data); +@@ -591,7 +589,7 @@ static int ethtool_set_tso(struct net_de + + static int ethtool_get_ufo(struct net_device *dev, char __user *useraddr) + { +- struct ethtool_value edata = { ETHTOOL_GTSO }; ++ struct ethtool_value edata = { ETHTOOL_GUFO }; + + if (!dev->ethtool_ops->get_ufo) + return -EOPNOTSUPP; +@@ -600,6 +598,7 @@ static int ethtool_get_ufo(struct net_de + return 
-EFAULT; + return 0; + } ++ + static int ethtool_set_ufo(struct net_device *dev, char __user *useraddr) + { + struct ethtool_value edata; +@@ -615,6 +614,29 @@ static int ethtool_set_ufo(struct net_de + return dev->ethtool_ops->set_ufo(dev, edata.data); + } + ++static int ethtool_get_gso(struct net_device *dev, char __user *useraddr) ++{ ++ struct ethtool_value edata = { ETHTOOL_GGSO }; ++ ++ edata.data = dev->features & NETIF_F_GSO; ++ if (copy_to_user(useraddr, &edata, sizeof(edata))) ++ return -EFAULT; ++ return 0; ++} ++ ++static int ethtool_set_gso(struct net_device *dev, char __user *useraddr) ++{ ++ struct ethtool_value edata; ++ ++ if (copy_from_user(&edata, useraddr, sizeof(edata))) ++ return -EFAULT; ++ if (edata.data) ++ dev->features |= NETIF_F_GSO; ++ else ++ dev->features &= ~NETIF_F_GSO; ++ return 0; ++} ++ + static int ethtool_self_test(struct net_device *dev, char __user *useraddr) + { + struct ethtool_test test; +@@ -906,6 +928,12 @@ int dev_ethtool(struct ifreq *ifr) + case ETHTOOL_SUFO: + rc = ethtool_set_ufo(dev, useraddr); + break; ++ case ETHTOOL_GGSO: ++ rc = ethtool_get_gso(dev, useraddr); ++ break; ++ case ETHTOOL_SGSO: ++ rc = ethtool_set_gso(dev, useraddr); ++ break; + default: + rc = -EOPNOTSUPP; + } +diff --git a/net/core/netpoll.c b/net/core/netpoll.c +index ea51f8d..ec28d3b 100644 +--- a/net/core/netpoll.c ++++ b/net/core/netpoll.c +@@ -273,24 +273,21 @@ static void netpoll_send_skb(struct netp + + do { + npinfo->tries--; +- spin_lock(&np->dev->xmit_lock); +- np->dev->xmit_lock_owner = smp_processor_id(); ++ netif_tx_lock(np->dev); + + /* + * network drivers do not expect to be called if the queue is + * stopped. + */ + if (netif_queue_stopped(np->dev)) { +- np->dev->xmit_lock_owner = -1; +- spin_unlock(&np->dev->xmit_lock); ++ netif_tx_unlock(np->dev); + netpoll_poll(np); + udelay(50); + continue; + } + + status = np->dev->hard_start_xmit(skb, np->dev); +- np->dev->xmit_lock_owner = -1; +- spin_unlock(&np->dev->xmit_lock); ++ netif_tx_unlock(np->dev); + + /* success */ + if(!status) { +diff --git a/net/core/pktgen.c b/net/core/pktgen.c +index da16f8f..2380347 100644 +--- a/net/core/pktgen.c ++++ b/net/core/pktgen.c +@@ -2582,7 +2582,7 @@ static __inline__ void pktgen_xmit(struc + } + } + +- spin_lock_bh(&odev->xmit_lock); ++ netif_tx_lock_bh(odev); + if (!netif_queue_stopped(odev)) { + + atomic_inc(&(pkt_dev->skb->users)); +@@ -2627,7 +2627,7 @@ retry_now: + pkt_dev->next_tx_ns = 0; + } + +- spin_unlock_bh(&odev->xmit_lock); ++ netif_tx_unlock_bh(odev); + + /* If pkt_dev->count is zero, then run forever */ + if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) { +diff --git a/net/core/skbuff.c b/net/core/skbuff.c +index 2144952..46f56af 100644 +--- a/net/core/skbuff.c ++++ b/net/core/skbuff.c +@@ -164,9 +164,9 @@ struct sk_buff *__alloc_skb(unsigned int + shinfo = skb_shinfo(skb); + atomic_set(&shinfo->dataref, 1); + shinfo->nr_frags = 0; +- shinfo->tso_size = 0; +- shinfo->tso_segs = 0; +- shinfo->ufo_size = 0; ++ shinfo->gso_size = 0; ++ shinfo->gso_segs = 0; ++ shinfo->gso_type = 0; + shinfo->ip6_frag_id = 0; + shinfo->frag_list = NULL; + +@@ -230,8 +230,9 @@ struct sk_buff *alloc_skb_from_cache(kme + + atomic_set(&(skb_shinfo(skb)->dataref), 1); + skb_shinfo(skb)->nr_frags = 0; +- skb_shinfo(skb)->tso_size = 0; +- skb_shinfo(skb)->tso_segs = 0; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_segs = 0; ++ skb_shinfo(skb)->gso_type = 0; + skb_shinfo(skb)->frag_list = NULL; + out: + return skb; +@@ -501,8 +502,9 @@ #endif + 
new->tc_index = old->tc_index; + #endif + atomic_set(&new->users, 1); +- skb_shinfo(new)->tso_size = skb_shinfo(old)->tso_size; +- skb_shinfo(new)->tso_segs = skb_shinfo(old)->tso_segs; ++ skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size; ++ skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs; ++ skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type; + } + + /** +@@ -1777,6 +1779,133 @@ int skb_append_datato_frags(struct sock + return 0; + } + ++/** ++ * skb_segment - Perform protocol segmentation on skb. ++ * @skb: buffer to segment ++ * @features: features for the output path (see dev->features) ++ * ++ * This function performs segmentation on the given skb. It returns ++ * the segment at the given position. It returns NULL if there are ++ * no more segments to generate, or when an error is encountered. ++ */ ++struct sk_buff *skb_segment(struct sk_buff *skb, int features) ++{ ++ struct sk_buff *segs = NULL; ++ struct sk_buff *tail = NULL; ++ unsigned int mss = skb_shinfo(skb)->gso_size; ++ unsigned int doffset = skb->data - skb->mac.raw; ++ unsigned int offset = doffset; ++ unsigned int headroom; ++ unsigned int len; ++ int sg = features & NETIF_F_SG; ++ int nfrags = skb_shinfo(skb)->nr_frags; ++ int err = -ENOMEM; ++ int i = 0; ++ int pos; ++ ++ __skb_push(skb, doffset); ++ headroom = skb_headroom(skb); ++ pos = skb_headlen(skb); ++ ++ do { ++ struct sk_buff *nskb; ++ skb_frag_t *frag; ++ int hsize, nsize; ++ int k; ++ int size; ++ ++ len = skb->len - offset; ++ if (len > mss) ++ len = mss; ++ ++ hsize = skb_headlen(skb) - offset; ++ if (hsize < 0) ++ hsize = 0; ++ nsize = hsize + doffset; ++ if (nsize > len + doffset || !sg) ++ nsize = len + doffset; ++ ++ nskb = alloc_skb(nsize + headroom, GFP_ATOMIC); ++ if (unlikely(!nskb)) ++ goto err; ++ ++ if (segs) ++ tail->next = nskb; ++ else ++ segs = nskb; ++ tail = nskb; ++ ++ nskb->dev = skb->dev; ++ nskb->priority = skb->priority; ++ nskb->protocol = skb->protocol; ++ nskb->dst = dst_clone(skb->dst); ++ memcpy(nskb->cb, skb->cb, sizeof(skb->cb)); ++ nskb->pkt_type = skb->pkt_type; ++ nskb->mac_len = skb->mac_len; ++ ++ skb_reserve(nskb, headroom); ++ nskb->mac.raw = nskb->data; ++ nskb->nh.raw = nskb->data + skb->mac_len; ++ nskb->h.raw = nskb->nh.raw + (skb->h.raw - skb->nh.raw); ++ memcpy(skb_put(nskb, doffset), skb->data, doffset); ++ ++ if (!sg) { ++ nskb->csum = skb_copy_and_csum_bits(skb, offset, ++ skb_put(nskb, len), ++ len, 0); ++ continue; ++ } ++ ++ frag = skb_shinfo(nskb)->frags; ++ k = 0; ++ ++ nskb->ip_summed = CHECKSUM_HW; ++ nskb->csum = skb->csum; ++ memcpy(skb_put(nskb, hsize), skb->data + offset, hsize); ++ ++ while (pos < offset + len) { ++ BUG_ON(i >= nfrags); ++ ++ *frag = skb_shinfo(skb)->frags[i]; ++ get_page(frag->page); ++ size = frag->size; ++ ++ if (pos < offset) { ++ frag->page_offset += offset - pos; ++ frag->size -= offset - pos; ++ } ++ ++ k++; ++ ++ if (pos + size <= offset + len) { ++ i++; ++ pos += size; ++ } else { ++ frag->size -= pos + size - (offset + len); ++ break; ++ } ++ ++ frag++; ++ } ++ ++ skb_shinfo(nskb)->nr_frags = k; ++ nskb->data_len = len - hsize; ++ nskb->len += nskb->data_len; ++ nskb->truesize += nskb->data_len; ++ } while ((offset += len) < skb->len); ++ ++ return segs; ++ ++err: ++ while ((skb = segs)) { ++ segs = skb->next; ++ kfree(skb); ++ } ++ return ERR_PTR(err); ++} ++ ++EXPORT_SYMBOL_GPL(skb_segment); ++ + void __init skb_init(void) + { + skbuff_head_cache = kmem_cache_create("skbuff_head_cache", +diff --git a/net/decnet/dn_nsp_in.c b/net/decnet/dn_nsp_in.c +index 
44bda85..2e3323a 100644 +--- a/net/decnet/dn_nsp_in.c ++++ b/net/decnet/dn_nsp_in.c +@@ -801,8 +801,7 @@ got_it: + * We linearize everything except data segments here. + */ + if (cb->nsp_flags & ~0x60) { +- if (unlikely(skb_is_nonlinear(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) ++ if (unlikely(skb_linearize(skb))) + goto free_out; + } + +diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c +index 3407f19..a0a25e0 100644 +--- a/net/decnet/dn_route.c ++++ b/net/decnet/dn_route.c +@@ -629,8 +629,7 @@ int dn_route_rcv(struct sk_buff *skb, st + padlen); + + if (flags & DN_RT_PKT_CNTL) { +- if (unlikely(skb_is_nonlinear(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) ++ if (unlikely(skb_linearize(skb))) + goto dump_it; + + switch(flags & DN_RT_CNTL_MSK) { +diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c +index 97c276f..5ba719e 100644 +--- a/net/ipv4/af_inet.c ++++ b/net/ipv4/af_inet.c +@@ -68,6 +68,7 @@ + */ + + #include <linux/config.h> ++#include <linux/err.h> + #include <linux/errno.h> + #include <linux/types.h> + #include <linux/socket.h> +@@ -1084,6 +1085,54 @@ int inet_sk_rebuild_header(struct sock * + + EXPORT_SYMBOL(inet_sk_rebuild_header); + ++static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int features) ++{ ++ struct sk_buff *segs = ERR_PTR(-EINVAL); ++ struct iphdr *iph; ++ struct net_protocol *ops; ++ int proto; ++ int ihl; ++ int id; ++ ++ if (!pskb_may_pull(skb, sizeof(*iph))) ++ goto out; ++ ++ iph = skb->nh.iph; ++ ihl = iph->ihl * 4; ++ if (ihl < sizeof(*iph)) ++ goto out; ++ ++ if (!pskb_may_pull(skb, ihl)) ++ goto out; ++ ++ skb->h.raw = __skb_pull(skb, ihl); ++ iph = skb->nh.iph; ++ id = ntohs(iph->id); ++ proto = iph->protocol & (MAX_INET_PROTOS - 1); ++ segs = ERR_PTR(-EPROTONOSUPPORT); ++ ++ rcu_read_lock(); ++ ops = rcu_dereference(inet_protos[proto]); ++ if (ops && ops->gso_segment) ++ segs = ops->gso_segment(skb, features); ++ rcu_read_unlock(); ++ ++ if (!segs || unlikely(IS_ERR(segs))) ++ goto out; ++ ++ skb = segs; ++ do { ++ iph = skb->nh.iph; ++ iph->id = htons(id++); ++ iph->tot_len = htons(skb->len - skb->mac_len); ++ iph->check = 0; ++ iph->check = ip_fast_csum(skb->nh.raw, iph->ihl); ++ } while ((skb = skb->next)); ++ ++out: ++ return segs; ++} ++ + #ifdef CONFIG_IP_MULTICAST + static struct net_protocol igmp_protocol = { + .handler = igmp_rcv, +@@ -1093,6 +1142,7 @@ #endif + static struct net_protocol tcp_protocol = { + .handler = tcp_v4_rcv, + .err_handler = tcp_v4_err, ++ .gso_segment = tcp_tso_segment, + .no_policy = 1, + }; + +@@ -1138,6 +1188,7 @@ static int ipv4_proc_init(void); + static struct packet_type ip_packet_type = { + .type = __constant_htons(ETH_P_IP), + .func = ip_rcv, ++ .gso_segment = inet_gso_segment, + }; + + static int __init inet_init(void) +diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c +index 8dcba38..19c3c73 100644 +--- a/net/ipv4/ip_output.c ++++ b/net/ipv4/ip_output.c +@@ -210,8 +210,7 @@ #if defined(CONFIG_NETFILTER) && defined + return dst_output(skb); + } + #endif +- if (skb->len > dst_mtu(skb->dst) && +- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size)) ++ if (skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size) + return ip_fragment(skb, ip_finish_output2); + else + return ip_finish_output2(skb); +@@ -362,7 +361,7 @@ packet_routed: + } + + ip_select_ident_more(iph, &rt->u.dst, sk, +- (skb_shinfo(skb)->tso_segs ?: 1) - 1); ++ (skb_shinfo(skb)->gso_segs ?: 1) - 1); + + /* Add an IP checksum. 
*/ + ip_send_check(iph); +@@ -743,7 +742,8 @@ static inline int ip_ufo_append_data(str + (length - transhdrlen)); + if (!err) { + /* specify the length of each IP datagram fragment*/ +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen); ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen; ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; + __skb_queue_tail(&sk->sk_write_queue, skb); + + return 0; +@@ -839,7 +839,7 @@ int ip_append_data(struct sock *sk, + */ + if (transhdrlen && + length + fragheaderlen <= mtu && +- rt->u.dst.dev->features&(NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM) && ++ rt->u.dst.dev->features & NETIF_F_ALL_CSUM && + !exthdrlen) + csummode = CHECKSUM_HW; + +@@ -1086,14 +1086,16 @@ ssize_t ip_append_page(struct sock *sk, + + inet->cork.length += size; + if ((sk->sk_protocol == IPPROTO_UDP) && +- (rt->u.dst.dev->features & NETIF_F_UFO)) +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen); ++ (rt->u.dst.dev->features & NETIF_F_UFO)) { ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen; ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; ++ } + + + while (size > 0) { + int i; + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_size) + len = size; + else { + +diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c +index d64e2ec..7494823 100644 +--- a/net/ipv4/ipcomp.c ++++ b/net/ipv4/ipcomp.c +@@ -84,7 +84,7 @@ static int ipcomp_input(struct xfrm_stat + struct xfrm_decap_state *decap, struct sk_buff *skb) + { + u8 nexthdr; +- int err = 0; ++ int err = -ENOMEM; + struct iphdr *iph; + union { + struct iphdr iph; +@@ -92,11 +92,8 @@ static int ipcomp_input(struct xfrm_stat + } tmp_iph; + + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { +- err = -ENOMEM; ++ if (skb_linearize_cow(skb)) + goto out; +- } + + skb->ip_summed = CHECKSUM_NONE; + +@@ -171,10 +168,8 @@ static int ipcomp_output(struct xfrm_sta + goto out_ok; + } + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { ++ if (skb_linearize_cow(skb)) + goto out_ok; +- } + + err = ipcomp_compress(x, skb); + iph = skb->nh.iph; +diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c +index 00aa80e..84130c9 100644 +--- a/net/ipv4/tcp.c ++++ b/net/ipv4/tcp.c +@@ -257,6 +257,7 @@ #include <linux/smp_lock.h> + #include <linux/fs.h> + #include <linux/random.h> + #include <linux/bootmem.h> ++#include <linux/err.h> + + #include <net/icmp.h> + #include <net/tcp.h> +@@ -570,7 +571,7 @@ new_segment: + skb->ip_summed = CHECKSUM_HW; + tp->write_seq += copy; + TCP_SKB_CB(skb)->end_seq += copy; +- skb_shinfo(skb)->tso_segs = 0; ++ skb_shinfo(skb)->gso_segs = 0; + + if (!copied) + TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_PSH; +@@ -621,14 +622,10 @@ ssize_t tcp_sendpage(struct socket *sock + ssize_t res; + struct sock *sk = sock->sk; + +-#define TCP_ZC_CSUM_FLAGS (NETIF_F_IP_CSUM | NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) +- + if (!(sk->sk_route_caps & NETIF_F_SG) || +- !(sk->sk_route_caps & TCP_ZC_CSUM_FLAGS)) ++ !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) + return sock_no_sendpage(sock, page, offset, size, flags); + +-#undef TCP_ZC_CSUM_FLAGS +- + lock_sock(sk); + TCP_CHECK_TIMER(sk); + res = do_tcp_sendpages(sk, &page, offset, size, flags); +@@ -725,9 +722,7 @@ new_segment: + /* + * Check whether we can use HW checksum. 
+ */ +- if (sk->sk_route_caps & +- (NETIF_F_IP_CSUM | NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM)) ++ if (sk->sk_route_caps & NETIF_F_ALL_CSUM) + skb->ip_summed = CHECKSUM_HW; + + skb_entail(sk, tp, skb); +@@ -823,7 +818,7 @@ new_segment: + + tp->write_seq += copy; + TCP_SKB_CB(skb)->end_seq += copy; +- skb_shinfo(skb)->tso_segs = 0; ++ skb_shinfo(skb)->gso_segs = 0; + + from += copy; + copied += copy; +@@ -2026,6 +2021,71 @@ int tcp_getsockopt(struct sock *sk, int + } + + ++struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features) ++{ ++ struct sk_buff *segs = ERR_PTR(-EINVAL); ++ struct tcphdr *th; ++ unsigned thlen; ++ unsigned int seq; ++ unsigned int delta; ++ unsigned int oldlen; ++ unsigned int len; ++ ++ if (!pskb_may_pull(skb, sizeof(*th))) ++ goto out; ++ ++ th = skb->h.th; ++ thlen = th->doff * 4; ++ if (thlen < sizeof(*th)) ++ goto out; ++ ++ if (!pskb_may_pull(skb, thlen)) ++ goto out; ++ ++ segs = NULL; ++ if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) ++ goto out; ++ ++ oldlen = (u16)~skb->len; ++ __skb_pull(skb, thlen); ++ ++ segs = skb_segment(skb, features); ++ if (IS_ERR(segs)) ++ goto out; ++ ++ len = skb_shinfo(skb)->gso_size; ++ delta = htonl(oldlen + (thlen + len)); ++ ++ skb = segs; ++ th = skb->h.th; ++ seq = ntohl(th->seq); ++ ++ do { ++ th->fin = th->psh = 0; ++ ++ th->check = ~csum_fold(th->check + delta); ++ if (skb->ip_summed != CHECKSUM_HW) ++ th->check = csum_fold(csum_partial(skb->h.raw, thlen, ++ skb->csum)); ++ ++ seq += len; ++ skb = skb->next; ++ th = skb->h.th; ++ ++ th->seq = htonl(seq); ++ th->cwr = 0; ++ } while (skb->next); ++ ++ delta = htonl(oldlen + (skb->tail - skb->h.raw) + skb->data_len); ++ th->check = ~csum_fold(th->check + delta); ++ if (skb->ip_summed != CHECKSUM_HW) ++ th->check = csum_fold(csum_partial(skb->h.raw, thlen, ++ skb->csum)); ++ ++out: ++ return segs; ++} ++ + extern void __skb_cb_too_small_for_tcp(int, int); + extern struct tcp_congestion_ops tcp_reno; + +diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c +index e9a54ae..defe77a 100644 +--- a/net/ipv4/tcp_input.c ++++ b/net/ipv4/tcp_input.c +@@ -1072,7 +1072,7 @@ tcp_sacktag_write_queue(struct sock *sk, + else + pkt_len = (end_seq - + TCP_SKB_CB(skb)->seq); +- if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->tso_size)) ++ if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->gso_size)) + break; + pcount = tcp_skb_pcount(skb); + } +diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c +index 310f2e6..ee01f69 100644 +--- a/net/ipv4/tcp_output.c ++++ b/net/ipv4/tcp_output.c +@@ -497,15 +497,17 @@ static void tcp_set_skb_tso_segs(struct + /* Avoid the costly divide in the normal + * non-TSO case. 
+ */ +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + } else { + unsigned int factor; + + factor = skb->len + (mss_now - 1); + factor /= mss_now; +- skb_shinfo(skb)->tso_segs = factor; +- skb_shinfo(skb)->tso_size = mss_now; ++ skb_shinfo(skb)->gso_segs = factor; ++ skb_shinfo(skb)->gso_size = mss_now; ++ skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + } + } + +@@ -850,7 +852,7 @@ static int tcp_init_tso_segs(struct sock + + if (!tso_segs || + (tso_segs > 1 && +- skb_shinfo(skb)->tso_size != mss_now)) { ++ tcp_skb_mss(skb) != mss_now)) { + tcp_set_skb_tso_segs(sk, skb, mss_now); + tso_segs = tcp_skb_pcount(skb); + } +@@ -1510,8 +1512,9 @@ int tcp_retransmit_skb(struct sock *sk, + tp->snd_una == (TCP_SKB_CB(skb)->end_seq - 1)) { + if (!pskb_trim(skb, 0)) { + TCP_SKB_CB(skb)->seq = TCP_SKB_CB(skb)->end_seq - 1; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + skb->ip_summed = CHECKSUM_NONE; + skb->csum = 0; + } +@@ -1716,8 +1719,9 @@ void tcp_send_fin(struct sock *sk) + skb->csum = 0; + TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_FIN); + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* FIN eats a sequence byte, write_seq advanced by tcp_queue_skb(). */ + TCP_SKB_CB(skb)->seq = tp->write_seq; +@@ -1749,8 +1753,9 @@ void tcp_send_active_reset(struct sock * + skb->csum = 0; + TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_RST); + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* Send it off. */ + TCP_SKB_CB(skb)->seq = tcp_acceptable_seq(sk, tp); +@@ -1833,8 +1838,9 @@ struct sk_buff * tcp_make_synack(struct + TCP_SKB_CB(skb)->seq = tcp_rsk(req)->snt_isn; + TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + 1; + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + th->seq = htonl(TCP_SKB_CB(skb)->seq); + th->ack_seq = htonl(tcp_rsk(req)->rcv_isn + 1); + if (req->rcv_wnd == 0) { /* ignored for retransmitted syns */ +@@ -1937,8 +1943,9 @@ int tcp_connect(struct sock *sk) + TCP_SKB_CB(buff)->flags = TCPCB_FLAG_SYN; + TCP_ECN_send_syn(sk, tp, buff); + TCP_SKB_CB(buff)->sacked = 0; +- skb_shinfo(buff)->tso_segs = 1; +- skb_shinfo(buff)->tso_size = 0; ++ skb_shinfo(buff)->gso_segs = 1; ++ skb_shinfo(buff)->gso_size = 0; ++ skb_shinfo(buff)->gso_type = 0; + buff->csum = 0; + TCP_SKB_CB(buff)->seq = tp->write_seq++; + TCP_SKB_CB(buff)->end_seq = tp->write_seq; +@@ -2042,8 +2049,9 @@ void tcp_send_ack(struct sock *sk) + buff->csum = 0; + TCP_SKB_CB(buff)->flags = TCPCB_FLAG_ACK; + TCP_SKB_CB(buff)->sacked = 0; +- skb_shinfo(buff)->tso_segs = 1; +- skb_shinfo(buff)->tso_size = 0; ++ skb_shinfo(buff)->gso_segs = 1; ++ skb_shinfo(buff)->gso_size = 0; ++ skb_shinfo(buff)->gso_type = 0; + + /* Send it off, this clears delayed acks for us. 
*/ + TCP_SKB_CB(buff)->seq = TCP_SKB_CB(buff)->end_seq = tcp_acceptable_seq(sk, tp); +@@ -2078,8 +2086,9 @@ static int tcp_xmit_probe_skb(struct soc + skb->csum = 0; + TCP_SKB_CB(skb)->flags = TCPCB_FLAG_ACK; + TCP_SKB_CB(skb)->sacked = urgent; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* Use a previous sequence. This should cause the other + * end to send an ack. Don''t queue or clone SKB, just +diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c +index 32ad229..737c1db 100644 +--- a/net/ipv4/xfrm4_output.c ++++ b/net/ipv4/xfrm4_output.c +@@ -9,6 +9,8 @@ + */ + + #include <linux/compiler.h> ++#include <linux/if_ether.h> ++#include <linux/kernel.h> + #include <linux/skbuff.h> + #include <linux/spinlock.h> + #include <linux/netfilter_ipv4.h> +@@ -152,16 +154,10 @@ error_nolock: + goto out_exit; + } + +-static int xfrm4_output_finish(struct sk_buff *skb) ++static int xfrm4_output_finish2(struct sk_buff *skb) + { + int err; + +-#ifdef CONFIG_NETFILTER +- if (!skb->dst->xfrm) { +- IPCB(skb)->flags |= IPSKB_REROUTED; +- return dst_output(skb); +- } +-#endif + while (likely((err = xfrm4_output_one(skb)) == 0)) { + nf_reset(skb); + +@@ -174,7 +170,7 @@ #endif + return dst_output(skb); + + err = nf_hook(PF_INET, NF_IP_POST_ROUTING, &skb, NULL, +- skb->dst->dev, xfrm4_output_finish); ++ skb->dst->dev, xfrm4_output_finish2); + if (unlikely(err != 1)) + break; + } +@@ -182,6 +178,48 @@ #endif + return err; + } + ++static int xfrm4_output_finish(struct sk_buff *skb) ++{ ++ struct sk_buff *segs; ++ ++#ifdef CONFIG_NETFILTER ++ if (!skb->dst->xfrm) { ++ IPCB(skb)->flags |= IPSKB_REROUTED; ++ return dst_output(skb); ++ } ++#endif ++ ++ if (!skb_shinfo(skb)->gso_size) ++ return xfrm4_output_finish2(skb); ++ ++ skb->protocol = htons(ETH_P_IP); ++ segs = skb_gso_segment(skb, 0); ++ kfree_skb(skb); ++ if (unlikely(IS_ERR(segs))) ++ return PTR_ERR(segs); ++ ++ do { ++ struct sk_buff *nskb = segs->next; ++ int err; ++ ++ segs->next = NULL; ++ err = xfrm4_output_finish2(segs); ++ ++ if (unlikely(err)) { ++ while ((segs = nskb)) { ++ nskb = segs->next; ++ segs->next = NULL; ++ kfree_skb(segs); ++ } ++ return err; ++ } ++ ++ segs = nskb; ++ } while (segs); ++ ++ return 0; ++} ++ + int xfrm4_output(struct sk_buff *skb) + { + return NF_HOOK_COND(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev, +diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c +index 5bf70b1..cf5d17e 100644 +--- a/net/ipv6/ip6_output.c ++++ b/net/ipv6/ip6_output.c +@@ -147,7 +147,7 @@ static int ip6_output2(struct sk_buff *s + + int ip6_output(struct sk_buff *skb) + { +- if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->ufo_size) || ++ if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size) || + dst_allfrag(skb->dst)) + return ip6_fragment(skb, ip6_output2); + else +@@ -829,8 +829,9 @@ static inline int ip6_ufo_append_data(st + struct frag_hdr fhdr; + + /* specify the length of each IP datagram fragment*/ +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen) - +- sizeof(struct frag_hdr); ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen - ++ sizeof(struct frag_hdr); ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; + ipv6_select_ident(skb, &fhdr); + skb_shinfo(skb)->ip6_frag_id = fhdr.identification; + __skb_queue_tail(&sk->sk_write_queue, skb); +diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c +index d511a88..ef56d5d 100644 +--- a/net/ipv6/ipcomp6.c ++++ b/net/ipv6/ipcomp6.c +@@ 
-64,7 +64,7 @@ static LIST_HEAD(ipcomp6_tfms_list); + + static int ipcomp6_input(struct xfrm_state *x, struct xfrm_decap_state *decap, struct sk_buff *skb) + { +- int err = 0; ++ int err = -ENOMEM; + u8 nexthdr = 0; + int hdr_len = skb->h.raw - skb->nh.raw; + unsigned char *tmp_hdr = NULL; +@@ -75,11 +75,8 @@ static int ipcomp6_input(struct xfrm_sta + struct crypto_tfm *tfm; + int cpu; + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { +- err = -ENOMEM; ++ if (skb_linearize_cow(skb)) + goto out; +- } + + skb->ip_summed = CHECKSUM_NONE; + +@@ -158,10 +155,8 @@ static int ipcomp6_output(struct xfrm_st + goto out_ok; + } + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { ++ if (skb_linearize_cow(skb)) + goto out_ok; +- } + + /* compression */ + plen = skb->len - hdr_len; +diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c +index 8024217..39bdeec 100644 +--- a/net/ipv6/xfrm6_output.c ++++ b/net/ipv6/xfrm6_output.c +@@ -151,7 +151,7 @@ error_nolock: + goto out_exit; + } + +-static int xfrm6_output_finish(struct sk_buff *skb) ++static int xfrm6_output_finish2(struct sk_buff *skb) + { + int err; + +@@ -167,7 +167,7 @@ static int xfrm6_output_finish(struct sk + return dst_output(skb); + + err = nf_hook(PF_INET6, NF_IP6_POST_ROUTING, &skb, NULL, +- skb->dst->dev, xfrm6_output_finish); ++ skb->dst->dev, xfrm6_output_finish2); + if (unlikely(err != 1)) + break; + } +@@ -175,6 +175,41 @@ static int xfrm6_output_finish(struct sk + return err; + } + ++static int xfrm6_output_finish(struct sk_buff *skb) ++{ ++ struct sk_buff *segs; ++ ++ if (!skb_shinfo(skb)->gso_size) ++ return xfrm6_output_finish2(skb); ++ ++ skb->protocol = htons(ETH_P_IP); ++ segs = skb_gso_segment(skb, 0); ++ kfree_skb(skb); ++ if (unlikely(IS_ERR(segs))) ++ return PTR_ERR(segs); ++ ++ do { ++ struct sk_buff *nskb = segs->next; ++ int err; ++ ++ segs->next = NULL; ++ err = xfrm6_output_finish2(segs); ++ ++ if (unlikely(err)) { ++ while ((segs = nskb)) { ++ nskb = segs->next; ++ segs->next = NULL; ++ kfree_skb(segs); ++ } ++ return err; ++ } ++ ++ segs = nskb; ++ } while (segs); ++ ++ return 0; ++} ++ + int xfrm6_output(struct sk_buff *skb) + { + return NF_HOOK(PF_INET6, NF_IP6_POST_ROUTING, skb, NULL, skb->dst->dev, +diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c +index 99ceb91..28c9efd 100644 +--- a/net/sched/sch_generic.c ++++ b/net/sched/sch_generic.c +@@ -72,9 +72,9 @@ void qdisc_unlock_tree(struct net_device + dev->queue_lock serializes queue accesses for this device + AND dev->qdisc pointer itself. + +- dev->xmit_lock serializes accesses to device driver. ++ netif_tx_lock serializes accesses to device driver. + +- dev->queue_lock and dev->xmit_lock are mutually exclusive, ++ dev->queue_lock and netif_tx_lock are mutually exclusive, + if one is grabbed, another must be free. + */ + +@@ -90,14 +90,17 @@ void qdisc_unlock_tree(struct net_device + NOTE: Called under dev->queue_lock with locally disabled BH. + */ + +-int qdisc_restart(struct net_device *dev) ++static inline int qdisc_restart(struct net_device *dev) + { + struct Qdisc *q = dev->qdisc; + struct sk_buff *skb; + + /* Dequeue packet */ +- if ((skb = q->dequeue(q)) != NULL) { ++ if (((skb = dev->gso_skb)) || ((skb = q->dequeue(q)))) { + unsigned nolock = (dev->features & NETIF_F_LLTX); ++ ++ dev->gso_skb = NULL; ++ + /* + * When the driver has LLTX set it does its own locking + * in start_xmit. 
No need to add additional overhead by +@@ -108,7 +111,7 @@ int qdisc_restart(struct net_device *dev + * will be requeued. + */ + if (!nolock) { +- if (!spin_trylock(&dev->xmit_lock)) { ++ if (!netif_tx_trylock(dev)) { + collision: + /* So, someone grabbed the driver. */ + +@@ -126,8 +129,6 @@ int qdisc_restart(struct net_device *dev + __get_cpu_var(netdev_rx_stat).cpu_collision++; + goto requeue; + } +- /* Remember that the driver is grabbed by us. */ +- dev->xmit_lock_owner = smp_processor_id(); + } + + { +@@ -136,14 +137,11 @@ int qdisc_restart(struct net_device *dev + + if (!netif_queue_stopped(dev)) { + int ret; +- if (netdev_nit) +- dev_queue_xmit_nit(skb, dev); + +- ret = dev->hard_start_xmit(skb, dev); ++ ret = dev_hard_start_xmit(skb, dev); + if (ret == NETDEV_TX_OK) { + if (!nolock) { +- dev->xmit_lock_owner = -1; +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + } + spin_lock(&dev->queue_lock); + return -1; +@@ -157,8 +155,7 @@ int qdisc_restart(struct net_device *dev + /* NETDEV_TX_BUSY - we need to requeue */ + /* Release the driver */ + if (!nolock) { +- dev->xmit_lock_owner = -1; +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + } + spin_lock(&dev->queue_lock); + q = dev->qdisc; +@@ -175,7 +172,10 @@ int qdisc_restart(struct net_device *dev + */ + + requeue: +- q->ops->requeue(skb, q); ++ if (skb->next) ++ dev->gso_skb = skb; ++ else ++ q->ops->requeue(skb, q); + netif_schedule(dev); + return 1; + } +@@ -183,11 +183,23 @@ requeue: + return q->q.qlen; + } + ++void __qdisc_run(struct net_device *dev) ++{ ++ if (unlikely(dev->qdisc == &noop_qdisc)) ++ goto out; ++ ++ while (qdisc_restart(dev) < 0 && !netif_queue_stopped(dev)) ++ /* NOTHING */; ++ ++out: ++ clear_bit(__LINK_STATE_QDISC_RUNNING, &dev->state); ++} ++ + static void dev_watchdog(unsigned long arg) + { + struct net_device *dev = (struct net_device *)arg; + +- spin_lock(&dev->xmit_lock); ++ netif_tx_lock(dev); + if (dev->qdisc != &noop_qdisc) { + if (netif_device_present(dev) && + netif_running(dev) && +@@ -201,7 +213,7 @@ static void dev_watchdog(unsigned long a + dev_hold(dev); + } + } +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + + dev_put(dev); + } +@@ -225,17 +237,17 @@ void __netdev_watchdog_up(struct net_dev + + static void dev_watchdog_up(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + __netdev_watchdog_up(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + static void dev_watchdog_down(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + if (del_timer(&dev->watchdog_timer)) + __dev_put(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + void netif_carrier_on(struct net_device *dev) +@@ -577,10 +589,17 @@ void dev_deactivate(struct net_device *d + + dev_watchdog_down(dev); + +- while (test_bit(__LINK_STATE_SCHED, &dev->state)) ++ /* Wait for outstanding dev_queue_xmit calls. */ ++ synchronize_rcu(); ++ ++ /* Wait for outstanding qdisc_run calls. 
*/ ++ while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state)) + yield(); + +- spin_unlock_wait(&dev->xmit_lock); ++ if (dev->gso_skb) { ++ kfree_skb(dev->gso_skb); ++ dev->gso_skb = NULL; ++ } + } + + void dev_init_scheduler(struct net_device *dev) +@@ -622,6 +641,5 @@ EXPORT_SYMBOL(qdisc_create_dflt); + EXPORT_SYMBOL(qdisc_alloc); + EXPORT_SYMBOL(qdisc_destroy); + EXPORT_SYMBOL(qdisc_reset); +-EXPORT_SYMBOL(qdisc_restart); + EXPORT_SYMBOL(qdisc_lock_tree); + EXPORT_SYMBOL(qdisc_unlock_tree); +diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c +index 79b8ef3..4c16ad5 100644 +--- a/net/sched/sch_teql.c ++++ b/net/sched/sch_teql.c +@@ -302,20 +302,17 @@ restart: + + switch (teql_resolve(skb, skb_res, slave)) { + case 0: +- if (spin_trylock(&slave->xmit_lock)) { +- slave->xmit_lock_owner = smp_processor_id(); ++ if (netif_tx_trylock(slave)) { + if (!netif_queue_stopped(slave) && + slave->hard_start_xmit(skb, slave) == 0) { +- slave->xmit_lock_owner = -1; +- spin_unlock(&slave->xmit_lock); ++ netif_tx_unlock(slave); + master->slaves = NEXT_SLAVE(q); + netif_wake_queue(dev); + master->stats.tx_packets++; + master->stats.tx_bytes += len; + return 0; + } +- slave->xmit_lock_owner = -1; +- spin_unlock(&slave->xmit_lock); ++ netif_tx_unlock(slave); + } + if (netif_queue_stopped(dev)) + busy = 1; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
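
To make the transmit-path changes above easier to follow: when a GSO packet reaches a device that lacks the matching hardware offload, skb_gso_segment() splits it and the resulting segments are transmitted one by one. The fragment below is only an illustrative sketch of that pattern, modelled on the dev_gso_segment()/xfrm4_output_finish() code in the patch above; the helper name and the drop-on-failure error handling are invented for the example and are not part of the patch.

    #include <linux/err.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    /* Illustrative only: segment one GSO skb and push each segment to the
     * device, mirroring the list walk used by dev_gso_segment() and
     * xfrm4_output_finish() in the patch above. */
    static int gso_xmit_sketch(struct sk_buff *skb, struct net_device *dev)
    {
        struct sk_buff *segs, *nskb;
        int err;

        segs = skb_gso_segment(skb, dev->features);
        if (IS_ERR(segs))
            return PTR_ERR(segs);
        if (!segs)                      /* GSO only verified the headers */
            return dev->hard_start_xmit(skb, dev);

        kfree_skb(skb);                 /* the segments now carry the data */

        do {
            nskb = segs->next;
            segs->next = NULL;

            err = dev->hard_start_xmit(segs, dev);
            if (err) {
                /* Simplistic: drop the failed segment and the rest;
                 * the real dev_hard_start_xmit() requeues instead. */
                kfree_skb(segs);
                while ((segs = nskb) != NULL) {
                    nskb = segs->next;
                    kfree_skb(segs);
                }
                return err;
            }

            segs = nskb;
        } while (segs);

        return 0;
    }
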
Herbert Xu
2006-Jun-28 03:59 UTC
[Xen-devel] [2/5] [NET]: Give make_tx_response the req pointer instead of id
Hi: [NET]: Give make_tx_response the req pointer instead of id This patch changes the make_tx_response id argument to a request pointer instead. This allows us to test the request flag in future for TSO. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 6913e0756b81 -r 504315a3ec5e linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:44:18 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:51:08 2006 +1000 @@ -43,7 +43,7 @@ static void netif_idx_release(u16 pendin static void netif_idx_release(u16 pending_idx); static void netif_page_release(struct page *page); static void make_tx_response(netif_t *netif, - u16 id, + netif_tx_request_t *txp, s8 st); static int make_rx_response(netif_t *netif, u16 id, @@ -481,7 +481,7 @@ inline static void net_tx_action_dealloc netif = pending_tx_info[pending_idx].netif; - make_tx_response(netif, pending_tx_info[pending_idx].req.id, + make_tx_response(netif, &pending_tx_info[pending_idx].req, NETIF_RSP_OKAY); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; @@ -496,7 +496,7 @@ static void netbk_tx_err(netif_t *netif, do { netif_tx_request_t *txp = RING_GET_REQUEST(&netif->tx, cons); - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); + make_tx_response(netif, txp, NETIF_RSP_ERROR); } while (++cons < end); netif->tx.req_cons = cons; netif_schedule_work(netif); @@ -581,7 +581,7 @@ static int netbk_tx_check_mop(struct sk_ err = mop->status; if (unlikely(err)) { txp = &pending_tx_info[pending_idx].req; - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); + make_tx_response(netif, txp, NETIF_RSP_ERROR); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; netif_put(netif); } else { @@ -614,7 +614,7 @@ static int netbk_tx_check_mop(struct sk_ /* Error on this fragment: respond to client with an error. */ txp = &pending_tx_info[pending_idx].req; - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); + make_tx_response(netif, txp, NETIF_RSP_ERROR); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; netif_put(netif); @@ -898,7 +898,7 @@ irqreturn_t netif_be_int(int irq, void * } static void make_tx_response(netif_t *netif, - u16 id, + netif_tx_request_t *txp, s8 st) { RING_IDX i = netif->tx.rsp_prod_pvt; @@ -906,7 +906,7 @@ static void make_tx_response(netif_t *ne int notify; resp = RING_GET_RESPONSE(&netif->tx, i); - resp->id = id; + resp->id = txp->id; resp->status = st; netif->tx.rsp_prod_pvt = ++i; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: [NET] loopback: Added support for TSO Just like SG, TSO support here is innate. So all we need to do is mark it as such. This patch also adds the ethtool control functions for SG. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 504315a3ec5e -r 7ec216a8bc14 linux-2.6-xen-sparse/drivers/xen/netback/loopback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/loopback.c Wed Jun 28 13:51:08 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/loopback.c Wed Jun 28 13:51:21 2006 +1000 @@ -125,6 +125,10 @@ static struct ethtool_ops network_ethtoo { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, + .get_sg = ethtool_op_get_sg, + .set_sg = ethtool_op_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = ethtool_op_set_tso, }; /* @@ -152,6 +156,7 @@ static void loopback_construct(struct ne dev->features = (NETIF_F_HIGHDMA | NETIF_F_LLTX | + NETIF_F_TSO | NETIF_F_SG | NETIF_F_IP_CSUM); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: [NET] back: Add TSO support This patch adds TCP Segmentation Offload (TSO) support to the backend. It also advertises this fact through xenbus so that the frontend can detect this and send through TSO requests only if it is supported. This is done using an extra request slot which is indicated by a flag in the first slot. In future checksum offload can be done in the same way. The extra request slot must not be generated if the backend does not support the appropriate feature bits. For now this is simply feature-tso. If the frontend detects the presence of the appropriate feature bits, it may generate TX requests which have the appropriate request flags set that indicates the presence of an extra request slot with the extra information. On the backend the extra request slot is read if and only if the request flags are set in the TX request. This protocol allows more feature bits to be added in future without breaking compatibility. At least the hardware checksum bit is planned. Even though only TSO is supported for now the code actually supports GSO so it can be applied to any other protocol. The only missing bit is the detection of host support for a specific GSO protocol. Once that is added we can advertise all supported protocols to the guest. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 7ec216a8bc14 -r 9853b45712e8 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:51:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:55:13 2006 +1000 @@ -490,14 +490,16 @@ inline static void net_tx_action_dealloc } } -static void netbk_tx_err(netif_t *netif, RING_IDX end) +static void netbk_tx_err(netif_t *netif, netif_tx_request_t *txp, RING_IDX end) { RING_IDX cons = netif->tx.req_cons; do { - netif_tx_request_t *txp = RING_GET_REQUEST(&netif->tx, cons); make_tx_response(netif, txp, NETIF_RSP_ERROR); - } while (++cons < end); + if (++cons >= end) + break; + txp = RING_GET_REQUEST(&netif->tx, cons); + } while (1); netif->tx.req_cons = cons; netif_schedule_work(netif); netif_put(netif); @@ -508,7 +510,7 @@ static int netbk_count_requests(netif_t { netif_tx_request_t *first = txp; RING_IDX cons = netif->tx.req_cons; - int frags = 1; + int frags = 0; while (txp->flags & NETTXF_more_data) { if (frags >= work_to_do) { @@ -543,7 +545,7 @@ static gnttab_map_grant_ref_t *netbk_get skb_frag_t *frags = shinfo->frags; netif_tx_request_t *txp; unsigned long pending_idx = *((u16 *)skb->data); - RING_IDX cons = netif->tx.req_cons + 1; + RING_IDX cons = netif->tx.req_cons; int i, start; /* Skip first skb fragment if it is on same page as header fragment. 
*/ @@ -668,6 +670,7 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; + struct netif_tx_extra txtra; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -726,22 +729,37 @@ static void net_tx_action(unsigned long } netif->remaining_credit -= txreq.size; + work_to_do--; + netif->tx.req_cons = ++i; + + if (txreq.flags & NETTXF_extra_info) { + if (work_to_do-- <= 0) { + DPRINTK("Missing extra info\n"); + netbk_tx_err(netif, &txreq, i); + continue; + } + + memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), + sizeof(txtra)); + netif->tx.req_cons = ++i; + } + ret = netbk_count_requests(netif, &txreq, work_to_do); if (unlikely(ret < 0)) { - netbk_tx_err(netif, i - ret); + netbk_tx_err(netif, &txreq, i - ret); continue; } i += ret; if (unlikely(ret > MAX_SKB_FRAGS + 1)) { DPRINTK("Too many frags\n"); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } if (unlikely(txreq.size < ETH_HLEN)) { DPRINTK("Bad packet size: %d\n", txreq.size); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } @@ -750,25 +768,31 @@ static void net_tx_action(unsigned long DPRINTK("txreq.offset: %x, size: %u, end: %lu\n", txreq.offset, txreq.size, (txreq.offset &~PAGE_MASK) + txreq.size); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } pending_idx = pending_ring[MASK_PEND_IDX(pending_cons)]; data_len = (txreq.size > PKT_PROT_LEN && - ret < MAX_SKB_FRAGS + 1) ? + ret < MAX_SKB_FRAGS) ? PKT_PROT_LEN : txreq.size; skb = alloc_skb(data_len+16, GFP_ATOMIC); if (unlikely(skb == NULL)) { DPRINTK("Can''t allocate a skb in start_xmit.\n"); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); break; } /* Packets passed to netif_rx() must have some headroom. */ skb_reserve(skb, 16); + + if (txreq.flags & NETTXF_gso) { + skb_shinfo(skb)->gso_size = txtra.gso_size; + skb_shinfo(skb)->gso_segs = txtra.gso_segs; + skb_shinfo(skb)->gso_type = txtra.gso_type; + } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), GNTMAP_host_map | GNTMAP_readonly, @@ -782,7 +806,7 @@ static void net_tx_action(unsigned long __skb_put(skb, data_len); - skb_shinfo(skb)->nr_frags = ret - 1; + skb_shinfo(skb)->nr_frags = ret; if (data_len < txreq.size) { skb_shinfo(skb)->nr_frags++; skb_shinfo(skb)->frags[0].page @@ -908,6 +932,9 @@ static void make_tx_response(netif_t *ne resp = RING_GET_RESPONSE(&netif->tx, i); resp->id = txp->id; resp->status = st; + + if (txp->flags & NETTXF_extra_info) + RING_GET_RESPONSE(&netif->tx, ++i)->status = NETIF_RSP_NULL; netif->tx.rsp_prod_pvt = ++i; RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->tx, notify); diff -r 7ec216a8bc14 -r 9853b45712e8 linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Wed Jun 28 13:51:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Wed Jun 28 13:55:13 2006 +1000 @@ -101,6 +101,12 @@ static int netback_probe(struct xenbus_d goto abort_transaction; } + err = xenbus_printf(xbt, dev->nodename, "feature-tso", "%d", 1); + if (err) { + message = "writing feature-tso"; + goto abort_transaction; + } + err = xenbus_transaction_end(xbt, 0); } while (err == -EAGAIN); diff -r 7ec216a8bc14 -r 9853b45712e8 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Wed Jun 28 13:51:21 2006 +1000 +++ b/xen/include/public/io/netif.h Wed Jun 28 13:55:13 2006 +1000 @@ -31,6 +31,13 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) +/* Packet has GSO fields. 
*/ +#define _NETTXF_gso (3) +#define NETTXF_gso (1U<<_NETTXF_gso) + +/* Packet to be folloed by extra descritptor. */ +#define NETTXF_extra_info (NETTXF_gso) + struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ uint16_t offset; /* Offset within buffer page */ @@ -39,6 +46,13 @@ struct netif_tx_request { uint16_t size; /* Packet size in bytes. */ }; typedef struct netif_tx_request netif_tx_request_t; + +/* This structure needs to fit within netif_tx_request for compatibility. */ +struct netif_tx_extra { + uint16_t gso_size; /* GSO MSS. */ + uint16_t gso_segs; /* GSO segment count. */ + uint16_t gso_type; /* GSO type. */ +}; struct netif_tx_response { uint16_t id; @@ -78,6 +92,7 @@ DEFINE_RING_TYPES(netif_rx, struct netif #define NETIF_RSP_DROPPED -2 #define NETIF_RSP_ERROR -1 #define NETIF_RSP_OKAY 0 +#define NETIF_RSP_NULL 1 #endif _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
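
To restate the wire protocol added by the patch above: a TX request whose flags include NETTXF_extra_info is followed by one additional ring slot that holds a struct netif_tx_extra rather than a normal request, and the backend must consume that slot before the fragment requests. The fragment below is only a sketch of the consumer side under the same headers as netback.c; handle_gso() is an invented helper, and the real logic lives in net_tx_action() above.

    /* Context: drivers/xen/netback, same includes as netback.c. */

    /* Illustrative only: consume one TX request that may be followed by
     * a netif_tx_extra slot, per the protocol described above. */
    static void consume_tx_request_sketch(netif_t *netif, RING_IDX *idx)
    {
        netif_tx_request_t txreq;
        struct netif_tx_extra txtra;

        memcpy(&txreq, RING_GET_REQUEST(&netif->tx, *idx), sizeof(txreq));
        (*idx)++;

        if (txreq.flags & NETTXF_extra_info) {
            /* The next slot is not a normal request: it carries the
             * extra metadata and must be read before the fragments. */
            memcpy(&txtra, RING_GET_REQUEST(&netif->tx, *idx),
                   sizeof(txtra));
            (*idx)++;

            if (txreq.flags & NETTXF_gso)
                handle_gso(&txtra);     /* invented helper */
        }

        /* ... map txreq.gref, build the skb, etc., as in netback.c ... */
    }
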
Herbert Xu
2006-Jun-28 04:00 UTC
[Xen-devel] [5/5] [NET] front: Transmit TSO packets if supported
Hi: [NET] front: Transmit TSO packets if supported This patch adds TSO transmission support to the frontend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 9853b45712e8 -r 9f4e79081e4a linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Wed Jun 28 13:55:13 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Wed Jun 28 13:55:32 2006 +1000 @@ -463,7 +463,7 @@ static int network_open(struct net_devic static inline int netfront_tx_slot_available(struct netfront_info *np) { - return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 1; + return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 2; } static inline void network_maybe_wake_tx(struct net_device *dev) @@ -491,7 +491,13 @@ static void network_tx_buf_gc(struct net rmb(); /* Ensure we see responses up to ''rp''. */ for (cons = np->tx.rsp_cons; cons != prod; cons++) { - id = RING_GET_RESPONSE(&np->tx, cons)->id; + struct netif_tx_response *txrsp; + + txrsp = RING_GET_RESPONSE(&np->tx, cons); + if (txrsp->status == NETIF_RSP_NULL) + continue; + + id = txrsp->id; skb = np->tx_skbs[id]; if (unlikely(gnttab_query_foreign_access( np->grant_tx_ref[id]) != 0)) { @@ -739,7 +745,8 @@ static int network_start_xmit(struct sk_ spin_lock_irq(&np->tx_lock); if (unlikely(!netif_carrier_ok(dev) || - (frags > 1 && !xennet_can_sg(dev)))) { + (frags > 1 && !xennet_can_sg(dev)) || + netif_needs_gso(dev, skb))) { spin_unlock_irq(&np->tx_lock); goto drop; } @@ -762,9 +769,21 @@ static int network_start_xmit(struct sk_ tx->size = len; tx->flags = 0; - if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ + if (skb->ip_summed == CHECKSUM_HW) { + /* local packet? */ tx->flags |= NETTXF_csum_blank | NETTXF_data_validated; - if (skb->proto_data_valid) /* remote but checksummed? */ + + if (skb_shinfo(skb)->gso_size) { + struct netif_tx_extra *txtra + (struct netif_tx_extra *) + RING_GET_REQUEST(&np->tx, ++i); + + tx->flags |= NETTXF_gso; + txtra->gso_size = skb_shinfo(skb)->gso_size; + txtra->gso_segs = skb_shinfo(skb)->gso_segs; + txtra->gso_type = skb_shinfo(skb)->gso_type; + } + } else if (skb->proto_data_valid) /* remote but checksummed? */ tx->flags |= NETTXF_data_validated; np->tx.req_prod_pvt = i + 1; @@ -1065,9 +1084,26 @@ static int xennet_set_sg(struct net_devi return ethtool_op_set_sg(dev, data); } +static int xennet_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + struct netfront_info *np = netdev_priv(dev); + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-tso", + "%d", &val) < 0) + val = 0; + if (!val) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static void xennet_set_features(struct net_device *dev) { - xennet_set_sg(dev, 1); + if (!xennet_set_sg(dev, 1)) + xennet_set_tso(dev, 1); } static void network_connect(struct net_device *dev) @@ -1148,6 +1184,8 @@ static struct ethtool_ops network_ethtoo .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = xennet_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = xennet_set_tso, }; #ifdef CONFIG_SYSFS _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 27 Jun 2006, at 13:02, Herbert Xu wrote:

> The following patches add TCP Segmentation Offload (TSO) support in the
> domU => dom0 direction. If everyone's happy with this approach then it's
> trivial to do the same thing for the opposite direction.

I've checked in all but the netfront patch. The glitches so far are:

1. The GSO patch broke netback, because netback/interface.c accesses dev->xmit_lock. None of your patches fixed this, and you can't build the driver unless it's fixed -- did you somehow miss that file from one of the patches you sent?

2. The new 'wire format' with netif_tx_extra: I placed the GSO fields inside a struct inside a union, so we can extend the union with other extra-info types in future. I hope that's okay and in line with what you intended.

3. Wire format again: we need some extra documentation and info in netif.h for the new GSO fields. Currently they conveniently correspond directly to fields in a Linux skbuff: you read them out in netfront and write them straight back in netback. That's fine for Linux for now, but not so good for other OSes, nor potentially if the Linux GSO internals change later. In particular the gso_type field is concerning. We should provide defines for the legitimate values of that field in netif.h, with a comment explaining what each one means. Extra comments are also required for the other two fields, to convince us that they aren't Linux-specific in some way. Some brief info about how GSO works in general, including usage of those fields, would help.

This is why I haven't added the netfront patch yet -- I don't want domUs using the new interface until we're satisfied we're not going to have to change the interface and break compatibility.

Thanks,
Keir
On Wed, Jun 28, 2006 at 01:30:10PM +0100, Keir Fraser wrote:
>
> 1. The GSO patch broke netback, because netback/interface.c accesses
> dev->xmit_lock. None of your patches fixed this, and you can't build the
> driver unless it's fixed -- did you somehow miss that file from one of
> the patches you sent?

Sorry, I missed that hunk while generating the patches.

> 2. The new 'wire format' with netif_tx_extra: I placed the GSO fields
> inside a struct inside a union, so we can extend the union with other
> extra-info types in future. I hope that's okay and in line with what
> you intended.

Actually, the idea is to add new fields after the existing fields, so I don't think a union is a good fit here. The reason is that the new fields will likely be used in conjunction with GSO. In particular, I'm thinking of the checksum offset/header offset.

> 3. Wire format again: we need some extra documentation and info in
> netif.h for the new GSO fields. Currently they conveniently correspond
> directly to fields in a Linux skbuff: you read them out in netfront
> and write them straight back in netback. That's fine for Linux for now,
> but not so good for other OSes, nor potentially if the Linux GSO
> internals change later.

Well, I think things like TSO (and even more so GSO) are highly OS-specific, so porting them to other paravirt OSes is always going to be hard.

The way I see it, these are simply add-on features that you can enable to get that extra bit out of your Xen performance. So they are not required for your system to function, and other OSes that do not have TSO/GSO can simply elect not to use them. (I'm curious: which other paravirt OSes do you have in mind that would use something like TSO/GSO? Do they currently support TSO or something similar?)

That's how it was designed in general: if the frontend doesn't know about GSO, it simply never sends any GSO packets. If the backend doesn't know about GSO, it never advertises it to the frontend, so again no GSO packets are sent.

As to how likely the implementation details are to change in Linux, I'd say it probably won't happen anytime soon, but I can't offer any guarantees because it is an internal interface. However, the Xen wire format is designed so that it should be easy to adapt to any incompatible change by either:

* providing translation layers in netfront/netback if the interface change does not require new information to be exchanged; or

* adding new feature bits/request flags to indicate that new information is required. Older guests/hosts would simply not do TSO with incompatible new hosts/guests.

> In particular the gso_type field is concerning. We should provide
> defines for the legitimate values of that field in netif.h, with a
> comment explaining what each one means. Extra comments are also
> required for the other two fields, to convince us that they aren't
> Linux-specific in some way. Some brief info about how GSO works in
> general, including usage of those fields, would help.

No problem. I've attached a patch that adds more comments for these fields. However, I'm really hesitant to add the actual gso_type values here, because I don't think it helps compatibility in any way. If Linux ever does make an incompatible change (which, believe me, I will do my best to prevent), having those values defined here is not really going to help people notice the change or provide compatibility.

But if you really want these values, I can add them.
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 9f4e79081e4a xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Wed Jun 28 13:55:32 2006 +1000 +++ b/xen/include/public/io/netif.h Wed Jun 28 23:54:47 2006 +1000 @@ -49,9 +49,23 @@ typedef struct netif_tx_request netif_tx /* This structure needs to fit within netif_tx_request for compatibility. */ struct netif_tx_extra { - uint16_t gso_size; /* GSO MSS. */ - uint16_t gso_segs; /* GSO segment count. */ - uint16_t gso_type; /* GSO type. */ + /* + * Maximum payload size of each segment. For example, for TCP this is + * just the path MSS. + */ + uint16_t gso_size; + + /* + * Number of GSO segments. This is the number of segments that have to + * be generated for this packet given the MSS. + */ + uint16_t gso_segs; + + /* + * GSO type. This determines the protocol of the packet and any extra + * features required to segment the packet properly. + */ + uint16_t gso_type; }; struct netif_tx_response { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 28 Jun 2006, at 15:02, Herbert Xu wrote:

>> 2. The new 'wire format' with netif_tx_extra: I placed the GSO fields
>> inside a struct inside a union, so we can extend the union with other
>> extra-info types in future. I hope that's okay and in line with what
>> you intended.
>
> Actually, the idea is to add new fields after the existing fields, so I
> don't think a union is a good fit here. The reason is that the new
> fields will likely be used in conjunction with GSO. In particular, I'm
> thinking of the checksum offset/header offset.

Adding new fields on the end of the existing structure is limiting. Maybe we could chain extra info structures by declaring NETTXF_extra_info in the leading request, then have a sequence of netif_tx_extras, each of which is a discriminated union (so a NETEXTRA_* type field, a flag indicating if this is the last extra-info for this packet, plus a union)?

>> 3. Wire format again: we need some extra documentation and info in
>> netif.h for the new GSO fields. Currently they conveniently correspond
>> directly to fields in a Linux skbuff: you read them out in netfront
>> and write them straight back in netback. That's fine for Linux for now,
>> but not so good for other OSes, nor potentially if the Linux GSO
>> internals change later.
>
> Well, I think things like TSO (and even more so GSO) are highly
> OS-specific, so porting them to other paravirt OSes is always going to
> be hard.

Many NICs support TSO so there should be support in network stacks other than Linux. What about *BSD, Solaris, and Windows?

> The way I see it, these are simply add-on features that you can enable
> to get that extra bit out of your Xen performance. So they are not
> required for your system to function, and other OSes that do not have
> TSO/GSO can simply elect not to use them.

They may not have GSO but they might well have TSO and we should make it possible to interface to netback using it.

> * providing translation layers in netfront/netback if the interface
>   change does not require new information to be exchanged; or
>
> * adding new feature bits/request flags to indicate that new information
>   is required. Older guests/hosts would simply not do TSO with
>   incompatible new hosts/guests.

TSO isn't that complicated. I think we should be able to come up with an inter-domain format that we don't have to break for older guests down the line.

> If Linux ever does make an incompatible change (which, believe me, I
> will do my best to prevent), having those values defined here is not
> really going to help people notice the change or provide compatibility.
>
> But if you really want these values, I can add them.

Yes, we want them, and explicit code in netfront/netback to convert between Linux gso types and 'wire' gso types. Even if they are initially the same!

Another question: why are gso_size and gso_segs both required? Surely those, plus the overall request size, are redundant. e.g., shouldn't gso_segs = tot_size / gso_size (rounded up)?

-- Keir
On Wed, Jun 28, 2006 at 03:24:37PM +0100, Keir Fraser wrote:
>
> Adding new fields on the end of the existing structure is limiting.
> Maybe we could chain extra info structures by declaring
> NETTXF_extra_info in the leading request, then have a sequence of
> netif_tx_extras, each of which is a discriminated union (so a
> NETEXTRA_* type field, a flag indicating if this is the last extra-info
> for this packet, plus a union)?

Good idea. I've changed the interface to do exactly that.

> Many NICs support TSO so there should be support in network stacks
> other than Linux. What about *BSD, Solaris, and Windows?

They should be able to use the GSO interface and simply always set gso_type to XEN_GSO_TCPV4.

> Yes, we want them, and explicit code in netfront/netback to convert
> between Linux gso types and 'wire' gso types. Even if they are
> initially the same!

Done.

> Another question: why are gso_size and gso_segs both required? Surely
> those, plus the overall request size, are redundant. e.g., shouldn't
> gso_segs = tot_size / gso_size (rounded up)?

For TSO, gso_segs can be easily determined from the packet and gso_size. However, for GSO, we don't know the packet header length so the same is not true.

Cheers,
Herbert
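To make the redundancy question concrete, here is a small worked example (the numbers are purely illustrative and not taken from the patches):

	/*
	 * A TSO packet with 40 bytes of IP+TCP headers, 4344 bytes of TCP
	 * payload and gso_size (MSS) = 1448.  Netback only sees the total
	 * request size of 40 + 4344 = 4384 bytes.
	 */
	unsigned int hdr_len  = 40;
	unsigned int payload  = 4344;
	unsigned int mss      = 1448;                     /* gso_size          */
	unsigned int tot_size = hdr_len + payload;        /* 4384              */

	unsigned int segs  = (payload  + mss - 1) / mss;  /* 3: correct count  */
	unsigned int guess = (tot_size + mss - 1) / mss;  /* 4: off by one     */

Rounding up the raw request size over-counts by a segment, which is exactly why gso_segs cannot be recomputed in netback without also knowing the header length.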
Herbert Xu
2006-Jun-29 00:21 UTC
[Xen-devel] [1/2] [NET] back: Make extra info slot more generic
Hi: [NET] back: Make extra info slot more generic Based on suggestions by Keir Fraser, this patch makes the extra slot more generic by allowing it to be chained and giving each slot a type field. This makes it easier to add new extra slots such as checksum offset. I''ve also added GSO type constants specific for Xen. For now the conversion function between them and Linux is a noop. When and if they do diverge we can modify them accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 9853b45712e8 -r b5ca6be8ad55 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:55:13 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Thu Jun 29 10:16:22 2006 +1000 @@ -663,6 +663,35 @@ static void netbk_fill_frags(struct sk_b } } +int netbk_get_txtras(netif_t *netif, struct netif_tx_extra *txtras, + int work_to_do) +{ + struct netif_tx_extra *txtra; + int i = netif->tx.req_cons; + + do { + + if (unlikely(work_to_do-- <= 0)) { + DPRINTK("Missing extra info\n"); + return -EBADR; + } + + txtra = (struct netif_tx_extra *)RING_GET_REQUEST(&netif->tx, + i); + if (unlikely(!txtra->type || + txtra->type >= XEN_NETIF_TXTRA_TYPE_MAX)) { + netif->tx.req_cons = ++i; + DPRINTK("Invalid extra type: %d\n", txtra->type); + return -EINVAL; + } + + memcpy(txtras + txtra->type - 1, txtra, sizeof(*txtra)); + netif->tx.req_cons = ++i; + } while (txtra->flags & XEN_NETIF_TXTRA_MORE); + + return work_to_do; +} + /* Called after netfront has transmitted */ static void net_tx_action(unsigned long unused) { @@ -670,7 +699,7 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; - struct netif_tx_extra txtra; + struct netif_tx_extra txtras[XEN_NETIF_TXTRA_TYPE_MAX - 1]; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -732,16 +761,15 @@ static void net_tx_action(unsigned long work_to_do--; netif->tx.req_cons = ++i; + memset(txtras, 0, sizeof(txtras)); if (txreq.flags & NETTXF_extra_info) { - if (work_to_do-- <= 0) { - DPRINTK("Missing extra info\n"); - netbk_tx_err(netif, &txreq, i); + work_to_do = netbk_get_txtras(netif, txtras, + work_to_do); + if (unlikely(work_to_do < 0)) { + netbk_tx_err(netif, &txreq, 0); continue; } - - memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), - sizeof(txtra)); - netif->tx.req_cons = ++i; + i = netif->tx.req_cons; } ret = netbk_count_requests(netif, &txreq, work_to_do); @@ -788,10 +816,15 @@ static void net_tx_action(unsigned long /* Packets passed to netif_rx() must have some headroom. 
*/ skb_reserve(skb, 16); - if (txreq.flags & NETTXF_gso) { - skb_shinfo(skb)->gso_size = txtra.gso_size; - skb_shinfo(skb)->gso_segs = txtra.gso_segs; - skb_shinfo(skb)->gso_type = txtra.gso_type; + if (txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1].type) { + struct netif_tx_extra *gso = txtras - 1 + + XEN_NETIF_TXTRA_TYPE_GSO; + + skb_shinfo(skb)->gso_size = gso->gso.size; + skb_shinfo(skb)->gso_segs = gso->gso.segs; + skb_shinfo(skb)->gso_type + xen_gso_type_xen2linux(gso->gso.type); + } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), diff -r 9853b45712e8 -r b5ca6be8ad55 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Wed Jun 28 13:55:13 2006 +1000 +++ b/xen/include/public/io/netif.h Thu Jun 29 10:16:22 2006 +1000 @@ -31,12 +31,28 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) -/* Packet has GSO fields. */ -#define _NETTXF_gso (3) -#define NETTXF_gso (1U<<_NETTXF_gso) +/* Packet to be folloed by extra descritptor. */ +#define _NETTXF_extra_info (3) +#define NETTXF_extra_info (1U<<_NETTXF_extra_info) -/* Packet to be folloed by extra descritptor. */ -#define NETTXF_extra_info (NETTXF_gso) +enum { + XEN_NETIF_TXTRA_TYPE_GSO = 1, + XEN_NETIF_TXTRA_TYPE_MAX, +}; + +enum { + XEN_NETIF_TXTRA_MORE = 1 << 0, +}; + +enum { + /* TCP over IPv4. */ + XEN_GSO_TCPV4 = 1 << 0, + /* UDP over IPv4. */ + XEN_GSO_UDPV4 = 1 << 1, + + /* Packet header must be verified. */ + XEN_GSO_DODGY = 1 << 2, +}; struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ @@ -47,11 +63,35 @@ struct netif_tx_request { }; typedef struct netif_tx_request netif_tx_request_t; -/* This structure needs to fit within netif_tx_request for compatibility. */ +/* + * This structure needs to fit within both netif_tx_request and + * netif_rx_response for compatibility. + */ struct netif_tx_extra { - uint16_t gso_size; /* GSO MSS. */ - uint16_t gso_segs; /* GSO segment count. */ - uint16_t gso_type; /* GSO type. */ + /* Type of extra info. */ + uint8_t type; + /* Flags for this info. */ + uint8_t flags; + + union { + /* + * Maximum payload size of each segment. For example, for TCP this is + * just the path MSS. + */ + uint16_t size; + + /* + * Number of GSO segments. This is the number of segments that have to + * be generated for this packet given the MSS. + */ + uint16_t segs; + + /* + * GSO type. This determines the protocol of the packet and any extra + * features required to segment the packet properly. + */ + uint16_t type; + } gso; }; struct netif_tx_response { @@ -94,6 +134,17 @@ DEFINE_RING_TYPES(netif_rx, struct netif #define NETIF_RSP_OKAY 0 #define NETIF_RSP_NULL 1 +/* For now the GSO types are identical. */ +static inline int xen_gso_type_linux2xen(int type) +{ + return type; +} + +static inline int xen_gso_type_xen2linux(int type) +{ + return type; +} + #endif /* _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
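For readers tracking the ring manipulation in net_tx_action() above, the slot layout this patch produces can be sketched as follows (a descriptive sketch only, not code from the patch):

	/*
	 * Ring layout for a GSO packet with two extra data fragments under
	 * the chained extra-info scheme:
	 *
	 *   slot i   : netif_tx_request  flags = NETTXF_extra_info |
	 *                                        NETTXF_more_data | ...
	 *   slot i+1 : netif_tx_extra    type  = XEN_NETIF_TXTRA_TYPE_GSO,
	 *                                flags = 0 (no XEN_NETIF_TXTRA_MORE,
	 *                                        so this is the last extra)
	 *   slot i+2 : netif_tx_request  second fragment (NETTXF_more_data)
	 *   slot i+3 : netif_tx_request  last fragment
	 *
	 * netbk_get_txtras() consumes the extra slots before the fragment
	 * requests are counted, and each extra slot is answered with a
	 * NETIF_RSP_NULL response, which is why netfront's
	 * network_tx_buf_gc() skips that status when reclaiming slots.
	 */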
Herbert Xu
2006-Jun-29 00:21 UTC
[Xen-devel] [2/2] [NET] front: Transmit TSO packets if supported
Hi: [NET] front: Transmit TSO packets if supported This patch adds TSO transmission support to the frontend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r b5ca6be8ad55 -r 247c57e5b85a linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Thu Jun 29 10:16:22 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Thu Jun 29 10:16:49 2006 +1000 @@ -463,7 +463,7 @@ static int network_open(struct net_devic static inline int netfront_tx_slot_available(struct netfront_info *np) { - return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 1; + return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 2; } static inline void network_maybe_wake_tx(struct net_device *dev) @@ -491,7 +491,13 @@ static void network_tx_buf_gc(struct net rmb(); /* Ensure we see responses up to ''rp''. */ for (cons = np->tx.rsp_cons; cons != prod; cons++) { - id = RING_GET_RESPONSE(&np->tx, cons)->id; + struct netif_tx_response *txrsp; + + txrsp = RING_GET_RESPONSE(&np->tx, cons); + if (txrsp->status == NETIF_RSP_NULL) + continue; + + id = txrsp->id; skb = np->tx_skbs[id]; if (unlikely(gnttab_query_foreign_access( np->grant_tx_ref[id]) != 0)) { @@ -719,6 +725,7 @@ static int network_start_xmit(struct sk_ unsigned short id; struct netfront_info *np = netdev_priv(dev); struct netif_tx_request *tx; + struct netif_tx_extra *txtra; char *data = skb->data; RING_IDX i; grant_ref_t ref; @@ -739,7 +746,8 @@ static int network_start_xmit(struct sk_ spin_lock_irq(&np->tx_lock); if (unlikely(!netif_carrier_ok(dev) || - (frags > 1 && !xennet_can_sg(dev)))) { + (frags > 1 && !xennet_can_sg(dev)) || + netif_needs_gso(dev, skb))) { spin_unlock_irq(&np->tx_lock); goto drop; } @@ -762,10 +770,31 @@ static int network_start_xmit(struct sk_ tx->size = len; tx->flags = 0; + txtra = NULL; + if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ tx->flags |= NETTXF_csum_blank | NETTXF_data_validated; if (skb->proto_data_valid) /* remote but checksummed? 
*/ tx->flags |= NETTXF_data_validated; + + if (skb_shinfo(skb)->gso_size) { + struct netif_tx_extra *gso + (struct netif_tx_extra *)RING_GET_REQUEST(&np->tx, ++i); + + if (txtra) + txtra->flags |= XEN_NETIF_TXTRA_MORE; + else + tx->flags |= NETTXF_extra_info; + + gso->gso.size = skb_shinfo(skb)->gso_size; + gso->gso.segs = skb_shinfo(skb)->gso_segs; + gso->gso.type + xen_gso_type_linux2xen(skb_shinfo(skb)->gso_type); + + gso->type = XEN_NETIF_TXTRA_TYPE_GSO; + gso->flags = 0; + txtra = gso; + } np->tx.req_prod_pvt = i + 1; @@ -1065,9 +1094,26 @@ static int xennet_set_sg(struct net_devi return ethtool_op_set_sg(dev, data); } +static int xennet_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + struct netfront_info *np = netdev_priv(dev); + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-tso", + "%d", &val) < 0) + val = 0; + if (!val) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static void xennet_set_features(struct net_device *dev) { - xennet_set_sg(dev, 1); + if (!xennet_set_sg(dev, 1)) + xennet_set_tso(dev, 1); } static void network_connect(struct net_device *dev) @@ -1148,6 +1194,8 @@ static struct ethtool_ops network_ethtoo .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = xennet_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = xennet_set_tso, }; #ifdef CONFIG_SYSFS _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
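The txtra bookkeeping in network_start_xmit() above encodes a simple producer-side chaining rule, restated here for clarity (description only, not additional code from the patch):

	/*
	 * Chaining rule used by the frontend when queueing extra descriptors:
	 *
	 *   - the first extra is announced by setting NETTXF_extra_info on
	 *     the primary netif_tx_request;
	 *   - each further extra is announced by setting XEN_NETIF_TXTRA_MORE
	 *     on the previous extra;
	 *   - the last extra leaves XEN_NETIF_TXTRA_MORE clear, which is what
	 *     terminates the do/while loop in netbk_get_txtras() on the
	 *     backend side.
	 *
	 * With only the GSO extra defined so far, txtra is always NULL when
	 * the gso_size block runs, so in practice only NETTXF_extra_info is
	 * set; the other branch is there for future extras such as a
	 * checksum offset descriptor.
	 */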
Herbert Xu
2006-Jun-29 04:44 UTC
[Xen-devel] Re: [1/2] [NET] back: Make extra info slot more generic
On Thu, Jun 29, 2006 at 10:21:23AM +1000, herbert wrote:> > [NET] back: Make extra info slot more genericHere it is again rebased on xen-unstable: [NET] back: Make extra info slot more generic Based on suggestions by Keir Fraser, this patch makes the extra slot more generic by allowing it to be chained and giving each slot a type field. This makes it easier to add new extra slots such as checksum offset. I''ve also added GSO type constants specific for Xen. For now the conversion function between them and Linux is a noop. When and if they do diverge we can modify them accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r ae245d35457b -r 41464d29a901 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:59:29 2006 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Thu Jun 29 14:43:14 2006 +1000 @@ -663,6 +663,35 @@ static void netbk_fill_frags(struct sk_b } } +int netbk_get_txtras(netif_t *netif, struct netif_tx_extra *txtras, + int work_to_do) +{ + struct netif_tx_extra *txtra; + int i = netif->tx.req_cons; + + do { + + if (unlikely(work_to_do-- <= 0)) { + DPRINTK("Missing extra info\n"); + return -EBADR; + } + + txtra = (struct netif_tx_extra *)RING_GET_REQUEST(&netif->tx, + i); + if (unlikely(!txtra->type || + txtra->type >= XEN_NETIF_TXTRA_TYPE_MAX)) { + netif->tx.req_cons = ++i; + DPRINTK("Invalid extra type: %d\n", txtra->type); + return -EINVAL; + } + + memcpy(txtras + txtra->type - 1, txtra, sizeof(*txtra)); + netif->tx.req_cons = ++i; + } while (txtra->flags & XEN_NETIF_TXTRA_MORE); + + return work_to_do; +} + /* Called after netfront has transmitted */ static void net_tx_action(unsigned long unused) { @@ -670,7 +699,7 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; - struct netif_tx_extra txtra; + struct netif_tx_extra txtras[XEN_NETIF_TXTRA_TYPE_MAX - 1]; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -732,16 +761,15 @@ static void net_tx_action(unsigned long work_to_do--; netif->tx.req_cons = ++i; + memset(txtras, 0, sizeof(txtras)); if (txreq.flags & NETTXF_extra_info) { - if (work_to_do-- <= 0) { - DPRINTK("Missing extra info\n"); - netbk_tx_err(netif, &txreq, i); + work_to_do = netbk_get_txtras(netif, txtras, + work_to_do); + if (unlikely(work_to_do < 0)) { + netbk_tx_err(netif, &txreq, 0); continue; } - - memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), - sizeof(txtra)); - netif->tx.req_cons = ++i; + i = netif->tx.req_cons; } ret = netbk_count_requests(netif, &txreq, work_to_do); @@ -788,10 +816,14 @@ static void net_tx_action(unsigned long /* Packets passed to netif_rx() must have some headroom. 
*/ skb_reserve(skb, 16); - if (txreq.flags & NETTXF_gso) { - skb_shinfo(skb)->gso_size = txtra.u.gso.size; - skb_shinfo(skb)->gso_segs = txtra.u.gso.segs; - skb_shinfo(skb)->gso_type = txtra.u.gso.type; + if (txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1].type) { + struct netif_tx_extra *gso = txtras - 1 + + XEN_NETIF_TXTRA_TYPE_GSO; + + skb_shinfo(skb)->gso_size = gso->gso.size; + skb_shinfo(skb)->gso_segs = gso->gso.segs; + skb_shinfo(skb)->gso_type + xen_gso_type_xen2linux(gso->gso.type); } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), diff -r ae245d35457b -r 41464d29a901 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Wed Jun 28 13:59:29 2006 +0100 +++ b/xen/include/public/io/netif.h Thu Jun 29 14:43:14 2006 +1000 @@ -41,12 +41,28 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) -/* Packet has GSO fields in the following descriptor (netif_tx_extra.u.gso). */ -#define _NETTXF_gso (3) -#define NETTXF_gso (1U<<_NETTXF_gso) +/* Packet to be folloed by extra descritptor. */ +#define _NETTXF_extra_info (3) +#define NETTXF_extra_info (1U<<_NETTXF_extra_info) -/* This descriptor is followed by an extra-info descriptor (netif_tx_extra). */ -#define NETTXF_extra_info (NETTXF_gso) +enum { + XEN_NETIF_TXTRA_TYPE_GSO = 1, + XEN_NETIF_TXTRA_TYPE_MAX, +}; + +enum { + XEN_NETIF_TXTRA_MORE = 1 << 0, +}; + +enum { + /* TCP over IPv4. */ + XEN_GSO_TCPV4 = 1 << 0, + /* UDP over IPv4. */ + XEN_GSO_UDPV4 = 1 << 1, + + /* Packet header must be verified. */ + XEN_GSO_DODGY = 1 << 2, +}; struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ @@ -57,16 +73,35 @@ struct netif_tx_request { }; typedef struct netif_tx_request netif_tx_request_t; -/* This structure needs to fit within netif_tx_request for compatibility. */ +/* + * This structure needs to fit within both netif_tx_request and + * netif_rx_response for compatibility. + */ struct netif_tx_extra { + /* Type of extra info. */ + uint8_t type; + /* Flags for this info. */ + uint8_t flags; + union { - /* NETTXF_gso: Generic Segmentation Offload. */ - struct netif_tx_gso { - uint16_t size; /* GSO MSS. */ - uint16_t segs; /* GSO segment count. */ - uint16_t type; /* GSO type. */ - } gso; - } u; + /* + * Maximum payload size of each segment. For example, for TCP this is + * just the path MSS. + */ + uint16_t size; + + /* + * Number of GSO segments. This is the number of segments that have to + * be generated for this packet given the MSS. + */ + uint16_t segs; + + /* + * GSO type. This determines the protocol of the packet and any extra + * features required to segment the packet properly. + */ + uint16_t type; + } gso; }; struct netif_tx_response { @@ -110,6 +145,17 @@ DEFINE_RING_TYPES(netif_rx, struct netif /* No response: used for auxiliary requests (e.g., netif_tx_extra). */ #define NETIF_RSP_NULL 1 +/* For now the GSO types are identical. */ +static inline int xen_gso_type_linux2xen(int type) +{ + return type; +} + +static inline int xen_gso_type_xen2linux(int type) +{ + return type; +} + #endif /* _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 29 Jun 2006, at 01:20, Herbert Xu wrote:

>> Another question: why are gso_size and gso_segs both required? Surely
>> those, plus the overall request size, are redundant. e.g., shouldn't
>> gso_segs = tot_size / gso_size (rounded up)?
>
> For TSO, gso_segs can be easily determined from the packet and gso_size.
> However, for GSO, we don't know the packet header length so the same is
> not true.

Each segment will need a header though, I'd imagine, so whoever does the segmentation needs to know the packet header length? Maybe I'm just confused. :-)

Could you briefly explain what the inter-domain data format would be (e.g., is there a header, etc.), and gso_{size,segs}, for some arbitrary IP-encapsulated protocol? And how that information would be used to perform segmentation in the backend domain? Is the segmentation algorithm any different at all when the protocol is specifically TCPv4? I'd like to add some documentation of all this to netif.h when I have it clear in my head.

The patches you sent look fine. I'd really like to understand this stuff before turning it on though.

-- Keir
On Thu, Jun 29, 2006 at 10:02:20AM +0100, Keir Fraser wrote:
>
>> For TSO, gso_segs can be easily determined from the packet and gso_size.
>> However, for GSO, we don't know the packet header length so the same is
>> not true.
>
> Each segment will need a header though, I'd imagine, so whoever does
> the segmentation needs to know the packet header length? Maybe I'm just
> confused. :-)

The code or chip that actually does the segmentation will obviously know the header length. However, netback or netfront does not.

Since Linux requires gso_segs to be set (it's used to figure out how many packets there are going to be before segmentation takes place), we really need to set it at the source where the packet is produced.

For another OS that only supports TSO, it would simply have to set gso_segs in the frontend driver.

> Could you briefly explain what the inter-domain data format would be
> (e.g., is there a header, etc.), and gso_{size,segs}, for some arbitrary
> IP-encapsulated protocol? And how that information would be used to
> perform segmentation in the backend domain?

GSO packets are simply given as one continuous chunk of data, with a gso_type that determines the protocol (e.g., TCPv4 or UDPv4) and a set of features that the packet requires (e.g., header verification or ECN for TCPv4). The parameter gso_size is required to perform the actual segmentation. The parameter gso_segs is used by the bits that sit in front of the actual segmentation to figure out how many segments will be produced.

> Is the segmentation algorithm any different at all when the protocol is
> specifically TCPv4? I'd like to add some documentation of all this to
> netif.h when I have it clear in my head.

How segmentation is performed is protocol-specific. In general, all headers are duplicated and then modified for each segment; the modification itself is protocol-specific. For TCP it involves clearing header flags depending on whether it's the first, a middle or the last packet, and changing the sequence number and checksum. You can have a look at tcp_tso_segment() in net/ipv4/tcp.c.

The only catch is that most of the hardware out there can't deal with ECN, so by default we turn that off (that'll actually change soon now that GSO is here).

Cheers,
Herbert
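As a rough outline of the per-segment work described above (a simplified sketch; the authoritative code is tcp_tso_segment() and skb_segment()):

	/*
	 * Simplified outline of TCP segmentation (sketch only):
	 *
	 *   for each mss-sized chunk of payload produced by skb_segment():
	 *     - copy the original IP and TCP headers in front of the chunk;
	 *     - set tcp->seq to the sequence number of that chunk
	 *       (first_seq + n * mss);
	 *     - on every segment except the last, clear FIN and PSH;
	 *     - fix up the IP total length (and IP ID) for the new size;
	 *     - recompute, or mark for recomputation, the TCP and IP
	 *       checksums.
	 *
	 * ECN adds a further twist (the CWR flag should not be replicated
	 * onto every segment), which is why hardware that cannot handle it
	 * has ECN turned off by default, as noted above.
	 */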
On 29 Jun 2006, at 10:40, Herbert Xu wrote:

> The code or chip that actually does the segmentation will obviously
> know the header length. However, netback or netfront does not.
>
> Since Linux requires gso_segs to be set (it's used to figure out how
> many packets there are going to be before segmentation takes place),
> we really need to set it at the source where the packet is produced.
>
> For another OS that only supports TSO, it would simply have to set
> gso_segs in the frontend driver.

What if gso_segs, or any other gso parameter (e.g., type), is set incorrectly? The backend cannot trust frontend clients, so I'm rather worried that we could make the backend network stack crash!

Also, we should probably only define a XEN_GSO_TCPV4 flag for now, since we support only plain TSO right now. No skbuffs should get passed to netfront that have other GSO flags set, right? Since we only advertise NETIF_F_TSO.

-- Keir
On Thu, Jun 29, 2006 at 11:32:43AM +0100, Keir Fraser wrote:
>
> What if gso_segs, or any other gso parameter (e.g., type), is set
> incorrectly? The backend cannot trust frontend clients, so I'm rather
> worried that we could make the backend network stack crash!

gso_type is a key so the OS (Linux) must deal with all possible values. Any packet with a protocol or feature bit that it does not recognise will be dropped.

However, you have a very good point regarding gso_segs. I'll change it so that it is always recalculated for SKB_GSO_DODGY packets.

> Also, we should probably only define a XEN_GSO_TCPV4 flag for now, since
> we support only plain TSO right now. No skbuffs should get passed to
> netfront that have other GSO flags set, right? Since we only advertise
> NETIF_F_TSO.

While it shouldn't have any adverse effects for the reason above, I will add a mask so that only bits in the mask are allowed through to gso_type.

Cheers,
Herbert
On 29 Jun 2006, at 13:46, Herbert Xu wrote:

>> What if gso_segs, or any other gso parameter (e.g., type), is set
>> incorrectly? The backend cannot trust frontend clients, so I'm rather
>> worried that we could make the backend network stack crash!
>
> gso_type is a key so the OS (Linux) must deal with all possible values.
> Any packet with a protocol or feature bit that it does not recognise
> will be dropped.
>
> However, you have a very good point regarding gso_segs. I'll change it
> so that it is always recalculated for SKB_GSO_DODGY packets.

If you can recalculate it, why is the field required at all (both in the skbuff and in the netif_tx_extra)? You previously said it was needed because it can't be recalculated (until you get to the segmentation code, which may be very late in packet processing).

We could calculate gso_segs for TCPv4 in netback, which is all we care about right now.

-- Keir
On Thu, Jun 29, 2006 at 02:20:21PM +0100, Keir Fraser wrote:
>
> If you can recalculate it, why is the field required at all (both in
> the skbuff and in the netif_tx_extra)? You previously said it was
> needed because it can't be recalculated (until you get to the
> segmentation code, which may be very late in packet processing).

It can't be recalculated in general (i.e., in netback). However, within each protocol module it can be recalculated.

> We could calculate gso_segs for TCPv4 in netback, which is all we care
> about right now.

I'd rather not add protocol-specific knowledge to netback.

Cheers,
Herbert
On 29 Jun 2006, at 14:26, Herbert Xu wrote:

>> If you can recalculate it, why is the field required at all (both in
>> the skbuff and in the netif_tx_extra)? You previously said it was
>> needed because it can't be recalculated (until you get to the
>> segmentation code, which may be very late in packet processing).
>
> It can't be recalculated in general (i.e., in netback). However, within
> each protocol module it can be recalculated.

In that case we don't need gso_segs in the netif_tx_extra? That, plus limiting gso_type, would make me much happier.

-- Keir
On Thu, Jun 29, 2006 at 02:41:50PM +0100, Keir Fraser wrote:
>
> In that case we don't need gso_segs in the netif_tx_extra? That, plus
> limiting gso_type, would make me much happier.

That's the plan!

Cheers,
Herbert
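To summarise where the thread has landed before the reworked patches below (this only restates what those patches implement):

	/*
	 * Wire-format outcome of the discussion above (restated; see the
	 * patches that follow for the authoritative definitions):
	 *
	 *   - every extra descriptor begins with a type byte
	 *     (XEN_NETIF_TXTRA_TYPE_GSO, ...) and a flags byte
	 *     (XEN_NETIF_TXTRA_MORE if another extra follows);
	 *   - the GSO extra carries gso.size (the MSS, which must be
	 *     non-zero) and gso.type (XEN_GSO_* bits, validated in netback
	 *     against the mask of advertised types);
	 *   - gso_segs is dropped from the wire format entirely: netback
	 *     tags the packet SKB_GSO_DODGY and the protocol's segmentation
	 *     code recomputes the segment count from gso.size.
	 */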
Hi: Here we go, these are all based on xen-unstable so please disregard all previous unmerged patches from me. [NET] back: Fix maximum fragment check The maximum fragment check from the frontend is off by one. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r ae245d35457b -r 89ffce2cc120 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:59:29 2006 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 12:13:20 2006 +1000 @@ -751,7 +751,7 @@ static void net_tx_action(unsigned long } i += ret; - if (unlikely(ret > MAX_SKB_FRAGS + 1)) { + if (unlikely(ret > MAX_SKB_FRAGS)) { DPRINTK("Too many frags\n"); netbk_tx_err(netif, &txreq, i); continue; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: [NET]: Add net-tso.patch This patch has been submitted upstream for review. It resets gso_segs for TSO. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 89ffce2cc120 -r 752477acd893 patches/linux-2.6.16.13/net-tso.patch --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/patches/linux-2.6.16.13/net-tso.patch Fri Jun 30 12:13:27 2006 +1000 @@ -0,0 +1,28 @@ +diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c +index 0336422..0bb0ac9 100644 +--- a/net/ipv4/tcp.c ++++ b/net/ipv4/tcp.c +@@ -2166,13 +2166,19 @@ struct sk_buff *tcp_tso_segment(struct s + if (!pskb_may_pull(skb, thlen)) + goto out; + +- segs = NULL; +- if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) +- goto out; +- + oldlen = (u16)~skb->len; + __skb_pull(skb, thlen); + ++ if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) { ++ /* Packet is from an untrusted source, reset gso_segs. */ ++ int mss = skb_shinfo(skb)->gso_size; ++ ++ skb_shinfo(skb)->gso_segs = (skb->len + mss - 1) / mss; ++ ++ segs = NULL; ++ goto out; ++ } ++ + segs = skb_segment(skb, features); + if (IS_ERR(segs)) + goto out; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
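The reset is a plain ceiling division over the payload that remains once __skb_pull() has removed the headers; with illustrative numbers:

	/*
	 * Illustrative numbers only: after __skb_pull(skb, thlen) the skb
	 * length is pure TCP payload.  With 4000 bytes of payload and an
	 * (untrusted) gso_size of 1448:
	 *
	 *     gso_segs = (4000 + 1448 - 1) / 1448 = 3
	 *
	 * so whatever segment count the untrusted source claimed is simply
	 * overwritten with the value implied by gso_size.
	 */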
Herbert Xu
2006-Jun-30 02:42 UTC
[Xen-devel] [3/4] [NET] back: Make extra info slot more generic
Hi: [NET] back: Make extra info slot more generic Based on suggestions by Keir Fraser, this patch makes the extra slot more generic by allowing it to be chained and giving each slot a type field. This makes it easier to add new extra slots such as checksum offset. I''ve also added GSO type constants specific for Xen. For now the conversion function between them and Linux is a noop. When and if they do diverge we can modify them accordingly. This patch also gets rid of gso_segs which is now always recomputed for SKB_GSO_DODGY packets. Last but not least netback now actually sets SKB_GSO_DODGY. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 752477acd893 -r bd0c0b635dc1 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 12:13:27 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 12:36:57 2006 +1000 @@ -663,6 +663,35 @@ static void netbk_fill_frags(struct sk_b } } +int netbk_get_txtras(netif_t *netif, struct netif_tx_extra *txtras, + int work_to_do) +{ + struct netif_tx_extra *txtra; + int i = netif->tx.req_cons; + + do { + + if (unlikely(work_to_do-- <= 0)) { + DPRINTK("Missing extra info\n"); + return -EBADR; + } + + txtra = (struct netif_tx_extra *)RING_GET_REQUEST(&netif->tx, + i); + if (unlikely(!txtra->type || + txtra->type >= XEN_NETIF_TXTRA_TYPE_MAX)) { + netif->tx.req_cons = ++i; + DPRINTK("Invalid extra type: %d\n", txtra->type); + return -EINVAL; + } + + memcpy(txtras + txtra->type - 1, txtra, sizeof(*txtra)); + netif->tx.req_cons = ++i; + } while (txtra->flags & XEN_NETIF_TXTRA_MORE); + + return work_to_do; +} + /* Called after netfront has transmitted */ static void net_tx_action(unsigned long unused) { @@ -670,7 +699,8 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; - struct netif_tx_extra txtra; + struct netif_tx_extra txtras[XEN_NETIF_TXTRA_TYPE_MAX - 1]; + struct netif_tx_extra *gso; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -732,16 +762,15 @@ static void net_tx_action(unsigned long work_to_do--; netif->tx.req_cons = ++i; + memset(txtras, 0, sizeof(txtras)); if (txreq.flags & NETTXF_extra_info) { - if (work_to_do-- <= 0) { - DPRINTK("Missing extra info\n"); - netbk_tx_err(netif, &txreq, i); + work_to_do = netbk_get_txtras(netif, txtras, + work_to_do); + if (unlikely(work_to_do < 0)) { + netbk_tx_err(netif, &txreq, 0); continue; } - - memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), - sizeof(txtra)); - netif->tx.req_cons = ++i; + i = netif->tx.req_cons; } ret = netbk_count_requests(netif, &txreq, work_to_do); @@ -772,6 +801,32 @@ static void net_tx_action(unsigned long continue; } + gso = NULL; + if (txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1].type) { + /* In future this will be retrieved from Linux. 
*/ + unsigned int allowed_types = SKB_GSO_TCPV4 | + SKB_GSO_DODGY; + unsigned type; + + gso = &txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1]; + if (!gso->gso.size) { + DPRINTK("GSO size must not be zero\n"); + netbk_tx_err(netif, &txreq, i); + continue; + } + + type = xen_gso_type_xen2linux(gso->gso.type); + if (!type || (type & allowed_types) != type) { + DPRINTK("Bogus GSO type: 0x%x\n", + gso->gso.type); + netbk_tx_err(netif, &txreq, i); + continue; + } + + /* Whoever gets this needs to verify the header. */ + gso->gso.type = type | SKB_GSO_DODGY; + } + pending_idx = pending_ring[MASK_PEND_IDX(pending_cons)]; data_len = (txreq.size > PKT_PROT_LEN && @@ -788,10 +843,9 @@ static void net_tx_action(unsigned long /* Packets passed to netif_rx() must have some headroom. */ skb_reserve(skb, 16); - if (txreq.flags & NETTXF_gso) { - skb_shinfo(skb)->gso_size = txtra.u.gso.size; - skb_shinfo(skb)->gso_segs = txtra.u.gso.segs; - skb_shinfo(skb)->gso_type = txtra.u.gso.type; + if (gso) { + skb_shinfo(skb)->gso_size = gso->gso.size; + skb_shinfo(skb)->gso_type = gso->gso.type; } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), diff -r 752477acd893 -r bd0c0b635dc1 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jun 30 12:13:27 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jun 30 12:36:57 2006 +1000 @@ -41,12 +41,28 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) -/* Packet has GSO fields in the following descriptor (netif_tx_extra.u.gso). */ -#define _NETTXF_gso (3) -#define NETTXF_gso (1U<<_NETTXF_gso) +/* Packet to be folloed by extra descritptor. */ +#define _NETTXF_extra_info (3) +#define NETTXF_extra_info (1U<<_NETTXF_extra_info) -/* This descriptor is followed by an extra-info descriptor (netif_tx_extra). */ -#define NETTXF_extra_info (NETTXF_gso) +enum { + XEN_NETIF_TXTRA_TYPE_GSO = 1, + XEN_NETIF_TXTRA_TYPE_MAX, +}; + +enum { + XEN_NETIF_TXTRA_MORE = 1 << 0, +}; + +enum { + /* TCP over IPv4. */ + XEN_GSO_TCPV4 = 1 << 0, + /* UDP over IPv4. */ + XEN_GSO_UDPV4 = 1 << 1, + + /* Packet header must be verified. */ + XEN_GSO_DODGY = 1 << 2, +}; struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ @@ -57,16 +73,29 @@ struct netif_tx_request { }; typedef struct netif_tx_request netif_tx_request_t; -/* This structure needs to fit within netif_tx_request for compatibility. */ +/* + * This structure needs to fit within both netif_tx_request and + * netif_rx_response for compatibility. + */ struct netif_tx_extra { + /* Type of extra info. */ + uint8_t type; + /* Flags for this info. */ + uint8_t flags; + union { - /* NETTXF_gso: Generic Segmentation Offload. */ - struct netif_tx_gso { - uint16_t size; /* GSO MSS. */ - uint16_t segs; /* GSO segment count. */ - uint16_t type; /* GSO type. */ - } gso; - } u; + /* + * Maximum payload size of each segment. For example, for TCP this is + * just the path MSS. + */ + uint16_t size; + + /* + * GSO type. This determines the protocol of the packet and any extra + * features required to segment the packet properly. + */ + uint16_t type; + } gso; }; struct netif_tx_response { @@ -110,6 +139,17 @@ DEFINE_RING_TYPES(netif_rx, struct netif /* No response: used for auxiliary requests (e.g., netif_tx_extra). */ #define NETIF_RSP_NULL 1 +/* For now the GSO types are identical. 
*/ +static inline int xen_gso_type_linux2xen(int type) +{ + return type; +} + +static inline int xen_gso_type_xen2linux(int type) +{ + return type; +} + #endif /* _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jun-30 02:43 UTC
[Xen-devel] [4/4] [NET] front: Transmit TSO packets if supported
Hi: [NET] front: Transmit TSO packets if supported This patch adds TSO transmission support to the frontend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r bd0c0b635dc1 -r bef11f078479 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jun 30 12:36:57 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jun 30 12:38:18 2006 +1000 @@ -463,7 +463,7 @@ static int network_open(struct net_devic static inline int netfront_tx_slot_available(struct netfront_info *np) { - return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 1; + return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 2; } static inline void network_maybe_wake_tx(struct net_device *dev) @@ -491,7 +491,13 @@ static void network_tx_buf_gc(struct net rmb(); /* Ensure we see responses up to ''rp''. */ for (cons = np->tx.rsp_cons; cons != prod; cons++) { - id = RING_GET_RESPONSE(&np->tx, cons)->id; + struct netif_tx_response *txrsp; + + txrsp = RING_GET_RESPONSE(&np->tx, cons); + if (txrsp->status == NETIF_RSP_NULL) + continue; + + id = txrsp->id; skb = np->tx_skbs[id]; if (unlikely(gnttab_query_foreign_access( np->grant_tx_ref[id]) != 0)) { @@ -719,6 +725,7 @@ static int network_start_xmit(struct sk_ unsigned short id; struct netfront_info *np = netdev_priv(dev); struct netif_tx_request *tx; + struct netif_tx_extra *txtra; char *data = skb->data; RING_IDX i; grant_ref_t ref; @@ -739,7 +746,8 @@ static int network_start_xmit(struct sk_ spin_lock_irq(&np->tx_lock); if (unlikely(!netif_carrier_ok(dev) || - (frags > 1 && !xennet_can_sg(dev)))) { + (frags > 1 && !xennet_can_sg(dev)) || + netif_needs_gso(dev, skb))) { spin_unlock_irq(&np->tx_lock); goto drop; } @@ -762,10 +770,30 @@ static int network_start_xmit(struct sk_ tx->size = len; tx->flags = 0; + txtra = NULL; + if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ tx->flags |= NETTXF_csum_blank | NETTXF_data_validated; if (skb->proto_data_valid) /* remote but checksummed? 
*/ tx->flags |= NETTXF_data_validated; + + if (skb_shinfo(skb)->gso_size) { + struct netif_tx_extra *gso + (struct netif_tx_extra *)RING_GET_REQUEST(&np->tx, ++i); + + if (txtra) + txtra->flags |= XEN_NETIF_TXTRA_MORE; + else + tx->flags |= NETTXF_extra_info; + + gso->gso.size = skb_shinfo(skb)->gso_size; + gso->gso.type + xen_gso_type_linux2xen(skb_shinfo(skb)->gso_type); + + gso->type = XEN_NETIF_TXTRA_TYPE_GSO; + gso->flags = 0; + txtra = gso; + } np->tx.req_prod_pvt = i + 1; @@ -1065,9 +1093,26 @@ static int xennet_set_sg(struct net_devi return ethtool_op_set_sg(dev, data); } +static int xennet_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + struct netfront_info *np = netdev_priv(dev); + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-tso", + "%d", &val) < 0) + val = 0; + if (!val) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static void xennet_set_features(struct net_device *dev) { - xennet_set_sg(dev, 1); + if (!xennet_set_sg(dev, 1)) + xennet_set_tso(dev, 1); } static void network_connect(struct net_device *dev) @@ -1148,6 +1193,8 @@ static struct ethtool_ops network_ethtoo .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = xennet_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = xennet_set_tso, }; #ifdef CONFIG_SYSFS _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: On Fri, Jun 30, 2006 at 12:40:47PM +1000, herbert wrote:> > [NET] back: Fix maximum fragment checkThe net-tso patch has been merged upstream. I''ve also changed the feature-tso interface to be a bit mask of the XEN gso_types bits. It''s now called feature-gso. This means we won''t have to add one feature for each protocol. So here is a repost of the entire series. [NET] back: Fix maximum fragment check The maximum fragment check from the frontend is off by one. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r ae245d35457b -r 617e4d3351f3 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:59:29 2006 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 22:12:59 2006 +1000 @@ -751,7 +751,7 @@ static void net_tx_action(unsigned long } i += ret; - if (unlikely(ret > MAX_SKB_FRAGS + 1)) { + if (unlikely(ret > MAX_SKB_FRAGS)) { DPRINTK("Too many frags\n"); netbk_tx_err(netif, &txreq, i); continue; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: [NET]: Update net-gso.patch New changeset merged upstream: [TCP]: Reset gso_segs if packet is dodgy I wasn''t paranoid enough in verifying GSO information. A bogus gso_segs could upset drivers as much as a bogus header would. Let''s reset it in the per-protocol gso_segment functions. I didn''t verify gso_size because that can be verified by the source of the dodgy packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 617e4d3351f3 -r f6806ad757d5 patches/linux-2.6.16.13/net-gso.patch --- a/patches/linux-2.6.16.13/net-gso.patch Fri Jun 30 22:12:59 2006 +1000 +++ b/patches/linux-2.6.16.13/net-gso.patch Fri Jun 30 22:16:02 2006 +1000 @@ -2225,7 +2225,7 @@ index d64e2ec..7494823 100644 err = ipcomp_compress(x, skb); iph = skb->nh.iph; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c -index 00aa80e..84130c9 100644 +index 00aa80e..30c81a8 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -257,6 +257,7 @@ #include <linux/smp_lock.h> @@ -2281,7 +2281,7 @@ index 00aa80e..84130c9 100644 from += copy; copied += copy; -@@ -2026,6 +2021,71 @@ int tcp_getsockopt(struct sock *sk, int +@@ -2026,6 +2021,77 @@ int tcp_getsockopt(struct sock *sk, int } @@ -2306,12 +2306,18 @@ index 00aa80e..84130c9 100644 + if (!pskb_may_pull(skb, thlen)) + goto out; + -+ segs = NULL; -+ if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) -+ goto out; -+ + oldlen = (u16)~skb->len; + __skb_pull(skb, thlen); ++ ++ if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) { ++ /* Packet is from an untrusted source, reset gso_segs. */ ++ int mss = skb_shinfo(skb)->gso_size; ++ ++ skb_shinfo(skb)->gso_segs = (skb->len + mss - 1) / mss; ++ ++ segs = NULL; ++ goto out; ++ } + + segs = skb_segment(skb, features); + if (IS_ERR(segs)) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jun-30 12:48 UTC
[Xen-devel] [3/4] [NET] back: Make extra info slot more generic
Hi: [NET] back: Make extra info slot more generic Based on suggestions by Keir Fraser, this patch makes the extra slot more generic by allowing it to be chained and giving each slot a type field. This makes it easier to add new extra slots such as checksum offset. I''ve also added GSO type constants specific for Xen. For now the conversion function between them and Linux is a noop. When and if they do diverge we can modify them accordingly. The types are now checked in the backend to ensure that they only contain bits that we advertised. We will also advertise the bits as feature-gso instead of feature-tso. This patch also gets rid of gso_segs which is now always recomputed for SKB_GSO_DODGY packets. Last but not least netback now actually sets SKB_GSO_DODGY. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r f6806ad757d5 -r fb922826baef linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jun 30 22:16:02 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jun 30 22:28:09 2006 +1000 @@ -88,8 +88,13 @@ typedef struct netif_st { /* Miscellaneous private stuff. */ enum { DISCONNECTED, DISCONNECTING, CONNECTED } status; int active; + struct list_head list; /* scheduling list */ + atomic_t refcnt; + /* Bit mask of allowed GSO types. */ + int gso_types; + struct net_device *dev; struct net_device_stats stats; diff -r f6806ad757d5 -r fb922826baef linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 22:16:02 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 22:28:09 2006 +1000 @@ -663,6 +663,35 @@ static void netbk_fill_frags(struct sk_b } } +int netbk_get_txtras(netif_t *netif, struct netif_tx_extra *txtras, + int work_to_do) +{ + struct netif_tx_extra *txtra; + int i = netif->tx.req_cons; + + do { + + if (unlikely(work_to_do-- <= 0)) { + DPRINTK("Missing extra info\n"); + return -EBADR; + } + + txtra = (struct netif_tx_extra *)RING_GET_REQUEST(&netif->tx, + i); + if (unlikely(!txtra->type || + txtra->type >= XEN_NETIF_TXTRA_TYPE_MAX)) { + netif->tx.req_cons = ++i; + DPRINTK("Invalid extra type: %d\n", txtra->type); + return -EINVAL; + } + + memcpy(txtras + txtra->type - 1, txtra, sizeof(*txtra)); + netif->tx.req_cons = ++i; + } while (txtra->flags & XEN_NETIF_TXTRA_MORE); + + return work_to_do; +} + /* Called after netfront has transmitted */ static void net_tx_action(unsigned long unused) { @@ -670,7 +699,8 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; - struct netif_tx_extra txtra; + struct netif_tx_extra txtras[XEN_NETIF_TXTRA_TYPE_MAX - 1]; + struct netif_tx_extra *gso; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -732,16 +762,15 @@ static void net_tx_action(unsigned long work_to_do--; netif->tx.req_cons = ++i; + memset(txtras, 0, sizeof(txtras)); if (txreq.flags & NETTXF_extra_info) { - if (work_to_do-- <= 0) { - DPRINTK("Missing extra info\n"); - netbk_tx_err(netif, &txreq, i); + work_to_do = netbk_get_txtras(netif, txtras, + work_to_do); + if (unlikely(work_to_do < 0)) { + netbk_tx_err(netif, &txreq, 0); continue; } - - memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), - sizeof(txtra)); - netif->tx.req_cons = ++i; + i = 
netif->tx.req_cons; } ret = netbk_count_requests(netif, &txreq, work_to_do); @@ -772,6 +801,29 @@ static void net_tx_action(unsigned long continue; } + gso = NULL; + if (txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1].type) { + unsigned type; + + gso = &txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1]; + if (!gso->gso.size) { + DPRINTK("GSO size must not be zero\n"); + netbk_tx_err(netif, &txreq, i); + continue; + } + + type = xen_gso_type_xen2linux(gso->gso.type); + if (!type || (type & netif->gso_types) != type) { + DPRINTK("Bogus GSO type: 0x%x\n", + gso->gso.type); + netbk_tx_err(netif, &txreq, i); + continue; + } + + /* Whoever gets this needs to verify the header. */ + gso->gso.type = type | SKB_GSO_DODGY; + } + pending_idx = pending_ring[MASK_PEND_IDX(pending_cons)]; data_len = (txreq.size > PKT_PROT_LEN && @@ -788,10 +840,9 @@ static void net_tx_action(unsigned long /* Packets passed to netif_rx() must have some headroom. */ skb_reserve(skb, 16); - if (txreq.flags & NETTXF_gso) { - skb_shinfo(skb)->gso_size = txtra.u.gso.size; - skb_shinfo(skb)->gso_segs = txtra.u.gso.segs; - skb_shinfo(skb)->gso_type = txtra.u.gso.type; + if (gso) { + skb_shinfo(skb)->gso_size = gso->gso.size; + skb_shinfo(skb)->gso_type = gso->gso.type; } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), diff -r f6806ad757d5 -r fb922826baef linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jun 30 22:16:02 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jun 30 22:28:09 2006 +1000 @@ -34,6 +34,9 @@ struct backend_info netif_t *netif; struct xenbus_watch backend_watch; enum xenbus_state frontend_state; + + /* Bit mask of allowed GSO types. */ + int gso_types; }; static int connect_rings(struct backend_info *); @@ -83,6 +86,9 @@ static int netback_probe(struct xenbus_d be->dev = dev; dev->dev.driver_data = be; + /* In future this will be retrieved from Linux. */ + be->gso_types = SKB_GSO_TCPV4 | SKB_GSO_DODGY; + err = xenbus_watch_path2(dev, dev->nodename, "handle", &be->backend_watch, backend_changed); if (err) @@ -101,9 +107,10 @@ static int netback_probe(struct xenbus_d goto abort_transaction; } - err = xenbus_printf(xbt, dev->nodename, "feature-tso", "%d", 1); + err = xenbus_printf(xbt, dev->nodename, "feature-gso", "%d", + be->gso_types); if (err) { - message = "writing feature-tso"; + message = "writing feature-gso"; goto abort_transaction; } @@ -205,6 +212,8 @@ static void backend_changed(struct xenbu xenbus_dev_fatal(dev, err, "creating interface"); return; } + + be->netif->gso_types = be->gso_types; kobject_uevent(&dev->dev.kobj, KOBJ_ONLINE); diff -r f6806ad757d5 -r fb922826baef xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jun 30 22:16:02 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jun 30 22:28:09 2006 +1000 @@ -41,12 +41,28 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) -/* Packet has GSO fields in the following descriptor (netif_tx_extra.u.gso). */ -#define _NETTXF_gso (3) -#define NETTXF_gso (1U<<_NETTXF_gso) +/* Packet to be folloed by extra descritptor. */ +#define _NETTXF_extra_info (3) +#define NETTXF_extra_info (1U<<_NETTXF_extra_info) -/* This descriptor is followed by an extra-info descriptor (netif_tx_extra). */ -#define NETTXF_extra_info (NETTXF_gso) +enum { + XEN_NETIF_TXTRA_TYPE_GSO = 1, + XEN_NETIF_TXTRA_TYPE_MAX, +}; + +enum { + XEN_NETIF_TXTRA_MORE = 1 << 0, +}; + +enum { + /* TCP over IPv4. */ + XEN_GSO_TCPV4 = 1 << 0, + /* UDP over IPv4. 
*/ + XEN_GSO_UDPV4 = 1 << 1, + + /* Packet header must be verified. */ + XEN_GSO_DODGY = 1 << 2, +}; struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ @@ -57,16 +73,29 @@ struct netif_tx_request { }; typedef struct netif_tx_request netif_tx_request_t; -/* This structure needs to fit within netif_tx_request for compatibility. */ +/* + * This structure needs to fit within both netif_tx_request and + * netif_rx_response for compatibility. + */ struct netif_tx_extra { + /* Type of extra info. */ + uint8_t type; + /* Flags for this info. */ + uint8_t flags; + union { - /* NETTXF_gso: Generic Segmentation Offload. */ - struct netif_tx_gso { - uint16_t size; /* GSO MSS. */ - uint16_t segs; /* GSO segment count. */ - uint16_t type; /* GSO type. */ - } gso; - } u; + /* + * Maximum payload size of each segment. For example, for TCP this is + * just the path MSS. + */ + uint16_t size; + + /* + * GSO type. This determines the protocol of the packet and any extra + * features required to segment the packet properly. + */ + uint16_t type; + } gso; }; struct netif_tx_response { @@ -110,6 +139,17 @@ DEFINE_RING_TYPES(netif_rx, struct netif /* No response: used for auxiliary requests (e.g., netif_tx_extra). */ #define NETIF_RSP_NULL 1 +/* For now the GSO types are identical. */ +static inline int xen_gso_type_linux2xen(int type) +{ + return type; +} + +static inline int xen_gso_type_xen2linux(int type) +{ + return type; +} + #endif /* _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jun-30 12:49 UTC
[Xen-devel] [4/4] [NET] front: Transmit TSO packets if supported
Hi: [NET] front: Transmit TSO packets if supported This patch adds TSO transmission support to the frontend. This also fixes a bug where SG may not be turned off correctly when migrating to an old host. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r fb922826baef -r ea33ffdb9973 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jun 30 22:28:09 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jun 30 22:44:18 2006 +1000 @@ -463,7 +463,7 @@ static int network_open(struct net_devic static inline int netfront_tx_slot_available(struct netfront_info *np) { - return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 1; + return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 2; } static inline void network_maybe_wake_tx(struct net_device *dev) @@ -491,7 +491,13 @@ static void network_tx_buf_gc(struct net rmb(); /* Ensure we see responses up to ''rp''. */ for (cons = np->tx.rsp_cons; cons != prod; cons++) { - id = RING_GET_RESPONSE(&np->tx, cons)->id; + struct netif_tx_response *txrsp; + + txrsp = RING_GET_RESPONSE(&np->tx, cons); + if (txrsp->status == NETIF_RSP_NULL) + continue; + + id = txrsp->id; skb = np->tx_skbs[id]; if (unlikely(gnttab_query_foreign_access( np->grant_tx_ref[id]) != 0)) { @@ -719,6 +725,7 @@ static int network_start_xmit(struct sk_ unsigned short id; struct netfront_info *np = netdev_priv(dev); struct netif_tx_request *tx; + struct netif_tx_extra *txtra; char *data = skb->data; RING_IDX i; grant_ref_t ref; @@ -739,7 +746,8 @@ static int network_start_xmit(struct sk_ spin_lock_irq(&np->tx_lock); if (unlikely(!netif_carrier_ok(dev) || - (frags > 1 && !xennet_can_sg(dev)))) { + (frags > 1 && !xennet_can_sg(dev)) || + netif_needs_gso(dev, skb))) { spin_unlock_irq(&np->tx_lock); goto drop; } @@ -762,10 +770,30 @@ static int network_start_xmit(struct sk_ tx->size = len; tx->flags = 0; + txtra = NULL; + if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ tx->flags |= NETTXF_csum_blank | NETTXF_data_validated; if (skb->proto_data_valid) /* remote but checksummed? */ tx->flags |= NETTXF_data_validated; + + if (skb_shinfo(skb)->gso_size) { + struct netif_tx_extra *gso + (struct netif_tx_extra *)RING_GET_REQUEST(&np->tx, ++i); + + if (txtra) + txtra->flags |= XEN_NETIF_TXTRA_MORE; + else + tx->flags |= NETTXF_extra_info; + + gso->gso.size = skb_shinfo(skb)->gso_size; + gso->gso.type + xen_gso_type_linux2xen(skb_shinfo(skb)->gso_type); + + gso->type = XEN_NETIF_TXTRA_TYPE_GSO; + gso->flags = 0; + txtra = gso; + } np->tx.req_prod_pvt = i + 1; @@ -1065,9 +1093,47 @@ static int xennet_set_sg(struct net_devi return ethtool_op_set_sg(dev, data); } +static int xennet_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + struct netfront_info *np = netdev_priv(dev); + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-gso", + "%d", &val) < 0) + val = 0; + if (!(val & XEN_GSO_TCPV4)) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + +static void xennet_set_gso(struct net_device *dev) +{ + struct netfront_info *np = netdev_priv(dev); + int allowed_types = SKB_GSO_TCPV4 | SKB_GSO_DODGY; + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-gso", + "%d", &val) < 0) + return; + + /* Filter out things we don''t understand. 
*/ + val = xen_gso_type_xen2linux(val) & allowed_types; + + /* Turn on the advertised bits. */ + dev->features |= val << NETIF_F_GSO_SHIFT; +} + static void xennet_set_features(struct net_device *dev) { - xennet_set_sg(dev, 1); + /* Turn off all negotiated bits. */ + dev->features &= (1 << NETIF_F_GSO_SHIFT) - 1; + xennet_set_sg(dev, 0); + + if (!xennet_set_sg(dev, 1)) + xennet_set_gso(dev); } static void network_connect(struct net_device *dev) @@ -1148,6 +1214,8 @@ static struct ethtool_ops network_ethtoo .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = xennet_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = xennet_set_tso, }; #ifdef CONFIG_SYSFS _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
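As a usage note rather than part of the patch: once the frontend has negotiated the feature, the offload state can be inspected and toggled from inside the guest with the standard ethtool controls (assuming the guest image ships ethtool), which is handy for checking that the negotiation actually took effect:

    ethtool -k eth0          # show offload settings ("tcp segmentation offload: on")
    ethtool -K eth0 tso off  # disable TSO for a baseline run
    ethtool -K eth0 tso on   # re-enable it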
Keir Fraser
2006-Jun-30 13:07 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On 30 Jun 2006, at 13:47, Herbert Xu wrote:

> The net-tso patch has been merged upstream. I've also changed the
> feature-tso interface to be a bit mask of the XEN gso_types bits.
> It's now called feature-gso. This means we won't have to add one
> feature for each protocol.
>
> So here is a repost of the entire series.

I've merged all this already, with a few changes. I've also disabled
netback from advertising the feature, and also netfront from using it,
until we've all agreed that the inter-domain bits are sane. These
should end up in the public tree in an hour or two.

The changes:
1. Pushed the gso fields into a struct inside the union. Otherwise the
fields overlap.
2. Changed the GSO type definitions. Currently only one type (TCPv4)
and the protocol type isn't really a bitmask since they are mutually
exclusive for a given packet. Also 'dodgy' makes no sense since netback
doesn't trust netfront anyway.
3. Renamed TXTRA->EXTRA and tx_extra -> extra_info. Looks like you
want to share the struct with the rx patch at some point, so making it
tx-specific now makes no sense. If that's not the case we can rename
back again.
4. I'm not sure all the error paths are now correct in netback. For
example, there's a call to netbk_tx_err with an end index of 0. Is that
right?

In the latest changes I'd rather have feature-gso list the supported
protocols as strings (tcpv4,udpv4,etc).

Also, what happens if netfront does the following bad things:
1. gso.type doesn't actually match the protocol type?
2. gso.size is set to a really small value (so that you make lots of
packets)?

Do we need more handling of these cases in netback? Will these be
safely handled in the network stack? Might we need to always work out
gso.type in netback for safety?

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jun-30 13:21 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On Fri, Jun 30, 2006 at 02:07:16PM +0100, Keir Fraser wrote:
>
> I've merged all this already, with a few changes. I've also disabled
> netback from advertising the feature, and also netfront from using it,
> until we've all agreed that the inter-domain bits are sane. These
> should end up in the public tree in an hour or two.

Thanks.

> The changes:
> 1. Pushed the gso fields into a struct inside the union. Otherwise the
> fields overlap.

Doh! Thanks for the correction.

> 2. Changed the GSO type definitions. Currently only one type (TCPv4)
> and the protocol type isn't really a bitmask since they are mutually
> exclusive for a given packet. Also 'dodgy' makes no sense since netback
> doesn't trust netfront anyway.

Agreed wrt 'dodgy' bit. However, this really needs to be a bit mask
because we'll have things like ECN and other protocol-specific bits in
future. In fact the upstream Linux tree has an ECN bit now.

The exclusivity will be checked by Linux (I've already submitted patches
to do this).

> 3. Renamed TXTRA->EXTRA and tx_extra -> extra_info. Looks like you
> want to share the struct with the rx patch at some point, so making it
> tx-specific now makes no sense. If that's not the case we can rename
> back again.

Yes that's the plan.

> 4. I'm not sure all the error paths are now correct in netback. For
> example, there's a call to netbk_tx_err with an end index of 0. Is that
> right?

That was deliberate as 0 is the smallest RING_IDX. However, now that I
look at it again there is an off-by-one bug. I'll fix that tomorrow.

> In the latest changes I'd rather have feature-gso list the supported
> protocols as strings (tcpv4,udpv4,etc).

Well then I might as well go back to the one int per-bit thing with
'feature-tso', 'feature-ufo', etc. It's much easier than parsing
strings.

> Also, what happens if netfront does the following bad things:
> 1. gso.type doesn't actually match the protocol type?

This is checked by Linux due to the 'dodgy' bit. The code isn't in
the net-gso patch yet because for now it only has one protocol.
The upstream code should have it tomorrow as we're about to add TSO6.

> 2. gso.size is set to a really small value (so that you make lots of
> packets)?

That's OK. TCP must deal with an MSS of 1 anyway. The same applies to
the other protocols.

> Do we need more handling of these cases in netback? Will these be
> safely handled in the network stack? Might we need to always work out
> gso.type in netback for safety?

I think we've got all the bits covered now. But if you can think of
anything else please let me know.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
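To illustrate the bit-mask point: protocol bits and modifier bits (such as 'dodgy' and the ECN marker) share the same field, which is why a plain enumeration is not enough. The sketch below follows the SKB_GSO_* layout used in this series, with the upstream ECN bit added purely as an example; the names are simplified for illustration.

    /* Illustration only: one protocol bit plus independent modifier bits. */
    enum {
        GSO_TCPV4   = 1 << 0,   /* protocol */
        GSO_UDPV4   = 1 << 1,   /* protocol */
        GSO_DODGY   = 1 << 2,   /* modifier: header must be verified */
        GSO_TCP_ECN = 1 << 3,   /* modifier: upstream ECN/CWR marker */
    };

    #define GSO_PROTO_MASK  (GSO_TCPV4 | GSO_UDPV4)

    /* Accept only advertised bits, and exactly one protocol bit. */
    static int gso_type_ok(int type, int advertised)
    {
        int proto = type & GSO_PROTO_MASK;

        if ((type & advertised) != type)
            return 0;
        return proto && !(proto & (proto - 1));
    }

This is essentially the (type & netif->gso_types) != type check the backend already performs, extended with the exclusivity test that Herbert says Linux will enforce.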
Keir Fraser
2006-Jun-30 14:36 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On 30 Jun 2006, at 14:21, Herbert Xu wrote:

> However, this really needs to be a bit mask because we'll have things
> like ECN and other protocol-specific bits in future. In fact the
> upstream Linux tree has an ECN bit now.
>
> The exclusivity will be checked by Linux (I've already submitted patches
> to do this).

I'm not at all bothered about having the type format the same as in
Linux. How about we split the type into protocol (being a proper
enumeration) and proto_flags (being a protocol-specific bitmask)? Might
there be any non-proto-specific flags in future?

>> In the latest changes I'd rather have feature-gso list the supported
>> protocols as strings (tcpv4,udpv4,etc).
>
> Well then I might as well go back to the one int per-bit thing with
> 'feature-tso', 'feature-ufo', etc. It's much easier than parsing
> strings.

I'm not too bothered either way, but I personally prefer having the
more properly qualified names listed under feature-gso. Pulling it
apart with strstr() in netfront (for each proto that netfront can deal
with) wouldn't be hard.

>> Also, what happens if netfront does the following bad things:
>> 1. gso.type doesn't actually match the protocol type?
>
> This is checked by Linux due to the 'dodgy' bit. The code isn't in
> the net-gso patch yet because for now it only has one protocol.
> The upstream code should have it tomorrow as we're about to add TSO6.

Do we then need the 'type' at all? What is it actually used for -- I'd
assume the network stack would demux to the correct protocol code as it
would for any ordinary packet, so why does it need help with the
protocol for GSO packets?

Thanks!
Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
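For what it's worth, the string-based scheme floated here would only need a few lines in netfront. A rough sketch, assuming a single feature-gso key whose value is a comma-separated list such as "tcpv4,udpv4" (this is only the proposal above, not the scheme that was eventually adopted):

    /* Sketch only: map a protocol-list string onto the XEN_GSO_* bits
     * defined earlier in this series.  The xenbus read itself and error
     * handling are elided. */
    static int gso_mask_from_string(const char *val)
    {
        int mask = 0;

        if (strstr(val, "tcpv4"))
            mask |= XEN_GSO_TCPV4;
        if (strstr(val, "udpv4"))
            mask |= XEN_GSO_UDPV4;

        return mask;
    }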
Herbert Xu
2006-Jul-01 03:26 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On Fri, Jun 30, 2006 at 03:36:49PM +0100, Keir Fraser wrote:
>
> I'm not too bothered either way, but I personally prefer having the
> more properly qualified names listed under feature-gso. Pulling it
> apart with strstr() in netfront (for each proto that netfront can deal
> with) wouldn't be hard.

I can call it feature-gso-tcpv4, feature-gso-udpv4, etc.

> Do we then need the 'type' at all? What is it actually used for -- I'd
> assume the network stack would demux to the correct protocol code as it
> would for any ordinary packet, so why does it need help with the
> protocol for GSO packets?

Good point. I'll get rid of it.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-01 03:33 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On Sat, Jul 01, 2006 at 01:26:09PM +1000, herbert wrote:
>
> > Do we then need the 'type' at all? What is it actually used for -- I'd
> > assume the network stack would demux to the correct protocol code as it
> > would for any ordinary packet, so why does it need help with the
> > protocol for GSO packets?
>
> Good point. I'll get rid of it.

Actually, we do need it for two reasons:

1. To indicate protocol for drivers that can cope with malformed packets.
   The header verification will be skipped for such drivers.
2. To carry extra flags such as ECN that cannot harm the host if set
   incorrectly.

Given that Linux will cope with malformed headers or a bogus gso_type, I'd
really like to keep the type value uniform between Linux and Xen.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2006-Jul-01 08:17 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On 1 Jul 2006, at 04:26, Herbert Xu wrote:

>> I'm not too bothered either way, but I personally prefer having the
>> more properly qualified names listed under feature-gso. Pulling it
>> apart with strstr() in netfront (for each proto that netfront can deal
>> with) wouldn't be hard.
>
> I can call it feature-gso-tcpv4, feature-gso-udpv4, etc.

Yes, I like that.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2006-Jul-01 08:24 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On 1 Jul 2006, at 04:33, Herbert Xu wrote:

>> Good point. I'll get rid of it.
>
> Actually, we do need it for two reasons:
>
> 1. To indicate protocol for drivers that can cope with malformed packets.
>    The header verification will be skipped for such drivers.
> 2. To carry extra flags such as ECN that cannot harm the host if set
>    incorrectly.

Fair enough, that makes sense.

> Given that Linux will cope with malformed headers or a bogus gso_type, I'd
> really like to keep the type value uniform between Linux and Xen.

I'm uncomfortable with this, even though it makes things a little easier
now. For sanity I want to see netfront/netback explicitly grok flags
rather than dumbly pass them through. I'd prefer uint8_t protocol and
uint8_t flags. Former is a protocol enumeration; latter is unused now but
we can add ECN and so on later.

By the way: will we need netback to advertise support for the ECN flag?
I'm not sure exactly what it will mean, and whether it can just be
ignored by netbacks that don't support it?

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
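One possible reading of the protocol/flags split proposed here, for illustration only; the struct and field names below are hypothetical and not necessarily the layout that ended up in netif.h:

    /* Hypothetical sketch of the proposed split. */
    struct netif_gso_request {
        uint16_t size;      /* MSS for each produced segment */
        uint8_t  protocol;  /* enumeration: TCPv4, UDPv4, ... */
        uint8_t  flags;     /* none defined yet; ECN etc. later */
    };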
Herbert Xu
2006-Jul-01 09:59 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On Sat, Jul 01, 2006 at 09:24:13AM +0100, Keir Fraser wrote:
>
> I'm uncomfortable with this, even though it makes things a little
> easier now. For sanity I want to see netfront/netback explicitly grok
> flags rather than dumbly pass them through. I'd prefer uint8_t protocol
> and uint8_t flags. Former is a protocol enumeration; latter is unused
> now but we can add ECN and so on later. By the way: will we need

OK.

> netback to advertise support for the ECN flag? I'm not sure exactly
> what it will mean, and whether it can just be ignored by netbacks that
> don't support it?

If netback does not advertise the flag then netfront will perform
segmentation before passing TCP packets with CWR set through.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
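The fallback Herbert describes is the usual GSO feature check: the frontend compares the skb's GSO bits against the bits the device (ultimately the backend) advertises, and anything not fully covered is segmented in software first. A simplified sketch, modelled on the net_gso_ok() logic from the GSO backport; names are simplified for illustration:

    /* Sketch: the device may take the packet whole only if it advertises
     * every GSO bit the skb carries (protocol and modifiers alike). */
    static int device_can_gso(unsigned long dev_features, int gso_type)
    {
        unsigned long needed = (unsigned long)gso_type << NETIF_F_GSO_SHIFT;

        return (dev_features & needed) == needed;
    }

So a TCPv4 packet carrying the ECN marker needs both the TCPv4 and ECN GSO features advertised; if the backend advertises only TCPv4, netfront segments the CWR-marked packet itself and forwards ordinary MSS-sized frames.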
Keir Fraser
2006-Jul-01 12:17 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On 1 Jul 2006, at 10:59, Herbert Xu wrote:

>> netback to advertise support for the ECN flag? I'm not sure exactly
>> what it will mean, and whether it can just be ignored by netbacks that
>> don't support it?
>
> If netback does not advertise the flag then netfront will perform
> segmentation before passing TCP packets with CWR set through.

Ah, okay, can that not be handled properly by some TSO-supporting NICs
then? You need to do something smarter than simply duplicate the ECN
bits in every segment header? What would happen if netfront sent
through a packet with ECE/CWR and didn't set the GSO_ECN flag -- is it
safe but stupid, will it break something, should dodgy packets have the
GSO_ECN flag recomputed?

Thanks!
Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-01 12:38 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On Sat, Jul 01, 2006 at 01:17:31PM +0100, Keir Fraser wrote:
>
> Ah, okay, can that not be handled properly by some TSO-supporting NICs
> then? You need to do something smarter than simply duplicate the ECN
> bits in every segment header? What would happen if netfront sent
> through a packet with ECE/CWR and didn't set the GSO_ECN flag -- is it
> safe but stupid, will it break something, should dodgy packets have the
> GSO_ECN flag recomputed?

Getting the ECN bit wrong is OK. The worst result is that the packets
generated will have incorrect CWR markings which will only penalise the
connection belonging to the guest. If the guest wanted to achieve that,
it can always do the same thing by segmenting the packets itself anyway.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-03 04:44 UTC
[Xen-devel] [1/4] [NET] back: Fix off-by-one error in netbk_tx_err
Hi Keir: Here are the GSO changes again which should address your concerns. Let me know if you have any other problems. [NET] back: Fix off-by-one error in netbk_tx_err The generalised extra request info patch introduced a bug with the use of netbk_tx_err since it advanced the req_cons pointer by one. This patch fixes thing by delaying the increment in netbk_tx_err. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 3fe11185adfb -r 3656a2985ae1 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Sat Jul 01 09:37:24 2006 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Mon Jul 03 14:18:54 2006 +1000 @@ -496,9 +496,9 @@ static void netbk_tx_err(netif_t *netif, do { make_tx_response(netif, txp, NETIF_RSP_ERROR); - if (++cons >= end) + if (cons >= end) break; - txp = RING_GET_REQUEST(&netif->tx, cons); + txp = RING_GET_REQUEST(&netif->tx, cons++); } while (1); netif->tx.req_cons = cons; netif_schedule_work(netif); @@ -764,11 +764,11 @@ static void net_tx_action(unsigned long if (txreq.flags & NETTXF_extra_info) { work_to_do = netbk_get_extras(netif, extras, work_to_do); + i = netif->tx.req_cons; if (unlikely(work_to_do < 0)) { - netbk_tx_err(netif, &txreq, 0); + netbk_tx_err(netif, &txreq, i); continue; } - i = netif->tx.req_cons; } ret = netbk_count_requests(netif, &txreq, work_to_do); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-03 04:45 UTC
[Xen-devel] [2/4] [NET] back: Add GSO features field and check gso_size
Hi: [NET] back: Add GSO features field and check gso_size This patch adds the as-yet unused GSO features which will contain protocol-independent bits such as the ECN marker. It also makes the backend check gso_size to ensure that it is non-zero. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 3656a2985ae1 -r 8c37d0d4526e linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Mon Jul 03 14:18:54 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Mon Jul 03 14:31:15 2006 +1000 @@ -691,6 +691,29 @@ int netbk_get_extras(netif_t *netif, str return work_to_do; } +static int netbk_set_skb_gso(struct sk_buff *skb, struct netif_extra_info *gso) +{ + if (!gso->u.gso.size) { + DPRINTK("GSO size must not be zero.\n"); + return -EINVAL; + } + + /* Currently on TCPv4 S.O. is supported. */ + if (gso->u.gso.type != XEN_NETIF_GSO_TCPV4) { + DPRINTK("Bad GSO type %d.\n", gso->u.gso.type); + return -EINVAL; + } + + skb_shinfo(skb)->gso_size = gso->u.gso.size; + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + + /* Header must be checked, and gso_segs computed. */ + skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY; + skb_shinfo(skb)->gso_segs = 0; + + return 0; +} + /* Called after netfront has transmitted */ static void net_tx_action(unsigned long unused) { @@ -819,20 +842,11 @@ static void net_tx_action(unsigned long struct netif_extra_info *gso; gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1]; - /* Currently on TCPv4 S.O. is supported. */ - if (gso->u.gso.type != XEN_NETIF_GSO_TCPV4) { - DPRINTK("Bad GSO type %d.\n", gso->u.gso.type); + if (netbk_set_skb_gso(skb, gso)) { kfree_skb(skb); netbk_tx_err(netif, &txreq, i); - break; + continue; } - - skb_shinfo(skb)->gso_size = gso->u.gso.size; - skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; - - /* Header must be checked, and gso_segs computed. */ - skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY; - skb_shinfo(skb)->gso_segs = 0; } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), diff -r 3656a2985ae1 -r 8c37d0d4526e xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Mon Jul 03 14:18:54 2006 +1000 +++ b/xen/include/public/io/netif.h Mon Jul 03 14:31:15 2006 +1000 @@ -88,6 +88,12 @@ struct netif_extra_info { * extra features required to segment the packet properly. */ uint16_t type; /* XEN_NETIF_GSO_* */ + + /* + * GSO features . This specifies any extra GSO features required + * to process this packet, such as ECN support for TCPv4. + */ + uint16_t features; /* XEN_NETIF_FEAT_* */ } gso; uint16_t pad[3]; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
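The new features field is deliberately unused in this patch. As a purely hypothetical example of how a frontend might fill it once an ECN feature bit is defined: SKB_GSO_TCP_ECN is the upstream Linux bit referred to earlier in the thread, while XEN_NETIF_GSO_FEAT_ECN and its value are made up here just to indicate the intended use.

    /* Hypothetical only: no XEN_NETIF_GSO_FEAT_ECN constant is defined in
     * this series; it is invented here to show what 'features' is for. */
    #define XEN_NETIF_GSO_FEAT_ECN  (1 << 0)    /* made-up value */

    gso->u.gso.features = 0;
    if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_ECN)    /* upstream bit */
        gso->u.gso.features |= XEN_NETIF_GSO_FEAT_ECN;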
Herbert Xu
2006-Jul-03 04:46 UTC
[Xen-devel] [3/4] [NET]: Rename feature-tso to feature-gso-tcpv4
Hi: [NET]: Rename feature-tso to feature-gso-tcpv4 This patch renames the name feature-tso to feature-gso-tcpv4 for future expansion. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 8c37d0d4526e -r 8c5fd9867b3c linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Mon Jul 03 14:31:15 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Mon Jul 03 14:35:47 2006 +1000 @@ -102,9 +102,10 @@ static int netback_probe(struct xenbus_d } #if 0 /* KAF: After the protocol is finalised. */ - err = xenbus_printf(xbt, dev->nodename, "feature-tso", "%d", 1); + err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv4", + "%d", 1); if (err) { - message = "writing feature-tso"; + message = "writing feature-gso-tcpv4"; goto abort_transaction; } #endif diff -r 8c37d0d4526e -r 8c5fd9867b3c linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Mon Jul 03 14:31:15 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Mon Jul 03 14:35:47 2006 +1000 @@ -1098,8 +1098,8 @@ static int xennet_set_tso(struct net_dev struct netfront_info *np = netdev_priv(dev); int val; - if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-tso", - "%d", &val) < 0) + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, + "feature-gso-tcpv4", "%d", &val) < 0) val = 0; #if 0 /* KAF: After the protocol is finalised. */ if (!val) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-03 04:46 UTC
[Xen-devel] [4/4] [NET] front: Zero negotiated bits in xen_set_features
Hi: [NET] front: Zero negotiated bits in xen_set_features When we reconnect to the backend we need to first zero all negotiated bits as the functions xen_set_sg and xen_set_tso do not (and are not supposed to) zero bits when they fail to set them. This patch also permanently enables the NETIF_F_GSO_ROBUST bit as we never parse any GSO fields ourselves (even if we did the backend could not trust us so it''s wasted effort). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 8c5fd9867b3c -r 531033849420 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Mon Jul 03 14:35:47 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Mon Jul 03 14:42:09 2006 +1000 @@ -1112,6 +1112,11 @@ static int xennet_set_tso(struct net_dev static void xennet_set_features(struct net_device *dev) { + /* Turn off all GSO bits except ROBUST. */ + dev->features &= (1 << NETIF_F_GSO_SHIFT) - 1; + dev->features |= NETIF_F_GSO_ROBUST; + xennet_set_sg(dev, 0); + if (!xennet_set_sg(dev, 1)) xennet_set_tso(dev, 1); } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
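The mask arithmetic in this patch relies on the feature-bit layout from the GSO backport, where each SKB_GSO_* bit is mirrored into dev->features at NETIF_F_GSO_SHIFT. A small sketch of what the two statements achieve, assuming that layout:

    /* Sketch, assuming the GSO backport's feature layout, e.g.
     * NETIF_F_GSO_ROBUST == SKB_GSO_DODGY << NETIF_F_GSO_SHIFT. */
    static inline void reset_negotiated_features(struct net_device *dev)
    {
        /* Keep every non-GSO feature, drop all negotiated GSO bits. */
        dev->features &= (1 << NETIF_F_GSO_SHIFT) - 1;

        /* We never parse GSO headers ourselves, so the 'robust' bit can
         * stay on unconditionally. */
        dev->features |= NETIF_F_GSO_ROBUST;
    }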
Keir Fraser
2006-Jul-03 08:12 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix off-by-one error in netbk_tx_err
On 3 Jul 2006, at 05:44, Herbert Xu wrote:

> Here are the GSO changes again which should address your concerns. Let me
> know if you have any other problems.

I wonder if the 'type' field would be better named 'protocol' now?
'Type' is nicely vague though. Also I changed it to uint8_t since it's
an enumeration -- should be plenty big enough and leaves us with 8 bits
spare in case that's useful in future. Does that seem okay?

Apart from those questions I checked all your patches in. Once they
pass our regression tests, and you give the okay, I'll enable the
feature negotiation.

Thanks,
Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-03 08:14 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix off-by-one error in netbk_tx_err
On Mon, Jul 03, 2006 at 09:12:32AM +0100, Keir Fraser wrote:
>
> I wonder if the 'type' field would be better named 'protocol' now?
> 'Type' is nicely vague though. Also I changed it to uint8_t since it's
> an enumeration -- should be plenty big enough and leaves us with 8 bits
> spare in case that's useful in future. Does that seem okay?

Sure that's fine by me.

> Apart from those questions I checked all your patches in. Once they
> pass our regression tests, and you give the okay, I'll enable the
> feature negotiation.

Great!

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel