Hi:

The following patches add TCP Segmentation Offload (TSO) support in the domU => dom0 direction. If everyone's happy with this approach then it's trivial to do the same thing for the opposite direction.

TSO support requires the Generic Segmentation Offload (GSO) infrastructure that was recently added to Linux (merged after the release of 2.6.17, so it will be part of 2.6.18). GSO is needed when a TSO packet is sent to a device without TSO support. I've included a backport of GSO for 2.6.16.13 here.

Testing should be easy as TSO is turned on by default. You can verify that it's working by attaching tcpdump to any of the interfaces involved in a domU => dom0 transfer. You should see packets whose payload is an integral multiple of the MSS.

Comparison with the baseline and jumbo MTU:

baseline:    1228.08Mb/s
mtu=16436:   3097.49Mb/s
TSO:         3208.41Mb/s
mtu=60040:   3869.36Mb/s
lo(16436):   5543.91Mb/s
lo(60040):   8544.08Mb/s

There is still a problem with TSO ack timing that causes it to generate smaller-than-optimal packets. This can be seen by increasing the RCV buffer size on the transfer, which actually causes TSO performance to dip by 30% to about 2000Mb/s. The reason is that we get many more small packets (2*MSS or 3*MSS) than really big ones (~64K). I'm investigating this issue. However, even with this nit, it's still a great improvement over the baseline and does not have the interoperability issue that comes with raising the MTU.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
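The GSO fallback the backport introduces boils down to one check on the transmit path: a packet carrying a non-zero gso_size is handed to the driver unsegmented only if the device advertises the matching offload feature bit; otherwise the stack segments it in software first. The snippet below is a minimal, compilable userspace sketch of that check, not kernel code: the constants mirror SKB_GSO_TCPV4, NETIF_F_GSO_SHIFT and NETIF_F_TSO from the patch, while the function name needs_software_gso() and the sample MSS of 1448 are made up for illustration (the real helper in the patch is netif_needs_gso(), and the software split is done by skb_gso_segment()/skb_segment()).

#include <stdio.h>

/* Constants as defined in the patch below. */
#define SKB_GSO_TCPV4     (1 << 0)
#define SKB_GSO_UDPV4     (1 << 1)
#define NETIF_F_GSO_SHIFT 16
#define NETIF_F_TSO       (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_UFO       (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)

/* Hypothetical stand-in for netif_needs_gso(): software segmentation is
 * required when the packet is a GSO super-packet (gso_size != 0) and the
 * device lacks the corresponding hardware offload feature. */
static int needs_software_gso(unsigned long dev_features,
                              unsigned short gso_size,
                              unsigned short gso_type)
{
    unsigned long feature = (unsigned long)gso_type << NETIF_F_GSO_SHIFT;

    return gso_size && (dev_features & feature) != feature;
}

int main(void)
{
    /* A TSO super-packet headed for a device without hardware TSO must be
     * split into MSS-sized segments in software before transmission. */
    printf("device without TSO: software GSO %s\n",
           needs_software_gso(0, 1448, SKB_GSO_TCPV4) ? "needed" : "not needed");

    /* The same packet on a device advertising NETIF_F_TSO goes out whole. */
    printf("device with TSO:    software GSO %s\n",
           needs_software_gso(NETIF_F_TSO, 1448, SKB_GSO_TCPV4) ? "needed" : "not needed");

    return 0;
}

In the patch itself this decision lives in dev_queue_xmit()/dev_hard_start_xmit(): packets that pass the check go straight to hard_start_xmit(), while the rest are segmented by skb_gso_segment() and transmitted one segment at a time.
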
Hi: [NET]: Added GSO support Imported GSO patch. This is a 2.6.16 backport for the GSO patch that was merged in Linux. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 9ec0b4f10b4f -r b255efd0df72 linux-2.6-xen-sparse/include/linux/skbuff.h --- a/linux-2.6-xen-sparse/include/linux/skbuff.h Sat Jun 24 23:44:18 2006 +0100 +++ b/linux-2.6-xen-sparse/include/linux/skbuff.h Tue Jun 27 21:27:13 2006 +1000 @@ -134,9 +134,10 @@ struct skb_shared_info { struct skb_shared_info { atomic_t dataref; unsigned short nr_frags; - unsigned short tso_size; - unsigned short tso_segs; - unsigned short ufo_size; + unsigned short gso_size; + /* Warning: this field is not always filled in (UFO)! */ + unsigned short gso_segs; + unsigned short gso_type; unsigned int ip6_frag_id; struct sk_buff *frag_list; skb_frag_t frags[MAX_SKB_FRAGS]; @@ -166,6 +167,11 @@ enum { SKB_FCLONE_UNAVAILABLE, SKB_FCLONE_ORIG, SKB_FCLONE_CLONE, +}; + +enum { + SKB_GSO_TCPV4 = 1 << 0, + SKB_GSO_UDPV4 = 1 << 1, }; /** @@ -1157,18 +1163,34 @@ static inline int skb_can_coalesce(struc return 0; } +static inline int __skb_linearize(struct sk_buff *skb) +{ + return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM; +} + /** * skb_linearize - convert paged skb to linear one * @skb: buffer to linarize - * @gfp: allocation mode * * If there is no free memory -ENOMEM is returned, otherwise zero * is returned and the old skb data released. */ -extern int __skb_linearize(struct sk_buff *skb, gfp_t gfp); -static inline int skb_linearize(struct sk_buff *skb, gfp_t gfp) -{ - return __skb_linearize(skb, gfp); +static inline int skb_linearize(struct sk_buff *skb) +{ + return skb_is_nonlinear(skb) ? __skb_linearize(skb) : 0; +} + +/** + * skb_linearize_cow - make sure skb is linear and writable + * @skb: buffer to process + * + * If there is no free memory -ENOMEM is returned, otherwise zero + * is returned and the old skb data released. + */ +static inline int skb_linearize_cow(struct sk_buff *skb) +{ + return skb_is_nonlinear(skb) || skb_cloned(skb) ? + __skb_linearize(skb) : 0; } /** @@ -1263,6 +1285,7 @@ extern void skb_split(struct sk_b struct sk_buff *skb1, const u32 len); extern void skb_release_data(struct sk_buff *skb); +extern struct sk_buff *skb_segment(struct sk_buff *skb, int sg); static inline void *skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer) diff -r 9ec0b4f10b4f -r b255efd0df72 linux-2.6-xen-sparse/net/core/dev.c --- a/linux-2.6-xen-sparse/net/core/dev.c Sat Jun 24 23:44:18 2006 +0100 +++ b/linux-2.6-xen-sparse/net/core/dev.c Tue Jun 27 21:27:13 2006 +1000 @@ -115,6 +115,7 @@ #include <net/iw_handler.h> #endif /* CONFIG_NET_RADIO */ #include <asm/current.h> +#include <linux/err.h> #ifdef CONFIG_XEN #include <net/ip.h> @@ -1038,7 +1039,7 @@ static inline void net_timestamp(struct * taps currently in use. */ -void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) +static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) { struct packet_type *ptype; @@ -1112,6 +1113,40 @@ out: return ret; } +/** + * skb_gso_segment - Perform segmentation on skb. + * @skb: buffer to segment + * @sg: whether scatter-gather is supported on the target. + * + * This function segments the given skb and returns a list of segments. 
+ */ +struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg) +{ + struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT); + struct packet_type *ptype; + int type = skb->protocol; + + BUG_ON(skb_shinfo(skb)->frag_list); + BUG_ON(skb->ip_summed != CHECKSUM_HW); + + skb->mac.raw = skb->data; + skb->mac_len = skb->nh.raw - skb->data; + __skb_pull(skb, skb->mac_len); + + rcu_read_lock(); + list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) { + if (ptype->type == type && !ptype->dev && ptype->gso_segment) { + segs = ptype->gso_segment(skb, sg); + break; + } + } + rcu_read_unlock(); + + return segs; +} + +EXPORT_SYMBOL(skb_gso_segment); + /* Take action when hardware reception checksum errors are detected. */ #ifdef CONFIG_BUG void netdev_rx_csum_fault(struct net_device *dev) @@ -1148,75 +1183,98 @@ static inline int illegal_highdma(struct #define illegal_highdma(dev, skb) (0) #endif -/* Keep head the same: replace data */ -int __skb_linearize(struct sk_buff *skb, gfp_t gfp_mask) -{ - unsigned int size; - u8 *data; - long offset; - struct skb_shared_info *ninfo; - int headerlen = skb->data - skb->head; - int expand = (skb->tail + skb->data_len) - skb->end; - - if (skb_shared(skb)) - BUG(); - - if (expand <= 0) - expand = 0; - - size = skb->end - skb->head + expand; - size = SKB_DATA_ALIGN(size); - data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask); - if (!data) - return -ENOMEM; - - /* Copy entire thing */ - if (skb_copy_bits(skb, -headerlen, data, headerlen + skb->len)) - BUG(); - - /* Set up shinfo */ - ninfo = (struct skb_shared_info*)(data + size); - atomic_set(&ninfo->dataref, 1); - ninfo->tso_size = skb_shinfo(skb)->tso_size; - ninfo->tso_segs = skb_shinfo(skb)->tso_segs; - ninfo->nr_frags = 0; - ninfo->frag_list = NULL; - - /* Offset between the two in bytes */ - offset = data - skb->head; - - /* Free old data. */ - skb_release_data(skb); - - skb->head = data; - skb->end = data + size; - - /* Set up new pointers */ - skb->h.raw += offset; - skb->nh.raw += offset; - skb->mac.raw += offset; - skb->tail += offset; - skb->data += offset; - - /* We are no longer a clone, even if we were. */ - skb->cloned = 0; - - skb->tail += skb->data_len; - skb->data_len = 0; +struct dev_gso_cb { + void (*destructor)(struct sk_buff *skb); +}; + +#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb) + +static void dev_gso_skb_destructor(struct sk_buff *skb) +{ + struct dev_gso_cb *cb; + + do { + struct sk_buff *nskb = skb->next; + + skb->next = nskb->next; + nskb->next = NULL; + kfree_skb(nskb); + } while (skb->next); + + cb = DEV_GSO_CB(skb); + if (cb->destructor) + cb->destructor(skb); +} + +/** + * dev_gso_segment - Perform emulated hardware segmentation on skb. + * @skb: buffer to segment + * + * This function segments the given skb and stores the list of segments + * in skb->next. 
+ */ +static int dev_gso_segment(struct sk_buff *skb) +{ + struct net_device *dev = skb->dev; + struct sk_buff *segs; + + segs = skb_gso_segment(skb, dev->features & NETIF_F_SG && + !illegal_highdma(dev, skb)); + if (unlikely(IS_ERR(segs))) + return PTR_ERR(segs); + + skb->next = segs; + DEV_GSO_CB(skb)->destructor = skb->destructor; + skb->destructor = dev_gso_skb_destructor; + + return 0; +} + +int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) +{ + if (likely(!skb->next)) { + if (netdev_nit) + dev_queue_xmit_nit(skb, dev); + + if (!netif_needs_gso(dev, skb)) + return dev->hard_start_xmit(skb, dev); + + if (unlikely(dev_gso_segment(skb))) + goto out_kfree_skb; + } + + do { + struct sk_buff *nskb = skb->next; + int rc; + + skb->next = nskb->next; + nskb->next = NULL; + rc = dev->hard_start_xmit(nskb, dev); + if (unlikely(rc)) { + nskb->next = skb->next; + skb->next = nskb; + return rc; + } + if (unlikely(netif_queue_stopped(dev) && skb->next)) + return NETDEV_TX_BUSY; + } while (skb->next); + + skb->destructor = DEV_GSO_CB(skb)->destructor; + +out_kfree_skb: + kfree_skb(skb); return 0; } #define HARD_TX_LOCK(dev, cpu) { \ if ((dev->features & NETIF_F_LLTX) == 0) { \ - spin_lock(&dev->xmit_lock); \ - dev->xmit_lock_owner = cpu; \ + netif_tx_lock(dev); \ } \ } #define HARD_TX_UNLOCK(dev) { \ if ((dev->features & NETIF_F_LLTX) == 0) { \ - dev->xmit_lock_owner = -1; \ - spin_unlock(&dev->xmit_lock); \ + netif_tx_unlock(dev); \ } \ } @@ -1289,9 +1347,19 @@ int dev_queue_xmit(struct sk_buff *skb) struct Qdisc *q; int rc = -ENOMEM; + /* If a checksum-deferred packet is forwarded to a device that needs a + * checksum, correct the pointers and force checksumming. + */ + if (skb_checksum_setup(skb)) + goto out_kfree_skb; + + /* GSO will handle the following emulations directly. */ + if (netif_needs_gso(dev, skb)) + goto gso; + if (skb_shinfo(skb)->frag_list && !(dev->features & NETIF_F_FRAGLIST) && - __skb_linearize(skb, GFP_ATOMIC)) + __skb_linearize(skb)) goto out_kfree_skb; /* Fragmented skb is linearized if device does not support SG, @@ -1300,31 +1368,27 @@ int dev_queue_xmit(struct sk_buff *skb) */ if (skb_shinfo(skb)->nr_frags && (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) && + __skb_linearize(skb)) __skb_linearize(skb, GFP_ATOMIC)) goto out_kfree_skb; - /* If a checksum-deferred packet is forwarded to a device that needs a - * checksum, correct the pointers and force checksumming. - */ - if(skb_checksum_setup(skb)) - goto out_kfree_skb; - /* If packet is not checksummed and device does not support * checksumming for this protocol, complete checksumming here. */ if (skb->ip_summed == CHECKSUM_HW && - (!(dev->features & (NETIF_F_HW_CSUM | NETIF_F_NO_CSUM)) && + (!(dev->features & NETIF_F_GEN_CSUM) && (!(dev->features & NETIF_F_IP_CSUM) || skb->protocol != htons(ETH_P_IP)))) if (skb_checksum_help(skb, 0)) goto out_kfree_skb; +gso: spin_lock_prefetch(&dev->queue_lock); /* Disable soft irqs for various locks below. Also * stops preemption for RCU. */ - local_bh_disable(); + rcu_read_lock_bh(); /* Updates of qdisc are serialized by queue_lock. * The struct Qdisc which is pointed to by qdisc is now a @@ -1358,8 +1422,8 @@ int dev_queue_xmit(struct sk_buff *skb) /* The device has no queue. Common case for software devices: loopback, all the sorts of tunnels... - Really, it is unlikely that xmit_lock protection is necessary here. - (f.e. loopback and IP tunnels are clean ignoring statistics + Really, it is unlikely that netif_tx_lock protection is necessary + here. 
(f.e. loopback and IP tunnels are clean ignoring statistics counters.) However, it is possible, that they rely on protection made by us here. @@ -1375,11 +1439,8 @@ int dev_queue_xmit(struct sk_buff *skb) HARD_TX_LOCK(dev, cpu); if (!netif_queue_stopped(dev)) { - if (netdev_nit) - dev_queue_xmit_nit(skb, dev); - rc = 0; - if (!dev->hard_start_xmit(skb, dev)) { + if (!dev_hard_start_xmit(skb, dev)) { HARD_TX_UNLOCK(dev); goto out; } @@ -1398,13 +1459,13 @@ int dev_queue_xmit(struct sk_buff *skb) } rc = -ENETDOWN; - local_bh_enable(); + rcu_read_unlock_bh(); out_kfree_skb: kfree_skb(skb); return rc; out: - local_bh_enable(); + rcu_read_unlock_bh(); return rc; } @@ -2732,7 +2793,7 @@ int register_netdevice(struct net_device BUG_ON(dev->reg_state != NETREG_UNINITIALIZED); spin_lock_init(&dev->queue_lock); - spin_lock_init(&dev->xmit_lock); + spin_lock_init(&dev->_xmit_lock); dev->xmit_lock_owner = -1; #ifdef CONFIG_NET_CLS_ACT spin_lock_init(&dev->ingress_lock); @@ -2776,9 +2837,7 @@ int register_netdevice(struct net_device /* Fix illegal SG+CSUM combinations. */ if ((dev->features & NETIF_F_SG) && - !(dev->features & (NETIF_F_IP_CSUM | - NETIF_F_NO_CSUM | - NETIF_F_HW_CSUM))) { + !(dev->features & NETIF_F_ALL_CSUM)) { printk("%s: Dropping NETIF_F_SG since no checksum feature.\n", dev->name); dev->features &= ~NETIF_F_SG; @@ -3330,7 +3389,6 @@ EXPORT_SYMBOL(__dev_get_by_index); EXPORT_SYMBOL(__dev_get_by_index); EXPORT_SYMBOL(__dev_get_by_name); EXPORT_SYMBOL(__dev_remove_pack); -EXPORT_SYMBOL(__skb_linearize); EXPORT_SYMBOL(dev_valid_name); EXPORT_SYMBOL(dev_add_pack); EXPORT_SYMBOL(dev_alloc_name); diff -r 9ec0b4f10b4f -r b255efd0df72 linux-2.6-xen-sparse/net/core/skbuff.c --- a/linux-2.6-xen-sparse/net/core/skbuff.c Sat Jun 24 23:44:18 2006 +0100 +++ b/linux-2.6-xen-sparse/net/core/skbuff.c Tue Jun 27 21:27:13 2006 +1000 @@ -165,9 +165,9 @@ struct sk_buff *__alloc_skb(unsigned int shinfo = skb_shinfo(skb); atomic_set(&shinfo->dataref, 1); shinfo->nr_frags = 0; - shinfo->tso_size = 0; - shinfo->tso_segs = 0; - shinfo->ufo_size = 0; + shinfo->gso_size = 0; + shinfo->gso_segs = 0; + shinfo->gso_type = 0; shinfo->ip6_frag_id = 0; shinfo->frag_list = NULL; @@ -237,9 +237,9 @@ struct sk_buff *alloc_skb_from_cache(kme shinfo = skb_shinfo(skb); atomic_set(&shinfo->dataref, 1); shinfo->nr_frags = 0; - shinfo->tso_size = 0; - shinfo->tso_segs = 0; - shinfo->ufo_size = 0; + skb_shinfo(skb)->gso_size = 0; + skb_shinfo(skb)->gso_segs = 0; + skb_shinfo(skb)->gso_type = 0; shinfo->ip6_frag_id = 0; shinfo->frag_list = NULL; @@ -524,8 +524,9 @@ static void copy_skb_header(struct sk_bu new->tc_index = old->tc_index; #endif atomic_set(&new->users, 1); - skb_shinfo(new)->tso_size = skb_shinfo(old)->tso_size; - skb_shinfo(new)->tso_segs = skb_shinfo(old)->tso_segs; + skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size; + skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs; + skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type; } /** @@ -1799,6 +1800,132 @@ int skb_append_datato_frags(struct sock return 0; } + +/** + * skb_segment - Perform protocol segmentation on skb. + * @skb: buffer to segment + * @sg: whether scatter-gather can be used for generated segments + * + * This function performs segmentation on the given skb. It returns + * the segment at the given position. It returns NULL if there are + * no more segments to generate, or when an error is encountered. 
+ */ +struct sk_buff *skb_segment(struct sk_buff *skb, int sg) +{ + struct sk_buff *segs = NULL; + struct sk_buff *tail = NULL; + unsigned int mss = skb_shinfo(skb)->gso_size; + unsigned int doffset = skb->data - skb->mac.raw; + unsigned int offset = doffset; + unsigned int headroom; + unsigned int len; + int nfrags = skb_shinfo(skb)->nr_frags; + int err = -ENOMEM; + int i = 0; + int pos; + + __skb_push(skb, doffset); + headroom = skb_headroom(skb); + pos = skb_headlen(skb); + + do { + struct sk_buff *nskb; + skb_frag_t *frag; + int hsize, nsize; + int k; + int size; + + len = skb->len - offset; + if (len > mss) + len = mss; + + hsize = skb_headlen(skb) - offset; + if (hsize < 0) + hsize = 0; + nsize = hsize + doffset; + if (nsize > len + doffset || !sg) + nsize = len + doffset; + + nskb = alloc_skb(nsize + headroom, GFP_ATOMIC); + if (unlikely(!nskb)) + goto err; + + if (segs) + tail->next = nskb; + else + segs = nskb; + tail = nskb; + + nskb->dev = skb->dev; + nskb->priority = skb->priority; + nskb->protocol = skb->protocol; + nskb->dst = dst_clone(skb->dst); + memcpy(nskb->cb, skb->cb, sizeof(skb->cb)); + nskb->pkt_type = skb->pkt_type; + nskb->mac_len = skb->mac_len; + + skb_reserve(nskb, headroom); + nskb->mac.raw = nskb->data; + nskb->nh.raw = nskb->data + skb->mac_len; + nskb->h.raw = nskb->nh.raw + (skb->h.raw - skb->nh.raw); + memcpy(skb_put(nskb, doffset), skb->data, doffset); + + if (!sg) { + nskb->csum = skb_copy_and_csum_bits(skb, offset, + skb_put(nskb, len), + len, 0); + continue; + } + + frag = skb_shinfo(nskb)->frags; + k = 0; + + nskb->ip_summed = CHECKSUM_HW; + nskb->csum = skb->csum; + memcpy(skb_put(nskb, hsize), skb->data + offset, hsize); + + while (pos < offset + len) { + BUG_ON(i >= nfrags); + + *frag = skb_shinfo(skb)->frags[i]; + get_page(frag->page); + size = frag->size; + + if (pos < offset) { + frag->page_offset += offset - pos; + frag->size -= offset - pos; + } + + k++; + + if (pos + size <= offset + len) { + i++; + pos += size; + } else { + frag->size -= pos + size - (offset + len); + break; + } + + frag++; + } + + skb_shinfo(nskb)->nr_frags = k; + nskb->data_len = len - hsize; + nskb->len += nskb->data_len; + nskb->truesize += nskb->data_len; + } while ((offset += len) < skb->len); + + return segs; + +err: + while ((skb = segs)) { + segs = skb->next; + kfree(skb); + } + return ERR_PTR(err); +} + +EXPORT_SYMBOL_GPL(skb_segment); void __init skb_init(void) { diff -r 9ec0b4f10b4f -r b255efd0df72 patches/linux-2.6.16.13/net-gso.patch --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/patches/linux-2.6.16.13/net-gso.patch Tue Jun 27 21:27:13 2006 +1000 @@ -0,0 +1,2285 @@ +diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt +index 3c0a5ba..847cedb 100644 +--- a/Documentation/networking/netdevices.txt ++++ b/Documentation/networking/netdevices.txt +@@ -42,9 +42,9 @@ dev->get_stats: + Context: nominally process, but don''t sleep inside an rwlock + + dev->hard_start_xmit: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + When the driver sets NETIF_F_LLTX in dev->features this will be +- called without holding xmit_lock. In this case the driver ++ called without holding netif_tx_lock. In this case the driver + has to lock by itself when needed. It is recommended to use a try lock + for this and return -1 when the spin lock fails. + The locking there should also properly protect against +@@ -62,12 +62,12 @@ dev->hard_start_xmit: + Only valid when NETIF_F_LLTX is set. 
+ + dev->tx_timeout: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + Context: BHs disabled + Notes: netif_queue_stopped() is guaranteed true + + dev->set_multicast_list: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + Context: BHs disabled + + dev->poll: +diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c +index 4be9769..2e7cac7 100644 +--- a/drivers/block/aoe/aoenet.c ++++ b/drivers/block/aoe/aoenet.c +@@ -95,9 +95,8 @@ mac_addr(char addr[6]) + static struct sk_buff * + skb_check(struct sk_buff *skb) + { +- if (skb_is_nonlinear(skb)) + if ((skb = skb_share_check(skb, GFP_ATOMIC))) +- if (skb_linearize(skb, GFP_ATOMIC) < 0) { ++ if (skb_linearize(skb)) { + dev_kfree_skb(skb); + return NULL; + } +diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +index a2408d7..c90e620 100644 +--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c ++++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +@@ -821,7 +821,8 @@ void ipoib_mcast_restart_task(void *dev_ + + ipoib_mcast_stop_thread(dev, 0); + +- spin_lock_irqsave(&dev->xmit_lock, flags); ++ local_irq_save(flags); ++ netif_tx_lock(dev); + spin_lock(&priv->lock); + + /* +@@ -896,7 +897,8 @@ void ipoib_mcast_restart_task(void *dev_ + } + + spin_unlock(&priv->lock); +- spin_unlock_irqrestore(&dev->xmit_lock, flags); ++ netif_tx_unlock(dev); ++ local_irq_restore(flags); + + /* We have to cancel outside of the spinlock */ + list_for_each_entry_safe(mcast, tmcast, &remove_list, list) { +diff --git a/drivers/media/dvb/dvb-core/dvb_net.c b/drivers/media/dvb/dvb-core/dvb_net.c +index 6711eb6..8d2351f 100644 +--- a/drivers/media/dvb/dvb-core/dvb_net.c ++++ b/drivers/media/dvb/dvb-core/dvb_net.c +@@ -1052,7 +1052,7 @@ static void wq_set_multicast_list (void + + dvb_net_feed_stop(dev); + priv->rx_mode = RX_MODE_UNI; +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + if (dev->flags & IFF_PROMISC) { + dprintk("%s: promiscuous mode\n", dev->name); +@@ -1077,7 +1077,7 @@ static void wq_set_multicast_list (void + } + } + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + dvb_net_feed_start(dev); + } + +diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c +index dd41049..6615583 100644 +--- a/drivers/net/8139cp.c ++++ b/drivers/net/8139cp.c +@@ -794,7 +794,7 @@ #endif + entry = cp->tx_head; + eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0; + if (dev->features & NETIF_F_TSO) +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + + if (skb_shinfo(skb)->nr_frags == 0) { + struct cp_desc *txd = &cp->tx_ring[entry]; +diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c +index a24200d..b5e39a1 100644 +--- a/drivers/net/bnx2.c ++++ b/drivers/net/bnx2.c +@@ -1593,7 +1593,7 @@ bnx2_tx_int(struct bnx2 *bp) + skb = tx_buf->skb; + #ifdef BCM_TSO + /* partial BD completions possible with TSO packets */ +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + u16 last_idx, last_ring_idx; + + last_idx = sw_cons + +@@ -1948,7 +1948,7 @@ bnx2_poll(struct net_device *dev, int *b + return 1; + } + +-/* Called with rtnl_lock from vlan functions and also dev->xmit_lock ++/* Called with rtnl_lock from vlan functions and also netif_tx_lock + * from set_multicast. + */ + static void +@@ -4403,7 +4403,7 @@ bnx2_vlan_rx_kill_vid(struct net_device + } + #endif + +-/* Called with dev->xmit_lock. ++/* Called with netif_tx_lock. 
+ * hard_start_xmit is pseudo-lockless - a lock is only required when + * the tx queue is full. This way, we get the benefit of lockless + * operations most of the time without the complexities to handle +@@ -4441,7 +4441,7 @@ bnx2_start_xmit(struct sk_buff *skb, str + (TX_BD_FLAGS_VLAN_TAG | (vlan_tx_tag_get(skb) << 16)); + } + #ifdef BCM_TSO +- if ((mss = skb_shinfo(skb)->tso_size) && ++ if ((mss = skb_shinfo(skb)->gso_size) && + (skb->len > (bp->dev->mtu + ETH_HLEN))) { + u32 tcp_opt_len, ip_tcp_len; + +diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c +index bcf9f17..e970921 100644 +--- a/drivers/net/bonding/bond_main.c ++++ b/drivers/net/bonding/bond_main.c +@@ -1145,8 +1145,7 @@ int bond_sethwaddr(struct net_device *bo + } + + #define BOND_INTERSECT_FEATURES \ +- (NETIF_F_SG|NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM|\ +- NETIF_F_TSO|NETIF_F_UFO) ++ (NETIF_F_SG | NETIF_F_ALL_CSUM | NETIF_F_TSO | NETIF_F_UFO) + + /* + * Compute the common dev->feature set available to all slaves. Some +@@ -1164,9 +1163,7 @@ static int bond_compute_features(struct + features &= (slave->dev->features & BOND_INTERSECT_FEATURES); + + if ((features & NETIF_F_SG) && +- !(features & (NETIF_F_IP_CSUM | +- NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM))) ++ !(features & NETIF_F_ALL_CSUM)) + features &= ~NETIF_F_SG; + + /* +@@ -4147,7 +4144,7 @@ static int bond_init(struct net_device * + */ + bond_dev->features |= NETIF_F_VLAN_CHALLENGED; + +- /* don''t acquire bond device''s xmit_lock when ++ /* don''t acquire bond device''s netif_tx_lock when + * transmitting */ + bond_dev->features |= NETIF_F_LLTX; + +diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c +index 30ff8ea..7b7d360 100644 +--- a/drivers/net/chelsio/sge.c ++++ b/drivers/net/chelsio/sge.c +@@ -1419,7 +1419,7 @@ int t1_start_xmit(struct sk_buff *skb, s + struct cpl_tx_pkt *cpl; + + #ifdef NETIF_F_TSO +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + int eth_type; + struct cpl_tx_pkt_lso *hdr; + +@@ -1434,7 +1434,7 @@ #ifdef NETIF_F_TSO + hdr->ip_hdr_words = skb->nh.iph->ihl; + hdr->tcp_hdr_words = skb->h.th->doff; + hdr->eth_type_mss = htons(MK_ETH_TYPE_MSS(eth_type, +- skb_shinfo(skb)->tso_size)); ++ skb_shinfo(skb)->gso_size)); + hdr->len = htonl(skb->len - sizeof(*hdr)); + cpl = (struct cpl_tx_pkt *)hdr; + sge->stats.tx_lso_pkts++; +diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c +index fa29402..681d284 100644 +--- a/drivers/net/e1000/e1000_main.c ++++ b/drivers/net/e1000/e1000_main.c +@@ -2526,7 +2526,7 @@ #ifdef NETIF_F_TSO + uint8_t ipcss, ipcso, tucss, tucso, hdr_len; + int err; + +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + if (skb_header_cloned(skb)) { + err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); + if (err) +@@ -2534,7 +2534,7 @@ #ifdef NETIF_F_TSO + } + + hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2)); +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + if (skb->protocol == ntohs(ETH_P_IP)) { + skb->nh.iph->tot_len = 0; + skb->nh.iph->check = 0; +@@ -2651,7 +2651,7 @@ #ifdef NETIF_F_TSO + * tso gets written back prematurely before the data is fully + * DMAd to the controller */ + if (!skb->data_len && tx_ring->last_tx_tso && +- !skb_shinfo(skb)->tso_size) { ++ !skb_shinfo(skb)->gso_size) { + tx_ring->last_tx_tso = 0; + size -= 4; + } +@@ -2893,7 +2893,7 @@ #endif + } + + #ifdef NETIF_F_TSO +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + /* The controller 
does a simple calculation to + * make sure there is enough room in the FIFO before + * initiating the DMA for each buffer. The calc is: +@@ -2935,7 +2935,7 @@ #endif + #ifdef NETIF_F_TSO + /* Controller Erratum workaround */ + if (!skb->data_len && tx_ring->last_tx_tso && +- !skb_shinfo(skb)->tso_size) ++ !skb_shinfo(skb)->gso_size) + count++; + #endif + +diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c +index 3682ec6..c35f16e 100644 +--- a/drivers/net/forcedeth.c ++++ b/drivers/net/forcedeth.c +@@ -482,9 +482,9 @@ #define LPA_1000HALF 0x0400 + * critical parts: + * - rx is (pseudo-) lockless: it relies on the single-threading provided + * by the arch code for interrupts. +- * - tx setup is lockless: it relies on dev->xmit_lock. Actual submission ++ * - tx setup is lockless: it relies on netif_tx_lock. Actual submission + * needs dev->priv->lock :-( +- * - set_multicast_list: preparation lockless, relies on dev->xmit_lock. ++ * - set_multicast_list: preparation lockless, relies on netif_tx_lock. + */ + + /* in dev: base, irq */ +@@ -1016,7 +1016,7 @@ static void drain_ring(struct net_device + + /* + * nv_start_xmit: dev->hard_start_xmit function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. + */ + static int nv_start_xmit(struct sk_buff *skb, struct net_device *dev) + { +@@ -1105,8 +1105,8 @@ static int nv_start_xmit(struct sk_buff + np->tx_skbuff[nr] = skb; + + #ifdef NETIF_F_TSO +- if (skb_shinfo(skb)->tso_size) +- tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->tso_size << NV_TX2_TSO_SHIFT); ++ if (skb_shinfo(skb)->gso_size) ++ tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->gso_size << NV_TX2_TSO_SHIFT); + else + #endif + tx_flags_extra = (skb->ip_summed == CHECKSUM_HW ? (NV_TX2_CHECKSUM_L3|NV_TX2_CHECKSUM_L4) : 0); +@@ -1203,7 +1203,7 @@ static void nv_tx_done(struct net_device + + /* + * nv_tx_timeout: dev->tx_timeout function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. + */ + static void nv_tx_timeout(struct net_device *dev) + { +@@ -1524,7 +1524,7 @@ static int nv_change_mtu(struct net_devi + * Changing the MTU is a rare event, it shouldn''t matter. + */ + disable_irq(dev->irq); +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock(&np->lock); + /* stop engines */ + nv_stop_rx(dev); +@@ -1559,7 +1559,7 @@ static int nv_change_mtu(struct net_devi + nv_start_rx(dev); + nv_start_tx(dev); + spin_unlock(&np->lock); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + enable_irq(dev->irq); + } + return 0; +@@ -1594,7 +1594,7 @@ static int nv_set_mac_address(struct net + memcpy(dev->dev_addr, macaddr->sa_data, ETH_ALEN); + + if (netif_running(dev)) { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock_irq(&np->lock); + + /* stop rx engine */ +@@ -1606,7 +1606,7 @@ static int nv_set_mac_address(struct net + /* restart rx engine */ + nv_start_rx(dev); + spin_unlock_irq(&np->lock); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } else { + nv_copy_mac_to_hw(dev); + } +@@ -1615,7 +1615,7 @@ static int nv_set_mac_address(struct net + + /* + * nv_set_multicast: dev->set_multicast function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. 
+ */ + static void nv_set_multicast(struct net_device *dev) + { +diff --git a/drivers/net/hamradio/6pack.c b/drivers/net/hamradio/6pack.c +index 102c1f0..d12605f 100644 +--- a/drivers/net/hamradio/6pack.c ++++ b/drivers/net/hamradio/6pack.c +@@ -308,9 +308,9 @@ static int sp_set_mac_address(struct net + { + struct sockaddr_ax25 *sa = addr; + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &sa->sax25_call, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + return 0; + } +@@ -767,9 +767,9 @@ static int sixpack_ioctl(struct tty_stru + break; + } + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &addr, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + err = 0; + break; +diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c +index dc5e9d5..5c66f5a 100644 +--- a/drivers/net/hamradio/mkiss.c ++++ b/drivers/net/hamradio/mkiss.c +@@ -357,9 +357,9 @@ static int ax_set_mac_address(struct net + { + struct sockaddr_ax25 *sa = addr; + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &sa->sax25_call, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + return 0; + } +@@ -886,9 +886,9 @@ static int mkiss_ioctl(struct tty_struct + break; + } + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, addr, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + err = 0; + break; +diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c +index 31fb2d7..2e222ef 100644 +--- a/drivers/net/ifb.c ++++ b/drivers/net/ifb.c +@@ -76,13 +76,13 @@ static void ri_tasklet(unsigned long dev + dp->st_task_enter++; + if ((skb = skb_peek(&dp->tq)) == NULL) { + dp->st_txq_refl_try++; +- if (spin_trylock(&_dev->xmit_lock)) { ++ if (netif_tx_trylock(_dev)) { + dp->st_rxq_enter++; + while ((skb = skb_dequeue(&dp->rq)) != NULL) { + skb_queue_tail(&dp->tq, skb); + dp->st_rx2tx_tran++; + } +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + } else { + /* reschedule */ + dp->st_rxq_notenter++; +@@ -110,7 +110,7 @@ static void ri_tasklet(unsigned long dev + } + } + +- if (spin_trylock(&_dev->xmit_lock)) { ++ if (netif_tx_trylock(_dev)) { + dp->st_rxq_check++; + if ((skb = skb_peek(&dp->rq)) == NULL) { + dp->tasklet_pending = 0; +@@ -118,10 +118,10 @@ static void ri_tasklet(unsigned long dev + netif_wake_queue(_dev); + } else { + dp->st_rxq_rsch++; +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + goto resched; + } +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + } else { + resched: + dp->tasklet_pending = 1; +diff --git a/drivers/net/irda/vlsi_ir.c b/drivers/net/irda/vlsi_ir.c +index a9f49f0..339d4a7 100644 +--- a/drivers/net/irda/vlsi_ir.c ++++ b/drivers/net/irda/vlsi_ir.c +@@ -959,7 +959,7 @@ static int vlsi_hard_start_xmit(struct s + || (now.tv_sec==ready.tv_sec && now.tv_usec>=ready.tv_usec)) + break; + udelay(100); +- /* must not sleep here - we are called under xmit_lock! */ ++ /* must not sleep here - called under netif_tx_lock! 
*/ + } + } + +diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c +index f9f77e4..bdab369 100644 +--- a/drivers/net/ixgb/ixgb_main.c ++++ b/drivers/net/ixgb/ixgb_main.c +@@ -1163,7 +1163,7 @@ #ifdef NETIF_F_TSO + uint16_t ipcse, tucse, mss; + int err; + +- if(likely(skb_shinfo(skb)->tso_size)) { ++ if(likely(skb_shinfo(skb)->gso_size)) { + if (skb_header_cloned(skb)) { + err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); + if (err) +@@ -1171,7 +1171,7 @@ #ifdef NETIF_F_TSO + } + + hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2)); +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + skb->nh.iph->tot_len = 0; + skb->nh.iph->check = 0; + skb->h.th->check = ~csum_tcpudp_magic(skb->nh.iph->saddr, +diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c +index 690a1aa..9bcaa80 100644 +--- a/drivers/net/loopback.c ++++ b/drivers/net/loopback.c +@@ -74,7 +74,7 @@ static void emulate_large_send_offload(s + struct iphdr *iph = skb->nh.iph; + struct tcphdr *th = (struct tcphdr*)(skb->nh.raw + (iph->ihl * 4)); + unsigned int doffset = (iph->ihl + th->doff) * 4; +- unsigned int mtu = skb_shinfo(skb)->tso_size + doffset; ++ unsigned int mtu = skb_shinfo(skb)->gso_size + doffset; + unsigned int offset = 0; + u32 seq = ntohl(th->seq); + u16 id = ntohs(iph->id); +@@ -139,7 +139,7 @@ #ifndef LOOPBACK_MUST_CHECKSUM + #endif + + #ifdef LOOPBACK_TSO +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + BUG_ON(skb->protocol != htons(ETH_P_IP)); + BUG_ON(skb->nh.iph->protocol != IPPROTO_TCP); + +diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c +index c0998ef..0fac9d5 100644 +--- a/drivers/net/mv643xx_eth.c ++++ b/drivers/net/mv643xx_eth.c +@@ -1107,7 +1107,7 @@ static int mv643xx_eth_start_xmit(struct + + #ifdef MV643XX_CHECKSUM_OFFLOAD_TX + if (has_tiny_unaligned_frags(skb)) { +- if ((skb_linearize(skb, GFP_ATOMIC) != 0)) { ++ if (__skb_linearize(skb)) { + stats->tx_dropped++; + printk(KERN_DEBUG "%s: failed to linearize tiny " + "unaligned fragment\n", dev->name); +diff --git a/drivers/net/natsemi.c b/drivers/net/natsemi.c +index 9d6d254..c9ed624 100644 +--- a/drivers/net/natsemi.c ++++ b/drivers/net/natsemi.c +@@ -323,12 +323,12 @@ performance critical codepaths: + The rx process only runs in the interrupt handler. Access from outside + the interrupt handler is only permitted after disable_irq(). + +-The rx process usually runs under the dev->xmit_lock. If np->intr_tx_reap ++The rx process usually runs under the netif_tx_lock. If np->intr_tx_reap + is set, then access is permitted under spin_lock_irq(&np->lock). + + Thus configuration functions that want to access everything must call + disable_irq(dev->irq); +- spin_lock_bh(dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock_irq(&np->lock); + + IV. 
Notes +diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c +index 8cc0d0b..e53b313 100644 +--- a/drivers/net/r8169.c ++++ b/drivers/net/r8169.c +@@ -2171,7 +2171,7 @@ static int rtl8169_xmit_frags(struct rtl + static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev) + { + if (dev->features & NETIF_F_TSO) { +- u32 mss = skb_shinfo(skb)->tso_size; ++ u32 mss = skb_shinfo(skb)->gso_size; + + if (mss) + return LargeSend | ((mss & MSSMask) << MSSShift); +diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c +index b7f00d6..439f45f 100644 +--- a/drivers/net/s2io.c ++++ b/drivers/net/s2io.c +@@ -3522,8 +3522,8 @@ #endif + txdp->Control_1 = 0; + txdp->Control_2 = 0; + #ifdef NETIF_F_TSO +- mss = skb_shinfo(skb)->tso_size; +- if (mss) { ++ mss = skb_shinfo(skb)->gso_size; ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV4) { + txdp->Control_1 |= TXD_TCP_LSO_EN; + txdp->Control_1 |= TXD_TCP_LSO_MSS(mss); + } +@@ -3543,10 +3543,10 @@ #endif + } + + frg_len = skb->len - skb->data_len; +- if (skb_shinfo(skb)->ufo_size) { ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) { + int ufo_size; + +- ufo_size = skb_shinfo(skb)->ufo_size; ++ ufo_size = skb_shinfo(skb)->gso_size; + ufo_size &= ~7; + txdp->Control_1 |= TXD_UFO_EN; + txdp->Control_1 |= TXD_UFO_MSS(ufo_size); +@@ -3572,7 +3572,7 @@ #endif + txdp->Host_Control = (unsigned long) skb; + txdp->Control_1 |= TXD_BUFFER0_SIZE(frg_len); + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + txdp->Control_1 |= TXD_UFO_EN; + + frg_cnt = skb_shinfo(skb)->nr_frags; +@@ -3587,12 +3587,12 @@ #endif + (sp->pdev, frag->page, frag->page_offset, + frag->size, PCI_DMA_TODEVICE); + txdp->Control_1 = TXD_BUFFER0_SIZE(frag->size); +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + txdp->Control_1 |= TXD_UFO_EN; + } + txdp->Control_1 |= TXD_GATHER_CODE_LAST; + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + frg_cnt++; /* as Txd0 was used for inband header */ + + tx_fifo = mac_control->tx_FIFO_start[queue]; +@@ -3606,7 +3606,7 @@ #ifdef NETIF_F_TSO + if (mss) + val64 |= TX_FIFO_SPECIAL_FUNC; + #endif +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + val64 |= TX_FIFO_SPECIAL_FUNC; + writeq(val64, &tx_fifo->List_Control); + +diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c +index 0618cd5..2a55eb3 100644 +--- a/drivers/net/sky2.c ++++ b/drivers/net/sky2.c +@@ -1125,7 +1125,7 @@ static unsigned tx_le_req(const struct s + count = sizeof(dma_addr_t) / sizeof(u32); + count += skb_shinfo(skb)->nr_frags * count; + +- if (skb_shinfo(skb)->tso_size) ++ if (skb_shinfo(skb)->gso_size) + ++count; + + if (skb->ip_summed == CHECKSUM_HW) +@@ -1197,7 +1197,7 @@ static int sky2_xmit_frame(struct sk_buf + } + + /* Check for TCP Segmentation Offload */ +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + if (mss != 0) { + /* just drop the packet if non-linear expansion fails */ + if (skb_header_cloned(skb) && +diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c +index caf4102..fc9164a 100644 +--- a/drivers/net/tg3.c ++++ b/drivers/net/tg3.c +@@ -3664,7 +3664,7 @@ static int tg3_start_xmit(struct sk_buff + #if TG3_TSO_SUPPORT != 0 + mss = 0; + if (skb->len > (tp->dev->mtu + ETH_HLEN) && +- (mss = skb_shinfo(skb)->tso_size) != 0) { ++ (mss = skb_shinfo(skb)->gso_size) != 0) { + int tcp_opt_len, ip_tcp_len; + + if (skb_header_cloned(skb) && +diff --git a/drivers/net/tulip/winbond-840.c 
b/drivers/net/tulip/winbond-840.c +index 5b1af39..11de5af 100644 +--- a/drivers/net/tulip/winbond-840.c ++++ b/drivers/net/tulip/winbond-840.c +@@ -1605,11 +1605,11 @@ #ifdef CONFIG_PM + * - get_stats: + * spin_lock_irq(np->lock), doesn''t touch hw if not present + * - hard_start_xmit: +- * netif_stop_queue + spin_unlock_wait(&dev->xmit_lock); ++ * synchronize_irq + netif_tx_disable; + * - tx_timeout: +- * netif_device_detach + spin_unlock_wait(&dev->xmit_lock); ++ * netif_device_detach + netif_tx_disable; + * - set_multicast_list +- * netif_device_detach + spin_unlock_wait(&dev->xmit_lock); ++ * netif_device_detach + netif_tx_disable; + * - interrupt handler + * doesn''t touch hw if not present, synchronize_irq waits for + * running instances of the interrupt handler. +@@ -1635,11 +1635,10 @@ static int w840_suspend (struct pci_dev + netif_device_detach(dev); + update_csr6(dev, 0); + iowrite32(0, ioaddr + IntrEnable); +- netif_stop_queue(dev); + spin_unlock_irq(&np->lock); + +- spin_unlock_wait(&dev->xmit_lock); + synchronize_irq(dev->irq); ++ netif_tx_disable(dev); + + np->stats.rx_missed_errors += ioread32(ioaddr + RxMissed) & 0xffff; + +diff --git a/drivers/net/typhoon.c b/drivers/net/typhoon.c +index 4c76cb7..30c48c9 100644 +--- a/drivers/net/typhoon.c ++++ b/drivers/net/typhoon.c +@@ -340,7 +340,7 @@ #define typhoon_synchronize_irq(x) synch + #endif + + #if defined(NETIF_F_TSO) +-#define skb_tso_size(x) (skb_shinfo(x)->tso_size) ++#define skb_tso_size(x) (skb_shinfo(x)->gso_size) + #define TSO_NUM_DESCRIPTORS 2 + #define TSO_OFFLOAD_ON TYPHOON_OFFLOAD_TCP_SEGMENT + #else +diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c +index ed1f837..2eb6b5f 100644 +--- a/drivers/net/via-velocity.c ++++ b/drivers/net/via-velocity.c +@@ -1899,6 +1899,13 @@ static int velocity_xmit(struct sk_buff + + int pktlen = skb->len; + ++#ifdef VELOCITY_ZERO_COPY_SUPPORT ++ if (skb_shinfo(skb)->nr_frags > 6 && __skb_linearize(skb)) { ++ kfree_skb(skb); ++ return 0; ++ } ++#endif ++ + spin_lock_irqsave(&vptr->lock, flags); + + index = vptr->td_curr[qnum]; +@@ -1914,8 +1921,6 @@ static int velocity_xmit(struct sk_buff + */ + if (pktlen < ETH_ZLEN) { + /* Cannot occur until ZC support */ +- if(skb_linearize(skb, GFP_ATOMIC)) +- return 0; + pktlen = ETH_ZLEN; + memcpy(tdinfo->buf, skb->data, skb->len); + memset(tdinfo->buf + skb->len, 0, ETH_ZLEN - skb->len); +@@ -1933,7 +1938,6 @@ #ifdef VELOCITY_ZERO_COPY_SUPPORT + int nfrags = skb_shinfo(skb)->nr_frags; + tdinfo->skb = skb; + if (nfrags > 6) { +- skb_linearize(skb, GFP_ATOMIC); + memcpy(tdinfo->buf, skb->data, skb->len); + tdinfo->skb_dma[0] = tdinfo->buf_dma; + td_ptr->tdesc0.pktsize = +diff --git a/drivers/net/wireless/orinoco.c b/drivers/net/wireless/orinoco.c +index 6fd0bf7..75237c1 100644 +--- a/drivers/net/wireless/orinoco.c ++++ b/drivers/net/wireless/orinoco.c +@@ -1835,7 +1835,9 @@ static int __orinoco_program_rids(struct + /* Set promiscuity / multicast*/ + priv->promiscuous = 0; + priv->mc_count = 0; +- __orinoco_set_multicast_list(dev); /* FIXME: what about the xmit_lock */ ++ ++ /* FIXME: what about netif_tx_lock */ ++ __orinoco_set_multicast_list(dev); + + return 0; + } +diff --git a/drivers/s390/net/qeth_eddp.c b/drivers/s390/net/qeth_eddp.c +index 82cb4af..57cec40 100644 +--- a/drivers/s390/net/qeth_eddp.c ++++ b/drivers/s390/net/qeth_eddp.c +@@ -421,7 +421,7 @@ #endif /* CONFIG_QETH_VLAN */ + } + tcph = eddp->skb->h.th; + while (eddp->skb_offset < eddp->skb->len) { +- data_len = min((int)skb_shinfo(eddp->skb)->tso_size, ++ 
data_len = min((int)skb_shinfo(eddp->skb)->gso_size, + (int)(eddp->skb->len - eddp->skb_offset)); + /* prepare qdio hdr */ + if (eddp->qh.hdr.l2.id == QETH_HEADER_TYPE_LAYER2){ +@@ -516,20 +516,20 @@ qeth_eddp_calc_num_pages(struct qeth_edd + + QETH_DBF_TEXT(trace, 5, "eddpcanp"); + /* can we put multiple skbs in one page? */ +- skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->tso_size + hdr_len); ++ skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->gso_size + hdr_len); + if (skbs_per_page > 1){ +- ctx->num_pages = (skb_shinfo(skb)->tso_segs + 1) / ++ ctx->num_pages = (skb_shinfo(skb)->gso_segs + 1) / + skbs_per_page + 1; + ctx->elements_per_skb = 1; + } else { + /* no -> how many elements per skb? */ +- ctx->elements_per_skb = (skb_shinfo(skb)->tso_size + hdr_len + ++ ctx->elements_per_skb = (skb_shinfo(skb)->gso_size + hdr_len + + PAGE_SIZE) >> PAGE_SHIFT; + ctx->num_pages = ctx->elements_per_skb * +- (skb_shinfo(skb)->tso_segs + 1); ++ (skb_shinfo(skb)->gso_segs + 1); + } + ctx->num_elements = ctx->elements_per_skb * +- (skb_shinfo(skb)->tso_segs + 1); ++ (skb_shinfo(skb)->gso_segs + 1); + } + + static inline struct qeth_eddp_context * +diff --git a/drivers/s390/net/qeth_main.c b/drivers/s390/net/qeth_main.c +index dba7f7f..d9cc997 100644 +--- a/drivers/s390/net/qeth_main.c ++++ b/drivers/s390/net/qeth_main.c +@@ -4454,7 +4454,7 @@ qeth_send_packet(struct qeth_card *card, + queue = card->qdio.out_qs + [qeth_get_priority_queue(card, skb, ipv, cast_type)]; + +- if (skb_shinfo(skb)->tso_size) ++ if (skb_shinfo(skb)->gso_size) + large_send = card->options.large_send; + + /*are we able to do TSO ? If so ,prepare and send it from here */ +@@ -4501,7 +4501,7 @@ qeth_send_packet(struct qeth_card *card, + card->stats.tx_packets++; + card->stats.tx_bytes += skb->len; + #ifdef CONFIG_QETH_PERF_STATS +- if (skb_shinfo(skb)->tso_size && ++ if (skb_shinfo(skb)->gso_size && + !(large_send == QETH_LARGE_SEND_NO)) { + card->perf_stats.large_send_bytes += skb->len; + card->perf_stats.large_send_cnt++; +diff --git a/drivers/s390/net/qeth_tso.h b/drivers/s390/net/qeth_tso.h +index 1286dde..89cbf34 100644 +--- a/drivers/s390/net/qeth_tso.h ++++ b/drivers/s390/net/qeth_tso.h +@@ -51,7 +51,7 @@ qeth_tso_fill_header(struct qeth_card *c + hdr->ext.hdr_version = 1; + hdr->ext.hdr_len = 28; + /*insert non-fix values */ +- hdr->ext.mss = skb_shinfo(skb)->tso_size; ++ hdr->ext.mss = skb_shinfo(skb)->gso_size; + hdr->ext.dg_hdr_len = (__u16)(iph->ihl*4 + tcph->doff*4); + hdr->ext.payload_len = (__u16)(skb->len - hdr->ext.dg_hdr_len - + sizeof(struct qeth_hdr_tso)); +diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h +index 93535f0..9269df7 100644 +--- a/include/linux/ethtool.h ++++ b/include/linux/ethtool.h +@@ -408,6 +408,8 @@ #define ETHTOOL_STSO 0x0000001f /* Set + #define ETHTOOL_GPERMADDR 0x00000020 /* Get permanent hardware address */ + #define ETHTOOL_GUFO 0x00000021 /* Get UFO enable (ethtool_value) */ + #define ETHTOOL_SUFO 0x00000022 /* Set UFO enable (ethtool_value) */ ++#define ETHTOOL_GGSO 0x00000023 /* Get GSO enable (ethtool_value) */ ++#define ETHTOOL_SGSO 0x00000024 /* Set GSO enable (ethtool_value) */ + + /* compatibility with older code */ + #define SPARC_ETH_GSET ETHTOOL_GSET +diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h +index 7fda03d..458b278 100644 +--- a/include/linux/netdevice.h ++++ b/include/linux/netdevice.h +@@ -230,7 +230,8 @@ enum netdev_state_t + __LINK_STATE_SCHED, + __LINK_STATE_NOCARRIER, + __LINK_STATE_RX_SCHED, +- __LINK_STATE_LINKWATCH_PENDING ++ 
__LINK_STATE_LINKWATCH_PENDING, ++ __LINK_STATE_QDISC_RUNNING, + }; + + +@@ -306,9 +307,16 @@ #define NETIF_F_HW_VLAN_TX 128 /* Transm + #define NETIF_F_HW_VLAN_RX 256 /* Receive VLAN hw acceleration */ + #define NETIF_F_HW_VLAN_FILTER 512 /* Receive filtering on VLAN */ + #define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */ +-#define NETIF_F_TSO 2048 /* Can offload TCP/IP segmentation */ ++#define NETIF_F_GSO 2048 /* Enable software GSO. */ + #define NETIF_F_LLTX 4096 /* LockLess TX */ +-#define NETIF_F_UFO 8192 /* Can offload UDP Large Send*/ ++ ++ /* Segmentation offload features */ ++#define NETIF_F_GSO_SHIFT 16 ++#define NETIF_F_TSO (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT) ++#define NETIF_F_UFO (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT) ++ ++#define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) ++#define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM) + + struct net_device *next_sched; + +@@ -394,6 +402,9 @@ #define NETIF_F_UFO 8192 + struct list_head qdisc_list; + unsigned long tx_queue_len; /* Max frames per queue allowed */ + ++ /* Partially transmitted GSO packet. */ ++ struct sk_buff *gso_skb; ++ + /* ingress path synchronizer */ + spinlock_t ingress_lock; + struct Qdisc *qdisc_ingress; +@@ -402,7 +413,7 @@ #define NETIF_F_UFO 8192 + * One part is mostly used on xmit path (device) + */ + /* hard_start_xmit synchronizer */ +- spinlock_t xmit_lock ____cacheline_aligned_in_smp; ++ spinlock_t _xmit_lock ____cacheline_aligned_in_smp; + /* cpu id of processor entered to hard_start_xmit or -1, + if nobody entered there. + */ +@@ -527,6 +538,7 @@ struct packet_type { + struct net_device *, + struct packet_type *, + struct net_device *); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg); + void *af_packet_priv; + struct list_head list; + }; +@@ -693,7 +705,8 @@ extern int dev_change_name(struct net_d + extern int dev_set_mtu(struct net_device *, int); + extern int dev_set_mac_address(struct net_device *, + struct sockaddr *); +-extern void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev); ++extern int dev_hard_start_xmit(struct sk_buff *skb, ++ struct net_device *dev); + + extern void dev_init(void); + +@@ -900,11 +913,43 @@ static inline void __netif_rx_complete(s + clear_bit(__LINK_STATE_RX_SCHED, &dev->state); + } + ++static inline void netif_tx_lock(struct net_device *dev) ++{ ++ spin_lock(&dev->_xmit_lock); ++ dev->xmit_lock_owner = smp_processor_id(); ++} ++ ++static inline void netif_tx_lock_bh(struct net_device *dev) ++{ ++ spin_lock_bh(&dev->_xmit_lock); ++ dev->xmit_lock_owner = smp_processor_id(); ++} ++ ++static inline int netif_tx_trylock(struct net_device *dev) ++{ ++ int err = spin_trylock(&dev->_xmit_lock); ++ if (!err) ++ dev->xmit_lock_owner = smp_processor_id(); ++ return err; ++} ++ ++static inline void netif_tx_unlock(struct net_device *dev) ++{ ++ dev->xmit_lock_owner = -1; ++ spin_unlock(&dev->_xmit_lock); ++} ++ ++static inline void netif_tx_unlock_bh(struct net_device *dev) ++{ ++ dev->xmit_lock_owner = -1; ++ spin_unlock_bh(&dev->_xmit_lock); ++} ++ + static inline void netif_tx_disable(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + netif_stop_queue(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + /* These functions live elsewhere (drivers/net/net_init.c, but related) */ +@@ -932,6 +977,7 @@ extern int netdev_max_backlog; + extern int weight_p; + extern int netdev_set_master(struct net_device *dev, struct net_device *master); + extern int 
skb_checksum_help(struct sk_buff *skb, int inward); ++extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg); + #ifdef CONFIG_BUG + extern void netdev_rx_csum_fault(struct net_device *dev); + #else +@@ -951,6 +997,13 @@ #endif + + extern void linkwatch_run_queue(void); + ++static inline int netif_needs_gso(struct net_device *dev, struct sk_buff *skb) ++{ ++ int feature = skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT; ++ return skb_shinfo(skb)->gso_size && ++ (dev->features & feature) != feature; ++} ++ + #endif /* __KERNEL__ */ + + #endif /* _LINUX_DEV_H */ +diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h +index b94d1ad..75b5b93 100644 +--- a/include/net/pkt_sched.h ++++ b/include/net/pkt_sched.h +@@ -218,12 +218,13 @@ extern struct qdisc_rate_table *qdisc_ge + struct rtattr *tab); + extern void qdisc_put_rtab(struct qdisc_rate_table *tab); + +-extern int qdisc_restart(struct net_device *dev); ++extern void __qdisc_run(struct net_device *dev); + + static inline void qdisc_run(struct net_device *dev) + { +- while (!netif_queue_stopped(dev) && qdisc_restart(dev) < 0) +- /* NOTHING */; ++ if (!netif_queue_stopped(dev) && ++ !test_and_set_bit(__LINK_STATE_QDISC_RUNNING, &dev->state)) ++ __qdisc_run(dev); + } + + extern int tc_classify(struct sk_buff *skb, struct tcf_proto *tp, +diff --git a/include/net/protocol.h b/include/net/protocol.h +index 6dc5970..650911d 100644 +--- a/include/net/protocol.h ++++ b/include/net/protocol.h +@@ -37,6 +37,7 @@ #define MAX_INET_PROTOS 256 /* Must be + struct net_protocol { + int (*handler)(struct sk_buff *skb); + void (*err_handler)(struct sk_buff *skb, u32 info); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg); + int no_policy; + }; + +diff --git a/include/net/sock.h b/include/net/sock.h +index f63d0d5..a8e8d21 100644 +--- a/include/net/sock.h ++++ b/include/net/sock.h +@@ -1064,9 +1064,13 @@ static inline void sk_setup_caps(struct + { + __sk_dst_set(sk, dst); + sk->sk_route_caps = dst->dev->features; ++ if (sk->sk_route_caps & NETIF_F_GSO) ++ sk->sk_route_caps |= NETIF_F_TSO; + if (sk->sk_route_caps & NETIF_F_TSO) { + if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len) + sk->sk_route_caps &= ~NETIF_F_TSO; ++ else ++ sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM; + } + } + +diff --git a/include/net/tcp.h b/include/net/tcp.h +index 77f21c6..40e337c 100644 +--- a/include/net/tcp.h ++++ b/include/net/tcp.h +@@ -552,13 +552,13 @@ #include <net/tcp_ecn.h> + */ + static inline int tcp_skb_pcount(const struct sk_buff *skb) + { +- return skb_shinfo(skb)->tso_segs; ++ return skb_shinfo(skb)->gso_segs; + } + + /* This is valid iff tcp_skb_pcount() > 1. 
*/ + static inline int tcp_skb_mss(const struct sk_buff *skb) + { +- return skb_shinfo(skb)->tso_size; ++ return skb_shinfo(skb)->gso_size; + } + + static inline void tcp_dec_pcount_approx(__u32 *count, +@@ -1063,6 +1063,8 @@ extern struct request_sock_ops tcp_reque + + extern int tcp_v4_destroy_sock(struct sock *sk); + ++extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg); ++ + #ifdef CONFIG_PROC_FS + extern int tcp4_proc_init(void); + extern void tcp4_proc_exit(void); +diff --git a/net/atm/clip.c b/net/atm/clip.c +index 1842a4e..6dc21a7 100644 +--- a/net/atm/clip.c ++++ b/net/atm/clip.c +@@ -101,7 +101,7 @@ static void unlink_clip_vcc(struct clip_ + printk(KERN_CRIT "!clip_vcc->entry (clip_vcc %p)\n",clip_vcc); + return; + } +- spin_lock_bh(&entry->neigh->dev->xmit_lock); /* block clip_start_xmit() */ ++ netif_tx_lock_bh(entry->neigh->dev); /* block clip_start_xmit() */ + entry->neigh->used = jiffies; + for (walk = &entry->vccs; *walk; walk = &(*walk)->next) + if (*walk == clip_vcc) { +@@ -125,7 +125,7 @@ static void unlink_clip_vcc(struct clip_ + printk(KERN_CRIT "ATMARP: unlink_clip_vcc failed (entry %p, vcc " + "0x%p)\n",entry,clip_vcc); + out: +- spin_unlock_bh(&entry->neigh->dev->xmit_lock); ++ netif_tx_unlock_bh(entry->neigh->dev); + } + + /* The neighbour entry n->lock is held. */ +diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c +index 0b33a7b..b9e32b6 100644 +--- a/net/bridge/br_device.c ++++ b/net/bridge/br_device.c +@@ -146,9 +146,9 @@ static int br_set_tx_csum(struct net_dev + struct net_bridge *br = netdev_priv(dev); + + if (data) +- br->feature_mask |= NETIF_F_IP_CSUM; ++ br->feature_mask |= NETIF_F_NO_CSUM; + else +- br->feature_mask &= ~NETIF_F_IP_CSUM; ++ br->feature_mask &= ~NETIF_F_ALL_CSUM; + + br_features_recompute(br); + return 0; +@@ -186,5 +186,5 @@ void br_dev_setup(struct net_device *dev + dev->priv_flags = IFF_EBRIDGE; + + dev->features = NETIF_F_SG | NETIF_F_FRAGLIST +- | NETIF_F_HIGHDMA | NETIF_F_TSO | NETIF_F_IP_CSUM; ++ | NETIF_F_HIGHDMA | NETIF_F_TSO | NETIF_F_NO_CSUM; + } +diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c +index 2d24fb4..00b1128 100644 +--- a/net/bridge/br_forward.c ++++ b/net/bridge/br_forward.c +@@ -32,7 +32,7 @@ static inline int should_deliver(const s + int br_dev_queue_push_xmit(struct sk_buff *skb) + { + /* drop mtu oversized packets except tso */ +- if (skb->len > skb->dev->mtu && !skb_shinfo(skb)->tso_size) ++ if (skb->len > skb->dev->mtu && !skb_shinfo(skb)->gso_size) + kfree_skb(skb); + else { + #ifdef CONFIG_BRIDGE_NETFILTER +diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c +index f36b35e..4e9743d 100644 +--- a/net/bridge/br_if.c ++++ b/net/bridge/br_if.c +@@ -385,14 +385,24 @@ void br_features_recompute(struct net_br + struct net_bridge_port *p; + unsigned long features, checksum; + +- features = br->feature_mask &~ NETIF_F_IP_CSUM; +- checksum = br->feature_mask & NETIF_F_IP_CSUM; ++ checksum = br->feature_mask & NETIF_F_ALL_CSUM ? 
NETIF_F_NO_CSUM : 0; ++ features = br->feature_mask & ~NETIF_F_ALL_CSUM; + + list_for_each_entry(p, &br->port_list, list) { +- if (!(p->dev->features +- & (NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM))) ++ unsigned long feature = p->dev->features; ++ ++ if (checksum & NETIF_F_NO_CSUM && !(feature & NETIF_F_NO_CSUM)) ++ checksum ^= NETIF_F_NO_CSUM | NETIF_F_HW_CSUM; ++ if (checksum & NETIF_F_HW_CSUM && !(feature & NETIF_F_HW_CSUM)) ++ checksum ^= NETIF_F_HW_CSUM | NETIF_F_IP_CSUM; ++ if (!(feature & NETIF_F_IP_CSUM)) + checksum = 0; +- features &= p->dev->features; ++ ++ if (feature & NETIF_F_GSO) ++ feature |= NETIF_F_TSO; ++ feature |= NETIF_F_GSO; ++ ++ features &= feature; + } + + br->dev->features = features | checksum | NETIF_F_LLTX; +diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c +index 9e27373..588207f 100644 +--- a/net/bridge/br_netfilter.c ++++ b/net/bridge/br_netfilter.c +@@ -743,7 +743,7 @@ static int br_nf_dev_queue_xmit(struct s + { + if (skb->protocol == htons(ETH_P_IP) && + skb->len > skb->dev->mtu && +- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size)) ++ !skb_shinfo(skb)->gso_size) + return ip_fragment(skb, br_dev_queue_push_xmit); + else + return br_dev_queue_push_xmit(skb); +diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c +index 05d6085..c57d887 100644 +--- a/net/core/dev_mcast.c ++++ b/net/core/dev_mcast.c +@@ -62,7 +62,7 @@ #include <net/arp.h> + * Device mc lists are changed by bh at least if IPv6 is enabled, + * so that it must be bh protected. + * +- * We block accesses to device mc filters with dev->xmit_lock. ++ * We block accesses to device mc filters with netif_tx_lock. + */ + + /* +@@ -93,9 +93,9 @@ static void __dev_mc_upload(struct net_d + + void dev_mc_upload(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + __dev_mc_upload(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + /* +@@ -107,7 +107,7 @@ int dev_mc_delete(struct net_device *dev + int err = 0; + struct dev_mc_list *dmi, **dmip; + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + for (dmip = &dev->mc_list; (dmi = *dmip) != NULL; dmip = &dmi->next) { + /* +@@ -139,13 +139,13 @@ int dev_mc_delete(struct net_device *dev + */ + __dev_mc_upload(dev); + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + } + } + err = -ENOENT; + done: +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return err; + } + +@@ -160,7 +160,7 @@ int dev_mc_add(struct net_device *dev, v + + dmi1 = kmalloc(sizeof(*dmi), GFP_ATOMIC); + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + for (dmi = dev->mc_list; dmi != NULL; dmi = dmi->next) { + if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 && + dmi->dmi_addrlen == alen) { +@@ -176,7 +176,7 @@ int dev_mc_add(struct net_device *dev, v + } + + if ((dmi = dmi1) == NULL) { +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return -ENOMEM; + } + memcpy(dmi->dmi_addr, addr, alen); +@@ -189,11 +189,11 @@ int dev_mc_add(struct net_device *dev, v + + __dev_mc_upload(dev); + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + + done: +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + kfree(dmi1); + return err; + } +@@ -204,7 +204,7 @@ done: + + void dev_mc_discard(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + while (dev->mc_list != NULL) { + struct dev_mc_list *tmp = dev->mc_list; +@@ -215,7 +215,7 @@ void 
dev_mc_discard(struct net_device *d + } + dev->mc_count = 0; + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + #ifdef CONFIG_PROC_FS +@@ -250,7 +250,7 @@ static int dev_mc_seq_show(struct seq_fi + struct dev_mc_list *m; + struct net_device *dev = v; + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + for (m = dev->mc_list; m; m = m->next) { + int i; + +@@ -262,7 +262,7 @@ static int dev_mc_seq_show(struct seq_fi + + seq_putc(seq, ''\n''); + } +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + } + +diff --git a/net/core/ethtool.c b/net/core/ethtool.c +index e6f7610..27ce168 100644 +--- a/net/core/ethtool.c ++++ b/net/core/ethtool.c +@@ -30,7 +30,7 @@ u32 ethtool_op_get_link(struct net_devic + + u32 ethtool_op_get_tx_csum(struct net_device *dev) + { +- return (dev->features & (NETIF_F_IP_CSUM | NETIF_F_HW_CSUM)) != 0; ++ return (dev->features & NETIF_F_ALL_CSUM) != 0; + } + + int ethtool_op_set_tx_csum(struct net_device *dev, u32 data) +@@ -551,9 +551,7 @@ static int ethtool_set_sg(struct net_dev + return -EFAULT; + + if (edata.data && +- !(dev->features & (NETIF_F_IP_CSUM | +- NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM))) ++ !(dev->features & NETIF_F_ALL_CSUM)) + return -EINVAL; + + return __ethtool_set_sg(dev, edata.data); +@@ -591,7 +589,7 @@ static int ethtool_set_tso(struct net_de + + static int ethtool_get_ufo(struct net_device *dev, char __user *useraddr) + { +- struct ethtool_value edata = { ETHTOOL_GTSO }; ++ struct ethtool_value edata = { ETHTOOL_GUFO }; + + if (!dev->ethtool_ops->get_ufo) + return -EOPNOTSUPP; +@@ -600,6 +598,7 @@ static int ethtool_get_ufo(struct net_de + return -EFAULT; + return 0; + } ++ + static int ethtool_set_ufo(struct net_device *dev, char __user *useraddr) + { + struct ethtool_value edata; +@@ -615,6 +614,29 @@ static int ethtool_set_ufo(struct net_de + return dev->ethtool_ops->set_ufo(dev, edata.data); + } + ++static int ethtool_get_gso(struct net_device *dev, char __user *useraddr) ++{ ++ struct ethtool_value edata = { ETHTOOL_GGSO }; ++ ++ edata.data = dev->features & NETIF_F_GSO; ++ if (copy_to_user(useraddr, &edata, sizeof(edata))) ++ return -EFAULT; ++ return 0; ++} ++ ++static int ethtool_set_gso(struct net_device *dev, char __user *useraddr) ++{ ++ struct ethtool_value edata; ++ ++ if (copy_from_user(&edata, useraddr, sizeof(edata))) ++ return -EFAULT; ++ if (edata.data) ++ dev->features |= NETIF_F_GSO; ++ else ++ dev->features &= ~NETIF_F_GSO; ++ return 0; ++} ++ + static int ethtool_self_test(struct net_device *dev, char __user *useraddr) + { + struct ethtool_test test; +@@ -906,6 +928,12 @@ int dev_ethtool(struct ifreq *ifr) + case ETHTOOL_SUFO: + rc = ethtool_set_ufo(dev, useraddr); + break; ++ case ETHTOOL_GGSO: ++ rc = ethtool_get_gso(dev, useraddr); ++ break; ++ case ETHTOOL_SGSO: ++ rc = ethtool_set_gso(dev, useraddr); ++ break; + default: + rc = -EOPNOTSUPP; + } +diff --git a/net/core/netpoll.c b/net/core/netpoll.c +index ea51f8d..ec28d3b 100644 +--- a/net/core/netpoll.c ++++ b/net/core/netpoll.c +@@ -273,24 +273,21 @@ static void netpoll_send_skb(struct netp + + do { + npinfo->tries--; +- spin_lock(&np->dev->xmit_lock); +- np->dev->xmit_lock_owner = smp_processor_id(); ++ netif_tx_lock(np->dev); + + /* + * network drivers do not expect to be called if the queue is + * stopped. 
+ */ + if (netif_queue_stopped(np->dev)) { +- np->dev->xmit_lock_owner = -1; +- spin_unlock(&np->dev->xmit_lock); ++ netif_tx_unlock(np->dev); + netpoll_poll(np); + udelay(50); + continue; + } + + status = np->dev->hard_start_xmit(skb, np->dev); +- np->dev->xmit_lock_owner = -1; +- spin_unlock(&np->dev->xmit_lock); ++ netif_tx_unlock(np->dev); + + /* success */ + if(!status) { +diff --git a/net/core/pktgen.c b/net/core/pktgen.c +index da16f8f..2380347 100644 +--- a/net/core/pktgen.c ++++ b/net/core/pktgen.c +@@ -2582,7 +2582,7 @@ static __inline__ void pktgen_xmit(struc + } + } + +- spin_lock_bh(&odev->xmit_lock); ++ netif_tx_lock_bh(odev); + if (!netif_queue_stopped(odev)) { + + atomic_inc(&(pkt_dev->skb->users)); +@@ -2627,7 +2627,7 @@ retry_now: + pkt_dev->next_tx_ns = 0; + } + +- spin_unlock_bh(&odev->xmit_lock); ++ netif_tx_unlock_bh(odev); + + /* If pkt_dev->count is zero, then run forever */ + if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) { +diff --git a/net/decnet/dn_nsp_in.c b/net/decnet/dn_nsp_in.c +index 44bda85..2e3323a 100644 +--- a/net/decnet/dn_nsp_in.c ++++ b/net/decnet/dn_nsp_in.c +@@ -801,8 +801,7 @@ got_it: + * We linearize everything except data segments here. + */ + if (cb->nsp_flags & ~0x60) { +- if (unlikely(skb_is_nonlinear(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) ++ if (unlikely(skb_linearize(skb))) + goto free_out; + } + +diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c +index 3407f19..a0a25e0 100644 +--- a/net/decnet/dn_route.c ++++ b/net/decnet/dn_route.c +@@ -629,8 +629,7 @@ int dn_route_rcv(struct sk_buff *skb, st + padlen); + + if (flags & DN_RT_PKT_CNTL) { +- if (unlikely(skb_is_nonlinear(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) ++ if (unlikely(skb_linearize(skb))) + goto dump_it; + + switch(flags & DN_RT_CNTL_MSK) { +diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c +index 97c276f..3692db5 100644 +--- a/net/ipv4/af_inet.c ++++ b/net/ipv4/af_inet.c +@@ -68,6 +68,7 @@ + */ + + #include <linux/config.h> ++#include <linux/err.h> + #include <linux/errno.h> + #include <linux/types.h> + #include <linux/socket.h> +@@ -1084,6 +1085,54 @@ int inet_sk_rebuild_header(struct sock * + + EXPORT_SYMBOL(inet_sk_rebuild_header); + ++static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int sg) ++{ ++ struct sk_buff *segs = ERR_PTR(-EINVAL); ++ struct iphdr *iph; ++ struct net_protocol *ops; ++ int proto; ++ int ihl; ++ int id; ++ ++ if (!pskb_may_pull(skb, sizeof(*iph))) ++ goto out; ++ ++ iph = skb->nh.iph; ++ ihl = iph->ihl * 4; ++ if (ihl < sizeof(*iph)) ++ goto out; ++ ++ if (!pskb_may_pull(skb, ihl)) ++ goto out; ++ ++ skb->h.raw = __skb_pull(skb, ihl); ++ iph = skb->nh.iph; ++ id = ntohs(iph->id); ++ proto = iph->protocol & (MAX_INET_PROTOS - 1); ++ segs = ERR_PTR(-EPROTONOSUPPORT); ++ ++ rcu_read_lock(); ++ ops = rcu_dereference(inet_protos[proto]); ++ if (ops && ops->gso_segment) ++ segs = ops->gso_segment(skb, sg); ++ rcu_read_unlock(); ++ ++ if (IS_ERR(segs)) ++ goto out; ++ ++ skb = segs; ++ do { ++ iph = skb->nh.iph; ++ iph->id = htons(id++); ++ iph->tot_len = htons(skb->len - skb->mac_len); ++ iph->check = 0; ++ iph->check = ip_fast_csum(skb->nh.raw, iph->ihl); ++ } while ((skb = skb->next)); ++ ++out: ++ return segs; ++} ++ + #ifdef CONFIG_IP_MULTICAST + static struct net_protocol igmp_protocol = { + .handler = igmp_rcv, +@@ -1093,6 +1142,7 @@ #endif + static struct net_protocol tcp_protocol = { + .handler = tcp_v4_rcv, + .err_handler = tcp_v4_err, ++ .gso_segment = tcp_tso_segment, + .no_policy = 1, + }; + 
+@@ -1138,6 +1188,7 @@ static int ipv4_proc_init(void); + static struct packet_type ip_packet_type = { + .type = __constant_htons(ETH_P_IP), + .func = ip_rcv, ++ .gso_segment = inet_gso_segment, + }; + + static int __init inet_init(void) +diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c +index 8dcba38..19c3c73 100644 +--- a/net/ipv4/ip_output.c ++++ b/net/ipv4/ip_output.c +@@ -210,8 +210,7 @@ #if defined(CONFIG_NETFILTER) && defined + return dst_output(skb); + } + #endif +- if (skb->len > dst_mtu(skb->dst) && +- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size)) ++ if (skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size) + return ip_fragment(skb, ip_finish_output2); + else + return ip_finish_output2(skb); +@@ -362,7 +361,7 @@ packet_routed: + } + + ip_select_ident_more(iph, &rt->u.dst, sk, +- (skb_shinfo(skb)->tso_segs ?: 1) - 1); ++ (skb_shinfo(skb)->gso_segs ?: 1) - 1); + + /* Add an IP checksum. */ + ip_send_check(iph); +@@ -743,7 +742,8 @@ static inline int ip_ufo_append_data(str + (length - transhdrlen)); + if (!err) { + /* specify the length of each IP datagram fragment*/ +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen); ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen; ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; + __skb_queue_tail(&sk->sk_write_queue, skb); + + return 0; +@@ -839,7 +839,7 @@ int ip_append_data(struct sock *sk, + */ + if (transhdrlen && + length + fragheaderlen <= mtu && +- rt->u.dst.dev->features&(NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM) && ++ rt->u.dst.dev->features & NETIF_F_ALL_CSUM && + !exthdrlen) + csummode = CHECKSUM_HW; + +@@ -1086,14 +1086,16 @@ ssize_t ip_append_page(struct sock *sk, + + inet->cork.length += size; + if ((sk->sk_protocol == IPPROTO_UDP) && +- (rt->u.dst.dev->features & NETIF_F_UFO)) +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen); ++ (rt->u.dst.dev->features & NETIF_F_UFO)) { ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen; ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; ++ } + + + while (size > 0) { + int i; + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_size) + len = size; + else { + +diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c +index d64e2ec..7494823 100644 +--- a/net/ipv4/ipcomp.c ++++ b/net/ipv4/ipcomp.c +@@ -84,7 +84,7 @@ static int ipcomp_input(struct xfrm_stat + struct xfrm_decap_state *decap, struct sk_buff *skb) + { + u8 nexthdr; +- int err = 0; ++ int err = -ENOMEM; + struct iphdr *iph; + union { + struct iphdr iph; +@@ -92,11 +92,8 @@ static int ipcomp_input(struct xfrm_stat + } tmp_iph; + + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { +- err = -ENOMEM; ++ if (skb_linearize_cow(skb)) + goto out; +- } + + skb->ip_summed = CHECKSUM_NONE; + +@@ -171,10 +168,8 @@ static int ipcomp_output(struct xfrm_sta + goto out_ok; + } + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { ++ if (skb_linearize_cow(skb)) + goto out_ok; +- } + + err = ipcomp_compress(x, skb); + iph = skb->nh.iph; +diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c +index 00aa80e..de0753f 100644 +--- a/net/ipv4/tcp.c ++++ b/net/ipv4/tcp.c +@@ -257,6 +257,7 @@ #include <linux/smp_lock.h> + #include <linux/fs.h> + #include <linux/random.h> + #include <linux/bootmem.h> ++#include <linux/err.h> + + #include <net/icmp.h> + #include <net/tcp.h> +@@ -570,7 +571,7 @@ new_segment: + skb->ip_summed = CHECKSUM_HW; + tp->write_seq += copy; + TCP_SKB_CB(skb)->end_seq += copy; +- skb_shinfo(skb)->tso_segs = 0; ++ 
skb_shinfo(skb)->gso_segs = 0; + + if (!copied) + TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_PSH; +@@ -621,14 +622,10 @@ ssize_t tcp_sendpage(struct socket *sock + ssize_t res; + struct sock *sk = sock->sk; + +-#define TCP_ZC_CSUM_FLAGS (NETIF_F_IP_CSUM | NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) +- + if (!(sk->sk_route_caps & NETIF_F_SG) || +- !(sk->sk_route_caps & TCP_ZC_CSUM_FLAGS)) ++ !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) + return sock_no_sendpage(sock, page, offset, size, flags); + +-#undef TCP_ZC_CSUM_FLAGS +- + lock_sock(sk); + TCP_CHECK_TIMER(sk); + res = do_tcp_sendpages(sk, &page, offset, size, flags); +@@ -725,9 +722,7 @@ new_segment: + /* + * Check whether we can use HW checksum. + */ +- if (sk->sk_route_caps & +- (NETIF_F_IP_CSUM | NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM)) ++ if (sk->sk_route_caps & NETIF_F_ALL_CSUM) + skb->ip_summed = CHECKSUM_HW; + + skb_entail(sk, tp, skb); +@@ -823,7 +818,7 @@ new_segment: + + tp->write_seq += copy; + TCP_SKB_CB(skb)->end_seq += copy; +- skb_shinfo(skb)->tso_segs = 0; ++ skb_shinfo(skb)->gso_segs = 0; + + from += copy; + copied += copy; +@@ -2026,6 +2021,67 @@ int tcp_getsockopt(struct sock *sk, int + } + + ++struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg) ++{ ++ struct sk_buff *segs = ERR_PTR(-EINVAL); ++ struct tcphdr *th; ++ unsigned thlen; ++ unsigned int seq; ++ unsigned int delta; ++ unsigned int oldlen; ++ unsigned int len; ++ ++ if (!pskb_may_pull(skb, sizeof(*th))) ++ goto out; ++ ++ th = skb->h.th; ++ thlen = th->doff * 4; ++ if (thlen < sizeof(*th)) ++ goto out; ++ ++ if (!pskb_may_pull(skb, thlen)) ++ goto out; ++ ++ oldlen = (u16)~skb->len; ++ __skb_pull(skb, thlen); ++ ++ segs = skb_segment(skb, sg); ++ if (IS_ERR(segs)) ++ goto out; ++ ++ len = skb_shinfo(skb)->gso_size; ++ delta = htonl(oldlen + (thlen + len)); ++ ++ skb = segs; ++ th = skb->h.th; ++ seq = ntohl(th->seq); ++ ++ do { ++ th->fin = th->psh = 0; ++ ++ th->check = ~csum_fold(th->check + delta); ++ if (skb->ip_summed != CHECKSUM_HW) ++ th->check = csum_fold(csum_partial(skb->h.raw, thlen, ++ skb->csum)); ++ ++ seq += len; ++ skb = skb->next; ++ th = skb->h.th; ++ ++ th->seq = htonl(seq); ++ th->cwr = 0; ++ } while (skb->next); ++ ++ delta = htonl(oldlen + (skb->tail - skb->h.raw) + skb->data_len); ++ th->check = ~csum_fold(th->check + delta); ++ if (skb->ip_summed != CHECKSUM_HW) ++ th->check = csum_fold(csum_partial(skb->h.raw, thlen, ++ skb->csum)); ++ ++out: ++ return segs; ++} ++ + extern void __skb_cb_too_small_for_tcp(int, int); + extern struct tcp_congestion_ops tcp_reno; + +diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c +index e9a54ae..defe77a 100644 +--- a/net/ipv4/tcp_input.c ++++ b/net/ipv4/tcp_input.c +@@ -1072,7 +1072,7 @@ tcp_sacktag_write_queue(struct sock *sk, + else + pkt_len = (end_seq - + TCP_SKB_CB(skb)->seq); +- if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->tso_size)) ++ if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->gso_size)) + break; + pcount = tcp_skb_pcount(skb); + } +diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c +index 310f2e6..ee01f69 100644 +--- a/net/ipv4/tcp_output.c ++++ b/net/ipv4/tcp_output.c +@@ -497,15 +497,17 @@ static void tcp_set_skb_tso_segs(struct + /* Avoid the costly divide in the normal + * non-TSO case. 
+ */ +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + } else { + unsigned int factor; + + factor = skb->len + (mss_now - 1); + factor /= mss_now; +- skb_shinfo(skb)->tso_segs = factor; +- skb_shinfo(skb)->tso_size = mss_now; ++ skb_shinfo(skb)->gso_segs = factor; ++ skb_shinfo(skb)->gso_size = mss_now; ++ skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + } + } + +@@ -850,7 +852,7 @@ static int tcp_init_tso_segs(struct sock + + if (!tso_segs || + (tso_segs > 1 && +- skb_shinfo(skb)->tso_size != mss_now)) { ++ tcp_skb_mss(skb) != mss_now)) { + tcp_set_skb_tso_segs(sk, skb, mss_now); + tso_segs = tcp_skb_pcount(skb); + } +@@ -1510,8 +1512,9 @@ int tcp_retransmit_skb(struct sock *sk, + tp->snd_una == (TCP_SKB_CB(skb)->end_seq - 1)) { + if (!pskb_trim(skb, 0)) { + TCP_SKB_CB(skb)->seq = TCP_SKB_CB(skb)->end_seq - 1; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + skb->ip_summed = CHECKSUM_NONE; + skb->csum = 0; + } +@@ -1716,8 +1719,9 @@ void tcp_send_fin(struct sock *sk) + skb->csum = 0; + TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_FIN); + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* FIN eats a sequence byte, write_seq advanced by tcp_queue_skb(). */ + TCP_SKB_CB(skb)->seq = tp->write_seq; +@@ -1749,8 +1753,9 @@ void tcp_send_active_reset(struct sock * + skb->csum = 0; + TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_RST); + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* Send it off. */ + TCP_SKB_CB(skb)->seq = tcp_acceptable_seq(sk, tp); +@@ -1833,8 +1838,9 @@ struct sk_buff * tcp_make_synack(struct + TCP_SKB_CB(skb)->seq = tcp_rsk(req)->snt_isn; + TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + 1; + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + th->seq = htonl(TCP_SKB_CB(skb)->seq); + th->ack_seq = htonl(tcp_rsk(req)->rcv_isn + 1); + if (req->rcv_wnd == 0) { /* ignored for retransmitted syns */ +@@ -1937,8 +1943,9 @@ int tcp_connect(struct sock *sk) + TCP_SKB_CB(buff)->flags = TCPCB_FLAG_SYN; + TCP_ECN_send_syn(sk, tp, buff); + TCP_SKB_CB(buff)->sacked = 0; +- skb_shinfo(buff)->tso_segs = 1; +- skb_shinfo(buff)->tso_size = 0; ++ skb_shinfo(buff)->gso_segs = 1; ++ skb_shinfo(buff)->gso_size = 0; ++ skb_shinfo(buff)->gso_type = 0; + buff->csum = 0; + TCP_SKB_CB(buff)->seq = tp->write_seq++; + TCP_SKB_CB(buff)->end_seq = tp->write_seq; +@@ -2042,8 +2049,9 @@ void tcp_send_ack(struct sock *sk) + buff->csum = 0; + TCP_SKB_CB(buff)->flags = TCPCB_FLAG_ACK; + TCP_SKB_CB(buff)->sacked = 0; +- skb_shinfo(buff)->tso_segs = 1; +- skb_shinfo(buff)->tso_size = 0; ++ skb_shinfo(buff)->gso_segs = 1; ++ skb_shinfo(buff)->gso_size = 0; ++ skb_shinfo(buff)->gso_type = 0; + + /* Send it off, this clears delayed acks for us. 
*/ + TCP_SKB_CB(buff)->seq = TCP_SKB_CB(buff)->end_seq = tcp_acceptable_seq(sk, tp); +@@ -2078,8 +2086,9 @@ static int tcp_xmit_probe_skb(struct soc + skb->csum = 0; + TCP_SKB_CB(skb)->flags = TCPCB_FLAG_ACK; + TCP_SKB_CB(skb)->sacked = urgent; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* Use a previous sequence. This should cause the other + * end to send an ack. Don''t queue or clone SKB, just +diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c +index 32ad229..737c1db 100644 +--- a/net/ipv4/xfrm4_output.c ++++ b/net/ipv4/xfrm4_output.c +@@ -9,6 +9,8 @@ + */ + + #include <linux/compiler.h> ++#include <linux/if_ether.h> ++#include <linux/kernel.h> + #include <linux/skbuff.h> + #include <linux/spinlock.h> + #include <linux/netfilter_ipv4.h> +@@ -152,16 +154,10 @@ error_nolock: + goto out_exit; + } + +-static int xfrm4_output_finish(struct sk_buff *skb) ++static int xfrm4_output_finish2(struct sk_buff *skb) + { + int err; + +-#ifdef CONFIG_NETFILTER +- if (!skb->dst->xfrm) { +- IPCB(skb)->flags |= IPSKB_REROUTED; +- return dst_output(skb); +- } +-#endif + while (likely((err = xfrm4_output_one(skb)) == 0)) { + nf_reset(skb); + +@@ -174,7 +170,7 @@ #endif + return dst_output(skb); + + err = nf_hook(PF_INET, NF_IP_POST_ROUTING, &skb, NULL, +- skb->dst->dev, xfrm4_output_finish); ++ skb->dst->dev, xfrm4_output_finish2); + if (unlikely(err != 1)) + break; + } +@@ -182,6 +178,48 @@ #endif + return err; + } + ++static int xfrm4_output_finish(struct sk_buff *skb) ++{ ++ struct sk_buff *segs; ++ ++#ifdef CONFIG_NETFILTER ++ if (!skb->dst->xfrm) { ++ IPCB(skb)->flags |= IPSKB_REROUTED; ++ return dst_output(skb); ++ } ++#endif ++ ++ if (!skb_shinfo(skb)->gso_size) ++ return xfrm4_output_finish2(skb); ++ ++ skb->protocol = htons(ETH_P_IP); ++ segs = skb_gso_segment(skb, 0); ++ kfree_skb(skb); ++ if (unlikely(IS_ERR(segs))) ++ return PTR_ERR(segs); ++ ++ do { ++ struct sk_buff *nskb = segs->next; ++ int err; ++ ++ segs->next = NULL; ++ err = xfrm4_output_finish2(segs); ++ ++ if (unlikely(err)) { ++ while ((segs = nskb)) { ++ nskb = segs->next; ++ segs->next = NULL; ++ kfree_skb(segs); ++ } ++ return err; ++ } ++ ++ segs = nskb; ++ } while (segs); ++ ++ return 0; ++} ++ + int xfrm4_output(struct sk_buff *skb) + { + return NF_HOOK_COND(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev, +diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c +index 5bf70b1..cf5d17e 100644 +--- a/net/ipv6/ip6_output.c ++++ b/net/ipv6/ip6_output.c +@@ -147,7 +147,7 @@ static int ip6_output2(struct sk_buff *s + + int ip6_output(struct sk_buff *skb) + { +- if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->ufo_size) || ++ if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size) || + dst_allfrag(skb->dst)) + return ip6_fragment(skb, ip6_output2); + else +@@ -829,8 +829,9 @@ static inline int ip6_ufo_append_data(st + struct frag_hdr fhdr; + + /* specify the length of each IP datagram fragment*/ +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen) - +- sizeof(struct frag_hdr); ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen - ++ sizeof(struct frag_hdr); ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; + ipv6_select_ident(skb, &fhdr); + skb_shinfo(skb)->ip6_frag_id = fhdr.identification; + __skb_queue_tail(&sk->sk_write_queue, skb); +diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c +index d511a88..ef56d5d 100644 +--- a/net/ipv6/ipcomp6.c ++++ b/net/ipv6/ipcomp6.c +@@ 
-64,7 +64,7 @@ static LIST_HEAD(ipcomp6_tfms_list); + + static int ipcomp6_input(struct xfrm_state *x, struct xfrm_decap_state *decap, struct sk_buff *skb) + { +- int err = 0; ++ int err = -ENOMEM; + u8 nexthdr = 0; + int hdr_len = skb->h.raw - skb->nh.raw; + unsigned char *tmp_hdr = NULL; +@@ -75,11 +75,8 @@ static int ipcomp6_input(struct xfrm_sta + struct crypto_tfm *tfm; + int cpu; + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { +- err = -ENOMEM; ++ if (skb_linearize_cow(skb)) + goto out; +- } + + skb->ip_summed = CHECKSUM_NONE; + +@@ -158,10 +155,8 @@ static int ipcomp6_output(struct xfrm_st + goto out_ok; + } + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { ++ if (skb_linearize_cow(skb)) + goto out_ok; +- } + + /* compression */ + plen = skb->len - hdr_len; +diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c +index 8024217..39bdeec 100644 +--- a/net/ipv6/xfrm6_output.c ++++ b/net/ipv6/xfrm6_output.c +@@ -151,7 +151,7 @@ error_nolock: + goto out_exit; + } + +-static int xfrm6_output_finish(struct sk_buff *skb) ++static int xfrm6_output_finish2(struct sk_buff *skb) + { + int err; + +@@ -167,7 +167,7 @@ static int xfrm6_output_finish(struct sk + return dst_output(skb); + + err = nf_hook(PF_INET6, NF_IP6_POST_ROUTING, &skb, NULL, +- skb->dst->dev, xfrm6_output_finish); ++ skb->dst->dev, xfrm6_output_finish2); + if (unlikely(err != 1)) + break; + } +@@ -175,6 +175,41 @@ static int xfrm6_output_finish(struct sk + return err; + } + ++static int xfrm6_output_finish(struct sk_buff *skb) ++{ ++ struct sk_buff *segs; ++ ++ if (!skb_shinfo(skb)->gso_size) ++ return xfrm6_output_finish2(skb); ++ ++ skb->protocol = htons(ETH_P_IP); ++ segs = skb_gso_segment(skb, 0); ++ kfree_skb(skb); ++ if (unlikely(IS_ERR(segs))) ++ return PTR_ERR(segs); ++ ++ do { ++ struct sk_buff *nskb = segs->next; ++ int err; ++ ++ segs->next = NULL; ++ err = xfrm6_output_finish2(segs); ++ ++ if (unlikely(err)) { ++ while ((segs = nskb)) { ++ nskb = segs->next; ++ segs->next = NULL; ++ kfree_skb(segs); ++ } ++ return err; ++ } ++ ++ segs = nskb; ++ } while (segs); ++ ++ return 0; ++} ++ + int xfrm6_output(struct sk_buff *skb) + { + return NF_HOOK(PF_INET6, NF_IP6_POST_ROUTING, skb, NULL, skb->dst->dev, +diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c +index 99ceb91..28c9efd 100644 +--- a/net/sched/sch_generic.c ++++ b/net/sched/sch_generic.c +@@ -72,9 +72,9 @@ void qdisc_unlock_tree(struct net_device + dev->queue_lock serializes queue accesses for this device + AND dev->qdisc pointer itself. + +- dev->xmit_lock serializes accesses to device driver. ++ netif_tx_lock serializes accesses to device driver. + +- dev->queue_lock and dev->xmit_lock are mutually exclusive, ++ dev->queue_lock and netif_tx_lock are mutually exclusive, + if one is grabbed, another must be free. + */ + +@@ -90,14 +90,17 @@ void qdisc_unlock_tree(struct net_device + NOTE: Called under dev->queue_lock with locally disabled BH. + */ + +-int qdisc_restart(struct net_device *dev) ++static inline int qdisc_restart(struct net_device *dev) + { + struct Qdisc *q = dev->qdisc; + struct sk_buff *skb; + + /* Dequeue packet */ +- if ((skb = q->dequeue(q)) != NULL) { ++ if (((skb = dev->gso_skb)) || ((skb = q->dequeue(q)))) { + unsigned nolock = (dev->features & NETIF_F_LLTX); ++ ++ dev->gso_skb = NULL; ++ + /* + * When the driver has LLTX set it does its own locking + * in start_xmit. 
No need to add additional overhead by +@@ -108,7 +111,7 @@ int qdisc_restart(struct net_device *dev + * will be requeued. + */ + if (!nolock) { +- if (!spin_trylock(&dev->xmit_lock)) { ++ if (!netif_tx_trylock(dev)) { + collision: + /* So, someone grabbed the driver. */ + +@@ -126,8 +129,6 @@ int qdisc_restart(struct net_device *dev + __get_cpu_var(netdev_rx_stat).cpu_collision++; + goto requeue; + } +- /* Remember that the driver is grabbed by us. */ +- dev->xmit_lock_owner = smp_processor_id(); + } + + { +@@ -136,14 +137,11 @@ int qdisc_restart(struct net_device *dev + + if (!netif_queue_stopped(dev)) { + int ret; +- if (netdev_nit) +- dev_queue_xmit_nit(skb, dev); + +- ret = dev->hard_start_xmit(skb, dev); ++ ret = dev_hard_start_xmit(skb, dev); + if (ret == NETDEV_TX_OK) { + if (!nolock) { +- dev->xmit_lock_owner = -1; +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + } + spin_lock(&dev->queue_lock); + return -1; +@@ -157,8 +155,7 @@ int qdisc_restart(struct net_device *dev + /* NETDEV_TX_BUSY - we need to requeue */ + /* Release the driver */ + if (!nolock) { +- dev->xmit_lock_owner = -1; +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + } + spin_lock(&dev->queue_lock); + q = dev->qdisc; +@@ -175,7 +172,10 @@ int qdisc_restart(struct net_device *dev + */ + + requeue: +- q->ops->requeue(skb, q); ++ if (skb->next) ++ dev->gso_skb = skb; ++ else ++ q->ops->requeue(skb, q); + netif_schedule(dev); + return 1; + } +@@ -183,11 +183,23 @@ requeue: + return q->q.qlen; + } + ++void __qdisc_run(struct net_device *dev) ++{ ++ if (unlikely(dev->qdisc == &noop_qdisc)) ++ goto out; ++ ++ while (qdisc_restart(dev) < 0 && !netif_queue_stopped(dev)) ++ /* NOTHING */; ++ ++out: ++ clear_bit(__LINK_STATE_QDISC_RUNNING, &dev->state); ++} ++ + static void dev_watchdog(unsigned long arg) + { + struct net_device *dev = (struct net_device *)arg; + +- spin_lock(&dev->xmit_lock); ++ netif_tx_lock(dev); + if (dev->qdisc != &noop_qdisc) { + if (netif_device_present(dev) && + netif_running(dev) && +@@ -201,7 +213,7 @@ static void dev_watchdog(unsigned long a + dev_hold(dev); + } + } +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + + dev_put(dev); + } +@@ -225,17 +237,17 @@ void __netdev_watchdog_up(struct net_dev + + static void dev_watchdog_up(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + __netdev_watchdog_up(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + static void dev_watchdog_down(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + if (del_timer(&dev->watchdog_timer)) + __dev_put(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + void netif_carrier_on(struct net_device *dev) +@@ -577,10 +589,17 @@ void dev_deactivate(struct net_device *d + + dev_watchdog_down(dev); + +- while (test_bit(__LINK_STATE_SCHED, &dev->state)) ++ /* Wait for outstanding dev_queue_xmit calls. */ ++ synchronize_rcu(); ++ ++ /* Wait for outstanding qdisc_run calls. 
*/ ++ while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state)) + yield(); + +- spin_unlock_wait(&dev->xmit_lock); ++ if (dev->gso_skb) { ++ kfree_skb(dev->gso_skb); ++ dev->gso_skb = NULL; ++ } + } + + void dev_init_scheduler(struct net_device *dev) +@@ -622,6 +641,5 @@ EXPORT_SYMBOL(qdisc_create_dflt); + EXPORT_SYMBOL(qdisc_alloc); + EXPORT_SYMBOL(qdisc_destroy); + EXPORT_SYMBOL(qdisc_reset); +-EXPORT_SYMBOL(qdisc_restart); + EXPORT_SYMBOL(qdisc_lock_tree); + EXPORT_SYMBOL(qdisc_unlock_tree); +diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c +index 79b8ef3..4c16ad5 100644 +--- a/net/sched/sch_teql.c ++++ b/net/sched/sch_teql.c +@@ -302,20 +302,17 @@ restart: + + switch (teql_resolve(skb, skb_res, slave)) { + case 0: +- if (spin_trylock(&slave->xmit_lock)) { +- slave->xmit_lock_owner = smp_processor_id(); ++ if (netif_tx_trylock(slave)) { + if (!netif_queue_stopped(slave) && + slave->hard_start_xmit(skb, slave) == 0) { +- slave->xmit_lock_owner = -1; +- spin_unlock(&slave->xmit_lock); ++ netif_tx_unlock(slave); + master->slaves = NEXT_SLAVE(q); + netif_wake_queue(dev); + master->stats.tx_packets++; + master->stats.tx_bytes += len; + return 0; + } +- slave->xmit_lock_owner = -1; +- spin_unlock(&slave->xmit_lock); ++ netif_tx_unlock(slave); + } + if (netif_queue_stopped(dev)) + busy = 1; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: This patch is yet to be merged upstream. [NET]: Added GSO header verification When GSO packets come from an untrusted source (e.g., a Xen guest domain), we need to verify the header integrity before passing it to the hardware. Since the first step in GSO is to verify the header, we can reuse that code by adding a new bit to gso_type: SKB_GSO_DODGY. Packets with this bit set can only be fed directly to devices with the corresponding bit NETIF_F_GSO_ROBUST. If the device doesn''t have that bit, then the skb is fed to the GSO engine which will allow the packet to be sent to the hardware if it passes the header check. This patch changes the sg flag to a full features flag. The same method can be used to implement TSO ECN support. We simply have to mark packets with CWR set with SKB_GSO_ECN so that only hardware with a corresponding NETIF_F_TSO_ECN can accept them. The GSO engine can either fully segment the packet, or segment the first MTU and pass the rest to the hardware for further segmentation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r b255efd0df72 -r f328329eb465 linux-2.6-xen-sparse/include/linux/skbuff.h --- a/linux-2.6-xen-sparse/include/linux/skbuff.h Tue Jun 27 21:27:13 2006 +1000 +++ b/linux-2.6-xen-sparse/include/linux/skbuff.h Tue Jun 27 21:29:42 2006 +1000 @@ -172,6 +172,9 @@ enum { enum { SKB_GSO_TCPV4 = 1 << 0, SKB_GSO_UDPV4 = 1 << 1, + + /* This indicates the skb is from an untrusted source. */ + SKB_GSO_DODGY = 1 << 2, }; /** @@ -1285,7 +1288,7 @@ extern void skb_split(struct sk_b struct sk_buff *skb1, const u32 len); extern void skb_release_data(struct sk_buff *skb); -extern struct sk_buff *skb_segment(struct sk_buff *skb, int sg); +extern struct sk_buff *skb_segment(struct sk_buff *skb, int features); static inline void *skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer) diff -r b255efd0df72 -r f328329eb465 linux-2.6-xen-sparse/net/core/dev.c --- a/linux-2.6-xen-sparse/net/core/dev.c Tue Jun 27 21:27:13 2006 +1000 +++ b/linux-2.6-xen-sparse/net/core/dev.c Tue Jun 27 21:29:42 2006 +1000 @@ -1116,11 +1116,14 @@ out: /** * skb_gso_segment - Perform segmentation on skb. * @skb: buffer to segment - * @sg: whether scatter-gather is supported on the target. + * @features: features for the output path (see dev->features) * * This function segments the given skb and returns a list of segments. - */ -struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg) + * + * It may return NULL if the skb requires no segmentation. This is + * only possible when GSO is used for verifying header integrity. 
+ */ +struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features) { struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT); struct packet_type *ptype; @@ -1136,11 +1139,13 @@ struct sk_buff *skb_gso_segment(struct s rcu_read_lock(); list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) { if (ptype->type == type && !ptype->dev && ptype->gso_segment) { - segs = ptype->gso_segment(skb, sg); + segs = ptype->gso_segment(skb, features); break; } } rcu_read_unlock(); + + __skb_push(skb, skb->data - skb->mac.raw); return segs; } @@ -1217,9 +1222,15 @@ static int dev_gso_segment(struct sk_buf { struct net_device *dev = skb->dev; struct sk_buff *segs; - - segs = skb_gso_segment(skb, dev->features & NETIF_F_SG && - !illegal_highdma(dev, skb)); + int features = dev->features & ~(illegal_highdma(dev, skb) ? + NETIF_F_SG : 0); + + segs = skb_gso_segment(skb, features); + + /* Verifying header integrity only. */ + if (!segs) + return 0; + if (unlikely(IS_ERR(segs))) return PTR_ERR(segs); @@ -1236,13 +1247,17 @@ int dev_hard_start_xmit(struct sk_buff * if (netdev_nit) dev_queue_xmit_nit(skb, dev); - if (!netif_needs_gso(dev, skb)) - return dev->hard_start_xmit(skb, dev); - - if (unlikely(dev_gso_segment(skb))) - goto out_kfree_skb; - } - + if (netif_needs_gso(dev, skb)) { + if (unlikely(dev_gso_segment(skb))) + goto out_kfree_skb; + if (skb->next) + goto gso; + } + + return dev->hard_start_xmit(skb, dev); + } + +gso: do { struct sk_buff *nskb = skb->next; int rc; diff -r b255efd0df72 -r f328329eb465 linux-2.6-xen-sparse/net/core/skbuff.c --- a/linux-2.6-xen-sparse/net/core/skbuff.c Tue Jun 27 21:27:13 2006 +1000 +++ b/linux-2.6-xen-sparse/net/core/skbuff.c Tue Jun 27 21:29:42 2006 +1000 @@ -1804,13 +1804,13 @@ int skb_append_datato_frags(struct sock /** * skb_segment - Perform protocol segmentation on skb. * @skb: buffer to segment - * @sg: whether scatter-gather can be used for generated segments + * @features: features for the output path (see dev->features) * * This function performs segmentation on the given skb. It returns * the segment at the given position. It returns NULL if there are * no more segments to generate, or when an error is encountered. 
*/ -struct sk_buff *skb_segment(struct sk_buff *skb, int sg) +struct sk_buff *skb_segment(struct sk_buff *skb, int features) { struct sk_buff *segs = NULL; struct sk_buff *tail = NULL; @@ -1819,6 +1819,7 @@ struct sk_buff *skb_segment(struct sk_bu unsigned int offset = doffset; unsigned int headroom; unsigned int len; + int sg = features & NETIF_F_SG; int nfrags = skb_shinfo(skb)->nr_frags; int err = -ENOMEM; int i = 0; diff -r b255efd0df72 -r f328329eb465 patches/linux-2.6.16.13/net-gso.patch --- a/patches/linux-2.6.16.13/net-gso.patch Tue Jun 27 21:27:13 2006 +1000 +++ b/patches/linux-2.6.16.13/net-gso.patch Tue Jun 27 21:29:42 2006 +1000 @@ -2283,3 +2283,163 @@ index 79b8ef3..4c16ad5 100644 } if (netif_queue_stopped(dev)) busy = 1; +diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h +index 458b278..58e5bae 100644 +--- a/include/linux/netdevice.h ++++ b/include/linux/netdevice.h +@@ -314,6 +314,7 @@ #define NETIF_F_LLTX 4096 /* LockLess T + #define NETIF_F_GSO_SHIFT 16 + #define NETIF_F_TSO (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT) + #define NETIF_F_UFO (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT) ++#define NETIF_F_GSO_ROBUST (SKB_GSO_DODGY << NETIF_F_GSO_SHIFT) + + #define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) + #define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM) +@@ -538,7 +539,8 @@ struct packet_type { + struct net_device *, + struct packet_type *, + struct net_device *); +- struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, ++ int features); + void *af_packet_priv; + struct list_head list; + }; +@@ -977,7 +979,7 @@ extern int netdev_max_backlog; + extern int weight_p; + extern int netdev_set_master(struct net_device *dev, struct net_device *master); + extern int skb_checksum_help(struct sk_buff *skb, int inward); +-extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int sg); ++extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features); + #ifdef CONFIG_BUG + extern void netdev_rx_csum_fault(struct net_device *dev); + #else +@@ -997,11 +999,16 @@ #endif + + extern void linkwatch_run_queue(void); + ++static inline int skb_gso_ok(struct sk_buff *skb, int features) ++{ ++ int feature = skb_shinfo(skb)->gso_size ? 
++ skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT : 0; ++ return (features & feature) != feature; ++} ++ + static inline int netif_needs_gso(struct net_device *dev, struct sk_buff *skb) + { +- int feature = skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT; +- return skb_shinfo(skb)->gso_size && +- (dev->features & feature) != feature; ++ return skb_gso_ok(skb, dev->features); + } + + #endif /* __KERNEL__ */ +diff --git a/include/net/protocol.h b/include/net/protocol.h +index 650911d..0d2dcdb 100644 +--- a/include/net/protocol.h ++++ b/include/net/protocol.h +@@ -37,7 +37,8 @@ #define MAX_INET_PROTOS 256 /* Must be + struct net_protocol { + int (*handler)(struct sk_buff *skb); + void (*err_handler)(struct sk_buff *skb, u32 info); +- struct sk_buff *(*gso_segment)(struct sk_buff *skb, int sg); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, ++ int features); + int no_policy; + }; + +diff --git a/include/net/tcp.h b/include/net/tcp.h +index 40e337c..70e1d5f 100644 +--- a/include/net/tcp.h ++++ b/include/net/tcp.h +@@ -1063,7 +1063,7 @@ extern struct request_sock_ops tcp_reque + + extern int tcp_v4_destroy_sock(struct sock *sk); + +-extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg); ++extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features); + + #ifdef CONFIG_PROC_FS + extern int tcp4_proc_init(void); +diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c +index b9e32b6..180e79b 100644 +--- a/net/bridge/br_device.c ++++ b/net/bridge/br_device.c +@@ -185,6 +185,6 @@ void br_dev_setup(struct net_device *dev + dev->set_mac_address = br_set_mac_address; + dev->priv_flags = IFF_EBRIDGE; + +- dev->features = NETIF_F_SG | NETIF_F_FRAGLIST +- | NETIF_F_HIGHDMA | NETIF_F_TSO | NETIF_F_NO_CSUM; ++ dev->features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA | ++ NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_GSO_ROBUST; + } +diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c +index 4e9743d..0617146 100644 +--- a/net/bridge/br_if.c ++++ b/net/bridge/br_if.c +@@ -405,7 +405,8 @@ void br_features_recompute(struct net_br + features &= feature; + } + +- br->dev->features = features | checksum | NETIF_F_LLTX; ++ br->dev->features = features | checksum | NETIF_F_LLTX | ++ NETIF_F_GSO_ROBUST; + } + + /* called with RTNL */ +diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c +index 3692db5..5ba719e 100644 +--- a/net/ipv4/af_inet.c ++++ b/net/ipv4/af_inet.c +@@ -1085,7 +1085,7 @@ int inet_sk_rebuild_header(struct sock * + + EXPORT_SYMBOL(inet_sk_rebuild_header); + +-static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int sg) ++static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int features) + { + struct sk_buff *segs = ERR_PTR(-EINVAL); + struct iphdr *iph; +@@ -1114,10 +1114,10 @@ static struct sk_buff *inet_gso_segment( + rcu_read_lock(); + ops = rcu_dereference(inet_protos[proto]); + if (ops && ops->gso_segment) +- segs = ops->gso_segment(skb, sg); ++ segs = ops->gso_segment(skb, features); + rcu_read_unlock(); + +- if (IS_ERR(segs)) ++ if (!segs || unlikely(IS_ERR(segs))) + goto out; + + skb = segs; +diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c +index de0753f..84130c9 100644 +--- a/net/ipv4/tcp.c ++++ b/net/ipv4/tcp.c +@@ -2021,7 +2021,7 @@ int tcp_getsockopt(struct sock *sk, int + } + + +-struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int sg) ++struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features) + { + struct sk_buff *segs = ERR_PTR(-EINVAL); + struct tcphdr *th; +@@ -2042,10 +2042,14 @@ struct sk_buff 
*tcp_tso_segment(struct s + if (!pskb_may_pull(skb, thlen)) + goto out; + ++ segs = NULL; ++ if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) ++ goto out; ++ + oldlen = (u16)~skb->len; + __skb_pull(skb, thlen); + +- segs = skb_segment(skb, sg); ++ segs = skb_segment(skb, features); + if (IS_ERR(segs)) + goto out; + _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
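A note on the feature-bit arithmetic the header-verification patch above relies on: gso_type bits, shifted left by NETIF_F_GSO_SHIFT, occupy the corresponding NETIF_F_* positions in dev->features, which is why a packet carrying SKB_GSO_DODGY can only bypass the software GSO engine on a device that advertises NETIF_F_GSO_ROBUST. The standalone sketch below demonstrates that check; it is illustrative only and not part of the patch series. The constants are copied from the patch, while main() and the two example feature words (a plain TSO NIC versus the bridge device, which the patch marks NETIF_F_GSO_ROBUST) are made up for demonstration.

/* Illustrative sketch only -- not part of the patch above. */
#include <stdio.h>

#define SKB_GSO_TCPV4       (1 << 0)
#define SKB_GSO_UDPV4       (1 << 1)
#define SKB_GSO_DODGY       (1 << 2)

#define NETIF_F_GSO_SHIFT   16
#define NETIF_F_TSO         (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_UFO         (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)
#define NETIF_F_GSO_ROBUST  (SKB_GSO_DODGY << NETIF_F_GSO_SHIFT)

int main(void)
{
	/* A TSO packet from an untrusted guest domain. */
	unsigned int gso_type = SKB_GSO_TCPV4 | SKB_GSO_DODGY;
	unsigned int wanted   = gso_type << NETIF_F_GSO_SHIFT;

	/* A TSO-capable NIC that is not marked robust ... */
	unsigned int nic    = NETIF_F_TSO;
	/* ... and the bridge device, which the patch marks robust. */
	unsigned int bridge = NETIF_F_TSO | NETIF_F_GSO_ROBUST;

	printf("NIC:    %s\n", (nic & wanted) == wanted ?
	       "takes the packet directly" : "packet is fed to the GSO engine");
	printf("bridge: %s\n", (bridge & wanted) == wanted ?
	       "takes the packet directly" : "packet is fed to the GSO engine");
	return 0;
}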
Hi: [NET] loopback: Added support for TSO Just like SG, TSO support here is innate. So all we need to do is mark it as such. This patch also adds the ethtool control functions for SG. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r f328329eb465 -r c88ee5ecc39c linux-2.6-xen-sparse/drivers/xen/netback/loopback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/loopback.c Tue Jun 27 21:29:42 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/loopback.c Tue Jun 27 21:30:08 2006 +1000 @@ -125,6 +125,10 @@ static struct ethtool_ops network_ethtoo { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, + .get_sg = ethtool_op_get_sg, + .set_sg = ethtool_op_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = ethtool_op_set_tso, }; /* @@ -152,6 +156,7 @@ static void loopback_construct(struct ne dev->features = (NETIF_F_HIGHDMA | NETIF_F_LLTX | + NETIF_F_TSO | NETIF_F_SG | NETIF_F_IP_CSUM); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: [NET] back: Add TSO support This patch adds TCP Segmentation Offload (TSO) support to the backend. It also advertises this fact through xenbus so that the frontend can detect this and send through TSO requests only if it is supported. This is done using an extra request slot which is indicated by a flag in the first slot. In future checksum offload can be done in the same way. Even though only TSO is supported for now the code actually supports GSO so it can be applied to any other protocol. The only missing bit is the detection of host support for a specific GSO protocol. Once that is added we can advertise all supported protocols to the guest. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r c88ee5ecc39c -r f6d1f558bf1c linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Tue Jun 27 21:30:08 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Tue Jun 27 21:30:11 2006 +1000 @@ -43,7 +43,7 @@ static void netif_idx_release(u16 pendin static void netif_idx_release(u16 pending_idx); static void netif_page_release(struct page *page); static void make_tx_response(netif_t *netif, - u16 id, + netif_tx_request_t *txp, s8 st); static int make_rx_response(netif_t *netif, u16 id, @@ -481,7 +481,7 @@ inline static void net_tx_action_dealloc netif = pending_tx_info[pending_idx].netif; - make_tx_response(netif, pending_tx_info[pending_idx].req.id, + make_tx_response(netif, &pending_tx_info[pending_idx].req, NETIF_RSP_OKAY); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; @@ -490,14 +490,16 @@ inline static void net_tx_action_dealloc } } -static void netbk_tx_err(netif_t *netif, RING_IDX end) +static void netbk_tx_err(netif_t *netif, netif_tx_request_t *txp, RING_IDX end) { RING_IDX cons = netif->tx.req_cons; do { - netif_tx_request_t *txp = RING_GET_REQUEST(&netif->tx, cons); - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); - } while (++cons < end); + make_tx_response(netif, txp, NETIF_RSP_ERROR); + if (++cons >= end) + break; + txp = RING_GET_REQUEST(&netif->tx, cons); + } while (1); netif->tx.req_cons = cons; netif_schedule_work(netif); netif_put(netif); @@ -508,7 +510,7 @@ static int netbk_count_requests(netif_t { netif_tx_request_t *first = txp; RING_IDX cons = netif->tx.req_cons; - int frags = 1; + int frags = 0; while (txp->flags & NETTXF_more_data) { if (frags >= work_to_do) { @@ -543,7 +545,7 @@ static gnttab_map_grant_ref_t *netbk_get skb_frag_t *frags = shinfo->frags; netif_tx_request_t *txp; unsigned long pending_idx = *((u16 *)skb->data); - RING_IDX cons = netif->tx.req_cons + 1; + RING_IDX cons = netif->tx.req_cons; int i, start; /* Skip first skb fragment if it is on same page as header fragment. */ @@ -581,7 +583,7 @@ static int netbk_tx_check_mop(struct sk_ err = mop->status; if (unlikely(err)) { txp = &pending_tx_info[pending_idx].req; - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); + make_tx_response(netif, txp, NETIF_RSP_ERROR); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; netif_put(netif); } else { @@ -614,7 +616,7 @@ static int netbk_tx_check_mop(struct sk_ /* Error on this fragment: respond to client with an error. 
*/ txp = &pending_tx_info[pending_idx].req; - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); + make_tx_response(netif, txp, NETIF_RSP_ERROR); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; netif_put(netif); @@ -668,6 +670,7 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; + struct netif_tx_extra txtra; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -726,22 +729,37 @@ static void net_tx_action(unsigned long } netif->remaining_credit -= txreq.size; + work_to_do--; + netif->tx.req_cons = ++i; + + if (txreq.flags & NETTXF_extra_info) { + if (work_to_do-- <= 0) { + DPRINTK("Missing extra info\n"); + netbk_tx_err(netif, &txreq, i); + continue; + } + + memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), + sizeof(txtra)); + netif->tx.req_cons = ++i; + } + ret = netbk_count_requests(netif, &txreq, work_to_do); if (unlikely(ret < 0)) { - netbk_tx_err(netif, i - ret); + netbk_tx_err(netif, &txreq, i - ret); continue; } i += ret; if (unlikely(ret > MAX_SKB_FRAGS + 1)) { DPRINTK("Too many frags\n"); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } if (unlikely(txreq.size < ETH_HLEN)) { DPRINTK("Bad packet size: %d\n", txreq.size); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } @@ -750,25 +768,31 @@ static void net_tx_action(unsigned long DPRINTK("txreq.offset: %x, size: %u, end: %lu\n", txreq.offset, txreq.size, (txreq.offset &~PAGE_MASK) + txreq.size); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } pending_idx = pending_ring[MASK_PEND_IDX(pending_cons)]; data_len = (txreq.size > PKT_PROT_LEN && - ret < MAX_SKB_FRAGS + 1) ? + ret < MAX_SKB_FRAGS) ? PKT_PROT_LEN : txreq.size; skb = alloc_skb(data_len+16, GFP_ATOMIC); if (unlikely(skb == NULL)) { DPRINTK("Can''t allocate a skb in start_xmit.\n"); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); break; } /* Packets passed to netif_rx() must have some headroom. 
*/ skb_reserve(skb, 16); + + if (txreq.flags & NETTXF_gso) { + skb_shinfo(skb)->gso_size = txtra.gso_size; + skb_shinfo(skb)->gso_segs = txtra.gso_segs; + skb_shinfo(skb)->gso_type = txtra.gso_type; + } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), GNTMAP_host_map | GNTMAP_readonly, @@ -782,7 +806,7 @@ static void net_tx_action(unsigned long __skb_put(skb, data_len); - skb_shinfo(skb)->nr_frags = ret - 1; + skb_shinfo(skb)->nr_frags = ret; if (data_len < txreq.size) { skb_shinfo(skb)->nr_frags++; skb_shinfo(skb)->frags[0].page @@ -898,7 +922,7 @@ irqreturn_t netif_be_int(int irq, void * } static void make_tx_response(netif_t *netif, - u16 id, + netif_tx_request_t *txp, s8 st) { RING_IDX i = netif->tx.rsp_prod_pvt; @@ -906,8 +930,11 @@ static void make_tx_response(netif_t *ne int notify; resp = RING_GET_RESPONSE(&netif->tx, i); - resp->id = id; + resp->id = txp->id; resp->status = st; + + if (txp->flags & NETTXF_extra_info) + RING_GET_RESPONSE(&netif->tx, ++i)->status = NETIF_RSP_NULL; netif->tx.rsp_prod_pvt = ++i; RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->tx, notify); diff -r c88ee5ecc39c -r f6d1f558bf1c linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Tue Jun 27 21:30:08 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Tue Jun 27 21:30:11 2006 +1000 @@ -101,6 +101,12 @@ static int netback_probe(struct xenbus_d goto abort_transaction; } + err = xenbus_printf(xbt, dev->nodename, "feature-tso", "%d", 1); + if (err) { + message = "writing feature-tso"; + goto abort_transaction; + } + err = xenbus_transaction_end(xbt, 0); } while (err == -EAGAIN); diff -r c88ee5ecc39c -r f6d1f558bf1c xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Tue Jun 27 21:30:08 2006 +1000 +++ b/xen/include/public/io/netif.h Tue Jun 27 21:30:11 2006 +1000 @@ -31,6 +31,13 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) +/* Packet has GSO fields. */ +#define _NETTXF_gso (3) +#define NETTXF_gso (1U<<_NETTXF_gso) + +/* Packet to be folloed by extra descritptor. */ +#define NETTXF_extra_info (NETTXF_gso) + struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ uint16_t offset; /* Offset within buffer page */ @@ -39,6 +46,13 @@ struct netif_tx_request { uint16_t size; /* Packet size in bytes. */ }; typedef struct netif_tx_request netif_tx_request_t; + +/* This structure needs to fit within netif_tx_request for compatibility. */ +struct netif_tx_extra { + uint16_t gso_size; /* GSO MSS. */ + uint16_t gso_segs; /* GSO segment count. */ + uint16_t gso_type; /* GSO type. */ +}; struct netif_tx_response { uint16_t id; @@ -78,6 +92,7 @@ DEFINE_RING_TYPES(netif_rx, struct netif #define NETIF_RSP_DROPPED -2 #define NETIF_RSP_ERROR -1 #define NETIF_RSP_OKAY 0 +#define NETIF_RSP_NULL 1 #endif _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
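To make the extra-slot protocol described in the backend patch above concrete, the sketch below lays out the first two ring slots of a guest TSO transmit. It is illustrative only: the structure layouts and flag values are copied from the netif.h hunks in this patch (with grant_ref_t simplified to uint32_t so the example compiles on its own), while the MSS and segment-count numbers are hypothetical; the real producer side is the frontend patch that follows. The backend answers the metadata slot with a NETIF_RSP_NULL response so that request and response counts stay balanced, and the frontend skips such responses when it garbage-collects transmitted buffers.

/* Illustrative sketch only -- not part of the patch above. */
#include <stdio.h>
#include <stdint.h>

#define _NETTXF_csum_blank     (0)
#define NETTXF_csum_blank      (1U << _NETTXF_csum_blank)
#define _NETTXF_data_validated (1)
#define NETTXF_data_validated  (1U << _NETTXF_data_validated)
#define _NETTXF_more_data      (2)
#define NETTXF_more_data       (1U << _NETTXF_more_data)
#define _NETTXF_gso            (3)
#define NETTXF_gso             (1U << _NETTXF_gso)
#define NETTXF_extra_info      (NETTXF_gso)

#define SKB_GSO_TCPV4          (1 << 0)

struct netif_tx_request {      /* slot i: the ordinary first request */
	uint32_t gref;         /* grant_ref_t in the real header */
	uint16_t offset;
	uint16_t flags;
	uint16_t id;
	uint16_t size;
};

struct netif_tx_extra {        /* slot i+1: reinterpreted as GSO metadata */
	uint16_t gso_size;
	uint16_t gso_segs;
	uint16_t gso_type;
};

int main(void)
{
	struct netif_tx_request first = {
		.flags = NETTXF_csum_blank | NETTXF_data_validated |
		         NETTXF_more_data | NETTXF_gso,
	};
	struct netif_tx_extra extra = {
		.gso_size = 1448,          /* MSS to segment to (example value) */
		.gso_segs = 44,            /* segments in the super-packet (example) */
		.gso_type = SKB_GSO_TCPV4,
	};

	printf("slot i  : flags=%#x, extra_info follows: %s\n",
	       first.flags, first.flags & NETTXF_extra_info ? "yes" : "no");
	printf("slot i+1: gso_size=%u gso_segs=%u gso_type=%#x\n",
	       extra.gso_size, extra.gso_segs, extra.gso_type);
	return 0;
}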
Herbert Xu
2006-Jun-27 12:09 UTC
[Xen-devel] [5/5] [NET] front: Transmit TSO packets if supported
Hi: [NET] front: Transmit TSO packets if supported This patch adds TSO transmission support to the frontend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r f6d1f558bf1c -r f41f42b29fae linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Tue Jun 27 21:30:11 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Tue Jun 27 21:47:07 2006 +1000 @@ -463,7 +463,7 @@ static int network_open(struct net_devic static inline int netfront_tx_slot_available(struct netfront_info *np) { - return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 1; + return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 2; } static inline void network_maybe_wake_tx(struct net_device *dev) @@ -491,7 +491,13 @@ static void network_tx_buf_gc(struct net rmb(); /* Ensure we see responses up to ''rp''. */ for (cons = np->tx.rsp_cons; cons != prod; cons++) { - id = RING_GET_RESPONSE(&np->tx, cons)->id; + struct netif_tx_response *txrsp; + + txrsp = RING_GET_RESPONSE(&np->tx, cons); + if (txrsp->status == NETIF_RSP_NULL) + continue; + + id = txrsp->id; skb = np->tx_skbs[id]; if (unlikely(gnttab_query_foreign_access( np->grant_tx_ref[id]) != 0)) { @@ -739,7 +745,8 @@ static int network_start_xmit(struct sk_ spin_lock_irq(&np->tx_lock); if (unlikely(!netif_carrier_ok(dev) || - (frags > 1 && !xennet_can_sg(dev)))) { + (frags > 1 && !xennet_can_sg(dev)) || + netif_needs_gso(dev, skb))) { spin_unlock_irq(&np->tx_lock); goto drop; } @@ -762,9 +769,21 @@ static int network_start_xmit(struct sk_ tx->size = len; tx->flags = 0; - if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ + if (skb->ip_summed == CHECKSUM_HW) { + /* local packet? */ tx->flags |= NETTXF_csum_blank | NETTXF_data_validated; - if (skb->proto_data_valid) /* remote but checksummed? */ + + if (skb_shinfo(skb)->gso_size) { + struct netif_tx_extra *txtra + (struct netif_tx_extra *) + RING_GET_REQUEST(&np->tx, ++i); + + tx->flags |= NETTXF_gso; + txtra->gso_size = skb_shinfo(skb)->gso_size; + txtra->gso_segs = skb_shinfo(skb)->gso_segs; + txtra->gso_type = skb_shinfo(skb)->gso_type; + } + } else if (skb->proto_data_valid) /* remote but checksummed? */ tx->flags |= NETTXF_data_validated; np->tx.req_prod_pvt = i + 1; @@ -1065,9 +1084,26 @@ static int xennet_set_sg(struct net_devi return ethtool_op_set_sg(dev, data); } +static int xennet_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + struct netfront_info *np = netdev_priv(dev); + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-tso", + "%d", &val) < 0) + val = 0; + if (!val) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static void xennet_set_features(struct net_device *dev) { - xennet_set_sg(dev, 1); + if (!xennet_set_sg(dev, 1)) + xennet_set_tso(dev, 1); } static void network_connect(struct net_device *dev) @@ -1148,6 +1184,8 @@ static struct ethtool_ops network_ethtoo .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = xennet_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = xennet_set_tso, }; #ifdef CONFIG_SYSFS _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 27 Jun 2006, at 13:02, Herbert Xu wrote:

> Comparison with base line and jumbo MTU:
>
> baseline:   1228.08Mb/s
> mtu=16436:  3097.49Mb/s
> TSO:        3208.41Mb/s
> mtu=60040:  3869.36Mb/s
>
> lo(16436):  5543.91Mb/s
> lo(60040):  8544.08Mb/s

Do you have any numbers for transmit to an external host? It'd be particularly interesting to see how you improve CPU overheads of domain0 and domainU (since throughput isn't usually much less than native anyway, but CPU overhead can be significantly higher).

Comments on the patches:

1. Can you make two separate patches in the patches/ directory -- one for the stuff already accepted upstream and a separate one for the stuff that is not yet accepted. Prefix the patch files with numbers to make sure they get applied in the right order.

2. As well as the patches you provide, some generic Linux files are modified in the sparse tree because we already modify them there. Can you please also place a copy of your changes inside your patches/ files. This will make it easier for us to forward port to 2.6.18 as we can see which changes in the sparse tree actually correspond to the generic GSO changes.

3. Can you give brief documentation, or even just send an email explaining, how you change the interdomain protocol to add GSO?

4. The change to the make_tx_response() prototype in netback you could send us as a separate patch. It would make the real GSO changes to netback smaller and clearer.

Thanks!
Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Tue, Jun 27, 2006 at 02:27:20PM +0100, Keir Fraser wrote:
>
> Do you have any numbers for transmit to an external host? It'd be
> particularly interesting to see how you improve CPU overheads of
> domain0 and domainU (since throughput isn't usually much less than
> native anyway, but CPU overhead can be significantly higher).

Sure. Here are figures from xm top on the same laptop. It only has an e100, so there would be a lot more savings with a TSO-capable NIC:

            dom0     domU
idle:        3.2%     0%
TSO:        17.7%     5.9%
baseline:   18.6%     6.9%

So overall it's 22.3% vs. 20.4% (baseline vs. TSO, i.e. the dom0 + domU totals less the idle overhead). In both cases the actual throughput is around 92Mb/s.

BTW, the difference here is much smaller than for inter-domain traffic because this is purely send-only, while with inter-domain traffic we also get the corresponding benefit on the receive side, with the super-packet passing through the receive path as is.

> Comments on the patches:

I will repost tomorrow with your comments taken into consideration.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Hi Keir:

OK, here they are again with your comments addressed.

On Tue, Jun 27, 2006 at 02:27:20PM +0100, Keir Fraser wrote:
>
> Comments on the patches:
> 1. Can you make two separate patches in the patches/ directory -- one
> for the stuff already accepted upstream and a separate one for the
> stuff that is not yet accepted. Prefix the patch files with numbers to
> make sure they get applied in the right order.

This is no longer necessary as the header verification patch has now been merged.

> 2. As well as the patches you provide, some generic Linux files are
> modified in the sparse tree because we already modify them there. Can
> you please also place a copy of your changes inside your patches/
> files. This will make it easier for us to forward port to 2.6.18 as we
> can see which changes in the sparse tree actually correspond to the
> generic GSO changes.

Done.

> 3. Can you give brief documentation, or even just send an email
> explaining, how you change the interdomain protocol to add GSO?

I've included it in the TSO backend patch's description.

> 4. The change to the make_tx_response() prototype in netback you could
> send us as a separate patch. It would make the real GSO changes to
> netback smaller and clearer.

Done.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Hi: [NET]: Added GSO support Imported GSO patch. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 1da8f53ce65b -r 6913e0756b81 linux-2.6-xen-sparse/include/linux/skbuff.h --- a/linux-2.6-xen-sparse/include/linux/skbuff.h Tue Jun 27 18:24:08 2006 +0100 +++ b/linux-2.6-xen-sparse/include/linux/skbuff.h Wed Jun 28 13:44:18 2006 +1000 @@ -134,9 +134,10 @@ struct skb_shared_info { struct skb_shared_info { atomic_t dataref; unsigned short nr_frags; - unsigned short tso_size; - unsigned short tso_segs; - unsigned short ufo_size; + unsigned short gso_size; + /* Warning: this field is not always filled in (UFO)! */ + unsigned short gso_segs; + unsigned short gso_type; unsigned int ip6_frag_id; struct sk_buff *frag_list; skb_frag_t frags[MAX_SKB_FRAGS]; @@ -166,6 +167,14 @@ enum { SKB_FCLONE_UNAVAILABLE, SKB_FCLONE_ORIG, SKB_FCLONE_CLONE, +}; + +enum { + SKB_GSO_TCPV4 = 1 << 0, + SKB_GSO_UDPV4 = 1 << 1, + + /* This indicates the skb is from an untrusted source. */ + SKB_GSO_DODGY = 1 << 2, }; /** @@ -1157,18 +1166,34 @@ static inline int skb_can_coalesce(struc return 0; } +static inline int __skb_linearize(struct sk_buff *skb) +{ + return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM; +} + /** * skb_linearize - convert paged skb to linear one * @skb: buffer to linarize - * @gfp: allocation mode * * If there is no free memory -ENOMEM is returned, otherwise zero * is returned and the old skb data released. */ -extern int __skb_linearize(struct sk_buff *skb, gfp_t gfp); -static inline int skb_linearize(struct sk_buff *skb, gfp_t gfp) -{ - return __skb_linearize(skb, gfp); +static inline int skb_linearize(struct sk_buff *skb) +{ + return skb_is_nonlinear(skb) ? __skb_linearize(skb) : 0; +} + +/** + * skb_linearize_cow - make sure skb is linear and writable + * @skb: buffer to process + * + * If there is no free memory -ENOMEM is returned, otherwise zero + * is returned and the old skb data released. + */ +static inline int skb_linearize_cow(struct sk_buff *skb) +{ + return skb_is_nonlinear(skb) || skb_cloned(skb) ? + __skb_linearize(skb) : 0; } /** @@ -1263,6 +1288,7 @@ extern void skb_split(struct sk_b struct sk_buff *skb1, const u32 len); extern void skb_release_data(struct sk_buff *skb); +extern struct sk_buff *skb_segment(struct sk_buff *skb, int features); static inline void *skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer) diff -r 1da8f53ce65b -r 6913e0756b81 linux-2.6-xen-sparse/net/core/dev.c --- a/linux-2.6-xen-sparse/net/core/dev.c Tue Jun 27 18:24:08 2006 +0100 +++ b/linux-2.6-xen-sparse/net/core/dev.c Wed Jun 28 13:44:18 2006 +1000 @@ -115,6 +115,7 @@ #include <net/iw_handler.h> #endif /* CONFIG_NET_RADIO */ #include <asm/current.h> +#include <linux/err.h> #ifdef CONFIG_XEN #include <net/ip.h> @@ -1038,7 +1039,7 @@ static inline void net_timestamp(struct * taps currently in use. */ -void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) +static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) { struct packet_type *ptype; @@ -1112,6 +1113,45 @@ out: return ret; } +/** + * skb_gso_segment - Perform segmentation on skb. + * @skb: buffer to segment + * @features: features for the output path (see dev->features) + * + * This function segments the given skb and returns a list of segments. 
+ * + * It may return NULL if the skb requires no segmentation. This is + * only possible when GSO is used for verifying header integrity. + */ +struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features) +{ + struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT); + struct packet_type *ptype; + int type = skb->protocol; + + BUG_ON(skb_shinfo(skb)->frag_list); + BUG_ON(skb->ip_summed != CHECKSUM_HW); + + skb->mac.raw = skb->data; + skb->mac_len = skb->nh.raw - skb->data; + __skb_pull(skb, skb->mac_len); + + rcu_read_lock(); + list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) { + if (ptype->type == type && !ptype->dev && ptype->gso_segment) { + segs = ptype->gso_segment(skb, features); + break; + } + } + rcu_read_unlock(); + + __skb_push(skb, skb->data - skb->mac.raw); + + return segs; +} + +EXPORT_SYMBOL(skb_gso_segment); + /* Take action when hardware reception checksum errors are detected. */ #ifdef CONFIG_BUG void netdev_rx_csum_fault(struct net_device *dev) @@ -1148,75 +1188,108 @@ static inline int illegal_highdma(struct #define illegal_highdma(dev, skb) (0) #endif -/* Keep head the same: replace data */ -int __skb_linearize(struct sk_buff *skb, gfp_t gfp_mask) -{ - unsigned int size; - u8 *data; - long offset; - struct skb_shared_info *ninfo; - int headerlen = skb->data - skb->head; - int expand = (skb->tail + skb->data_len) - skb->end; - - if (skb_shared(skb)) - BUG(); - - if (expand <= 0) - expand = 0; - - size = skb->end - skb->head + expand; - size = SKB_DATA_ALIGN(size); - data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask); - if (!data) - return -ENOMEM; - - /* Copy entire thing */ - if (skb_copy_bits(skb, -headerlen, data, headerlen + skb->len)) - BUG(); - - /* Set up shinfo */ - ninfo = (struct skb_shared_info*)(data + size); - atomic_set(&ninfo->dataref, 1); - ninfo->tso_size = skb_shinfo(skb)->tso_size; - ninfo->tso_segs = skb_shinfo(skb)->tso_segs; - ninfo->nr_frags = 0; - ninfo->frag_list = NULL; - - /* Offset between the two in bytes */ - offset = data - skb->head; - - /* Free old data. */ - skb_release_data(skb); - - skb->head = data; - skb->end = data + size; - - /* Set up new pointers */ - skb->h.raw += offset; - skb->nh.raw += offset; - skb->mac.raw += offset; - skb->tail += offset; - skb->data += offset; - - /* We are no longer a clone, even if we were. */ - skb->cloned = 0; - - skb->tail += skb->data_len; - skb->data_len = 0; +struct dev_gso_cb { + void (*destructor)(struct sk_buff *skb); +}; + +#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb) + +static void dev_gso_skb_destructor(struct sk_buff *skb) +{ + struct dev_gso_cb *cb; + + do { + struct sk_buff *nskb = skb->next; + + skb->next = nskb->next; + nskb->next = NULL; + kfree_skb(nskb); + } while (skb->next); + + cb = DEV_GSO_CB(skb); + if (cb->destructor) + cb->destructor(skb); +} + +/** + * dev_gso_segment - Perform emulated hardware segmentation on skb. + * @skb: buffer to segment + * + * This function segments the given skb and stores the list of segments + * in skb->next. + */ +static int dev_gso_segment(struct sk_buff *skb) +{ + struct net_device *dev = skb->dev; + struct sk_buff *segs; + int features = dev->features & ~(illegal_highdma(dev, skb) ? + NETIF_F_SG : 0); + + segs = skb_gso_segment(skb, features); + + /* Verifying header integrity only. 
*/ + if (!segs) + return 0; + + if (unlikely(IS_ERR(segs))) + return PTR_ERR(segs); + + skb->next = segs; + DEV_GSO_CB(skb)->destructor = skb->destructor; + skb->destructor = dev_gso_skb_destructor; + + return 0; +} + +int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) +{ + if (likely(!skb->next)) { + if (netdev_nit) + dev_queue_xmit_nit(skb, dev); + + if (netif_needs_gso(dev, skb)) { + if (unlikely(dev_gso_segment(skb))) + goto out_kfree_skb; + if (skb->next) + goto gso; + } + + return dev->hard_start_xmit(skb, dev); + } + +gso: + do { + struct sk_buff *nskb = skb->next; + int rc; + + skb->next = nskb->next; + nskb->next = NULL; + rc = dev->hard_start_xmit(nskb, dev); + if (unlikely(rc)) { + nskb->next = skb->next; + skb->next = nskb; + return rc; + } + if (unlikely(netif_queue_stopped(dev) && skb->next)) + return NETDEV_TX_BUSY; + } while (skb->next); + + skb->destructor = DEV_GSO_CB(skb)->destructor; + +out_kfree_skb: + kfree_skb(skb); return 0; } #define HARD_TX_LOCK(dev, cpu) { \ if ((dev->features & NETIF_F_LLTX) == 0) { \ - spin_lock(&dev->xmit_lock); \ - dev->xmit_lock_owner = cpu; \ + netif_tx_lock(dev); \ } \ } #define HARD_TX_UNLOCK(dev) { \ if ((dev->features & NETIF_F_LLTX) == 0) { \ - dev->xmit_lock_owner = -1; \ - spin_unlock(&dev->xmit_lock); \ + netif_tx_unlock(dev); \ } \ } @@ -1289,9 +1362,19 @@ int dev_queue_xmit(struct sk_buff *skb) struct Qdisc *q; int rc = -ENOMEM; + /* If a checksum-deferred packet is forwarded to a device that needs a + * checksum, correct the pointers and force checksumming. + */ + if (skb_checksum_setup(skb)) + goto out_kfree_skb; + + /* GSO will handle the following emulations directly. */ + if (netif_needs_gso(dev, skb)) + goto gso; + if (skb_shinfo(skb)->frag_list && !(dev->features & NETIF_F_FRAGLIST) && - __skb_linearize(skb, GFP_ATOMIC)) + __skb_linearize(skb)) goto out_kfree_skb; /* Fragmented skb is linearized if device does not support SG, @@ -1300,31 +1383,26 @@ int dev_queue_xmit(struct sk_buff *skb) */ if (skb_shinfo(skb)->nr_frags && (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) && - __skb_linearize(skb, GFP_ATOMIC)) + __skb_linearize(skb)) goto out_kfree_skb; - /* If a checksum-deferred packet is forwarded to a device that needs a - * checksum, correct the pointers and force checksumming. - */ - if(skb_checksum_setup(skb)) - goto out_kfree_skb; - /* If packet is not checksummed and device does not support * checksumming for this protocol, complete checksumming here. */ if (skb->ip_summed == CHECKSUM_HW && - (!(dev->features & (NETIF_F_HW_CSUM | NETIF_F_NO_CSUM)) && + (!(dev->features & NETIF_F_GEN_CSUM) && (!(dev->features & NETIF_F_IP_CSUM) || skb->protocol != htons(ETH_P_IP)))) if (skb_checksum_help(skb, 0)) goto out_kfree_skb; +gso: spin_lock_prefetch(&dev->queue_lock); /* Disable soft irqs for various locks below. Also * stops preemption for RCU. */ - local_bh_disable(); + rcu_read_lock_bh(); /* Updates of qdisc are serialized by queue_lock. * The struct Qdisc which is pointed to by qdisc is now a @@ -1358,8 +1436,8 @@ int dev_queue_xmit(struct sk_buff *skb) /* The device has no queue. Common case for software devices: loopback, all the sorts of tunnels... - Really, it is unlikely that xmit_lock protection is necessary here. - (f.e. loopback and IP tunnels are clean ignoring statistics + Really, it is unlikely that netif_tx_lock protection is necessary + here. (f.e. loopback and IP tunnels are clean ignoring statistics counters.) 
However, it is possible, that they rely on protection made by us here. @@ -1375,11 +1453,8 @@ int dev_queue_xmit(struct sk_buff *skb) HARD_TX_LOCK(dev, cpu); if (!netif_queue_stopped(dev)) { - if (netdev_nit) - dev_queue_xmit_nit(skb, dev); - rc = 0; - if (!dev->hard_start_xmit(skb, dev)) { + if (!dev_hard_start_xmit(skb, dev)) { HARD_TX_UNLOCK(dev); goto out; } @@ -1398,13 +1473,13 @@ int dev_queue_xmit(struct sk_buff *skb) } rc = -ENETDOWN; - local_bh_enable(); + rcu_read_unlock_bh(); out_kfree_skb: kfree_skb(skb); return rc; out: - local_bh_enable(); + rcu_read_unlock_bh(); return rc; } @@ -2732,7 +2807,7 @@ int register_netdevice(struct net_device BUG_ON(dev->reg_state != NETREG_UNINITIALIZED); spin_lock_init(&dev->queue_lock); - spin_lock_init(&dev->xmit_lock); + spin_lock_init(&dev->_xmit_lock); dev->xmit_lock_owner = -1; #ifdef CONFIG_NET_CLS_ACT spin_lock_init(&dev->ingress_lock); @@ -2776,9 +2851,7 @@ int register_netdevice(struct net_device /* Fix illegal SG+CSUM combinations. */ if ((dev->features & NETIF_F_SG) && - !(dev->features & (NETIF_F_IP_CSUM | - NETIF_F_NO_CSUM | - NETIF_F_HW_CSUM))) { + !(dev->features & NETIF_F_ALL_CSUM)) { printk("%s: Dropping NETIF_F_SG since no checksum feature.\n", dev->name); dev->features &= ~NETIF_F_SG; @@ -3330,7 +3403,6 @@ EXPORT_SYMBOL(__dev_get_by_index); EXPORT_SYMBOL(__dev_get_by_index); EXPORT_SYMBOL(__dev_get_by_name); EXPORT_SYMBOL(__dev_remove_pack); -EXPORT_SYMBOL(__skb_linearize); EXPORT_SYMBOL(dev_valid_name); EXPORT_SYMBOL(dev_add_pack); EXPORT_SYMBOL(dev_alloc_name); diff -r 1da8f53ce65b -r 6913e0756b81 linux-2.6-xen-sparse/net/core/skbuff.c --- a/linux-2.6-xen-sparse/net/core/skbuff.c Tue Jun 27 18:24:08 2006 +0100 +++ b/linux-2.6-xen-sparse/net/core/skbuff.c Wed Jun 28 13:44:18 2006 +1000 @@ -165,9 +165,9 @@ struct sk_buff *__alloc_skb(unsigned int shinfo = skb_shinfo(skb); atomic_set(&shinfo->dataref, 1); shinfo->nr_frags = 0; - shinfo->tso_size = 0; - shinfo->tso_segs = 0; - shinfo->ufo_size = 0; + shinfo->gso_size = 0; + shinfo->gso_segs = 0; + shinfo->gso_type = 0; shinfo->ip6_frag_id = 0; shinfo->frag_list = NULL; @@ -237,9 +237,9 @@ struct sk_buff *alloc_skb_from_cache(kme shinfo = skb_shinfo(skb); atomic_set(&shinfo->dataref, 1); shinfo->nr_frags = 0; - shinfo->tso_size = 0; - shinfo->tso_segs = 0; - shinfo->ufo_size = 0; + shinfo->gso_size = 0; + shinfo->gso_segs = 0; + shinfo->gso_type = 0; shinfo->ip6_frag_id = 0; shinfo->frag_list = NULL; @@ -524,8 +524,9 @@ static void copy_skb_header(struct sk_bu new->tc_index = old->tc_index; #endif atomic_set(&new->users, 1); - skb_shinfo(new)->tso_size = skb_shinfo(old)->tso_size; - skb_shinfo(new)->tso_segs = skb_shinfo(old)->tso_segs; + skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size; + skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs; + skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type; } /** @@ -1799,6 +1800,133 @@ int skb_append_datato_frags(struct sock return 0; } + +/** + * skb_segment - Perform protocol segmentation on skb. + * @skb: buffer to segment + * @features: features for the output path (see dev->features) + * + * This function performs segmentation on the given skb. It returns + * the segment at the given position. It returns NULL if there are + * no more segments to generate, or when an error is encountered. 
+ */ +struct sk_buff *skb_segment(struct sk_buff *skb, int features) +{ + struct sk_buff *segs = NULL; + struct sk_buff *tail = NULL; + unsigned int mss = skb_shinfo(skb)->gso_size; + unsigned int doffset = skb->data - skb->mac.raw; + unsigned int offset = doffset; + unsigned int headroom; + unsigned int len; + int sg = features & NETIF_F_SG; + int nfrags = skb_shinfo(skb)->nr_frags; + int err = -ENOMEM; + int i = 0; + int pos; + + __skb_push(skb, doffset); + headroom = skb_headroom(skb); + pos = skb_headlen(skb); + + do { + struct sk_buff *nskb; + skb_frag_t *frag; + int hsize, nsize; + int k; + int size; + + len = skb->len - offset; + if (len > mss) + len = mss; + + hsize = skb_headlen(skb) - offset; + if (hsize < 0) + hsize = 0; + nsize = hsize + doffset; + if (nsize > len + doffset || !sg) + nsize = len + doffset; + + nskb = alloc_skb(nsize + headroom, GFP_ATOMIC); + if (unlikely(!nskb)) + goto err; + + if (segs) + tail->next = nskb; + else + segs = nskb; + tail = nskb; + + nskb->dev = skb->dev; + nskb->priority = skb->priority; + nskb->protocol = skb->protocol; + nskb->dst = dst_clone(skb->dst); + memcpy(nskb->cb, skb->cb, sizeof(skb->cb)); + nskb->pkt_type = skb->pkt_type; + nskb->mac_len = skb->mac_len; + + skb_reserve(nskb, headroom); + nskb->mac.raw = nskb->data; + nskb->nh.raw = nskb->data + skb->mac_len; + nskb->h.raw = nskb->nh.raw + (skb->h.raw - skb->nh.raw); + memcpy(skb_put(nskb, doffset), skb->data, doffset); + + if (!sg) { + nskb->csum = skb_copy_and_csum_bits(skb, offset, + skb_put(nskb, len), + len, 0); + continue; + } + + frag = skb_shinfo(nskb)->frags; + k = 0; + + nskb->ip_summed = CHECKSUM_HW; + nskb->csum = skb->csum; + memcpy(skb_put(nskb, hsize), skb->data + offset, hsize); + + while (pos < offset + len) { + BUG_ON(i >= nfrags); + + *frag = skb_shinfo(skb)->frags[i]; + get_page(frag->page); + size = frag->size; + + if (pos < offset) { + frag->page_offset += offset - pos; + frag->size -= offset - pos; + } + + k++; + + if (pos + size <= offset + len) { + i++; + pos += size; + } else { + frag->size -= pos + size - (offset + len); + break; + } + + frag++; + } + + skb_shinfo(nskb)->nr_frags = k; + nskb->data_len = len - hsize; + nskb->len += nskb->data_len; + nskb->truesize += nskb->data_len; + } while ((offset += len) < skb->len); + + return segs; + +err: + while ((skb = segs)) { + segs = skb->next; + kfree(skb); + } + return ERR_PTR(err); +} + +EXPORT_SYMBOL_GPL(skb_segment); void __init skb_init(void) { diff -r 1da8f53ce65b -r 6913e0756b81 patches/linux-2.6.16.13/net-gso.patch --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/patches/linux-2.6.16.13/net-gso.patch Wed Jun 28 13:44:18 2006 +1000 @@ -0,0 +1,2907 @@ +diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt +index 3c0a5ba..847cedb 100644 +--- a/Documentation/networking/netdevices.txt ++++ b/Documentation/networking/netdevices.txt +@@ -42,9 +42,9 @@ dev->get_stats: + Context: nominally process, but don''t sleep inside an rwlock + + dev->hard_start_xmit: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + When the driver sets NETIF_F_LLTX in dev->features this will be +- called without holding xmit_lock. In this case the driver ++ called without holding netif_tx_lock. In this case the driver + has to lock by itself when needed. It is recommended to use a try lock + for this and return -1 when the spin lock fails. 
+ The locking there should also properly protect against +@@ -62,12 +62,12 @@ dev->hard_start_xmit: + Only valid when NETIF_F_LLTX is set. + + dev->tx_timeout: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + Context: BHs disabled + Notes: netif_queue_stopped() is guaranteed true + + dev->set_multicast_list: +- Synchronization: dev->xmit_lock spinlock. ++ Synchronization: netif_tx_lock spinlock. + Context: BHs disabled + + dev->poll: +diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c +index 4be9769..2e7cac7 100644 +--- a/drivers/block/aoe/aoenet.c ++++ b/drivers/block/aoe/aoenet.c +@@ -95,9 +95,8 @@ mac_addr(char addr[6]) + static struct sk_buff * + skb_check(struct sk_buff *skb) + { +- if (skb_is_nonlinear(skb)) + if ((skb = skb_share_check(skb, GFP_ATOMIC))) +- if (skb_linearize(skb, GFP_ATOMIC) < 0) { ++ if (skb_linearize(skb)) { + dev_kfree_skb(skb); + return NULL; + } +diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +index a2408d7..c90e620 100644 +--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c ++++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +@@ -821,7 +821,8 @@ void ipoib_mcast_restart_task(void *dev_ + + ipoib_mcast_stop_thread(dev, 0); + +- spin_lock_irqsave(&dev->xmit_lock, flags); ++ local_irq_save(flags); ++ netif_tx_lock(dev); + spin_lock(&priv->lock); + + /* +@@ -896,7 +897,8 @@ void ipoib_mcast_restart_task(void *dev_ + } + + spin_unlock(&priv->lock); +- spin_unlock_irqrestore(&dev->xmit_lock, flags); ++ netif_tx_unlock(dev); ++ local_irq_restore(flags); + + /* We have to cancel outside of the spinlock */ + list_for_each_entry_safe(mcast, tmcast, &remove_list, list) { +diff --git a/drivers/media/dvb/dvb-core/dvb_net.c b/drivers/media/dvb/dvb-core/dvb_net.c +index 6711eb6..8d2351f 100644 +--- a/drivers/media/dvb/dvb-core/dvb_net.c ++++ b/drivers/media/dvb/dvb-core/dvb_net.c +@@ -1052,7 +1052,7 @@ static void wq_set_multicast_list (void + + dvb_net_feed_stop(dev); + priv->rx_mode = RX_MODE_UNI; +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + if (dev->flags & IFF_PROMISC) { + dprintk("%s: promiscuous mode\n", dev->name); +@@ -1077,7 +1077,7 @@ static void wq_set_multicast_list (void + } + } + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + dvb_net_feed_start(dev); + } + +diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c +index dd41049..6615583 100644 +--- a/drivers/net/8139cp.c ++++ b/drivers/net/8139cp.c +@@ -794,7 +794,7 @@ #endif + entry = cp->tx_head; + eor = (entry == (CP_TX_RING_SIZE - 1)) ? RingEnd : 0; + if (dev->features & NETIF_F_TSO) +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + + if (skb_shinfo(skb)->nr_frags == 0) { + struct cp_desc *txd = &cp->tx_ring[entry]; +diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c +index a24200d..b5e39a1 100644 +--- a/drivers/net/bnx2.c ++++ b/drivers/net/bnx2.c +@@ -1593,7 +1593,7 @@ bnx2_tx_int(struct bnx2 *bp) + skb = tx_buf->skb; + #ifdef BCM_TSO + /* partial BD completions possible with TSO packets */ +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + u16 last_idx, last_ring_idx; + + last_idx = sw_cons + +@@ -1948,7 +1948,7 @@ bnx2_poll(struct net_device *dev, int *b + return 1; + } + +-/* Called with rtnl_lock from vlan functions and also dev->xmit_lock ++/* Called with rtnl_lock from vlan functions and also netif_tx_lock + * from set_multicast. 
+ */ + static void +@@ -4403,7 +4403,7 @@ bnx2_vlan_rx_kill_vid(struct net_device + } + #endif + +-/* Called with dev->xmit_lock. ++/* Called with netif_tx_lock. + * hard_start_xmit is pseudo-lockless - a lock is only required when + * the tx queue is full. This way, we get the benefit of lockless + * operations most of the time without the complexities to handle +@@ -4441,7 +4441,7 @@ bnx2_start_xmit(struct sk_buff *skb, str + (TX_BD_FLAGS_VLAN_TAG | (vlan_tx_tag_get(skb) << 16)); + } + #ifdef BCM_TSO +- if ((mss = skb_shinfo(skb)->tso_size) && ++ if ((mss = skb_shinfo(skb)->gso_size) && + (skb->len > (bp->dev->mtu + ETH_HLEN))) { + u32 tcp_opt_len, ip_tcp_len; + +diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c +index bcf9f17..e970921 100644 +--- a/drivers/net/bonding/bond_main.c ++++ b/drivers/net/bonding/bond_main.c +@@ -1145,8 +1145,7 @@ int bond_sethwaddr(struct net_device *bo + } + + #define BOND_INTERSECT_FEATURES \ +- (NETIF_F_SG|NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM|\ +- NETIF_F_TSO|NETIF_F_UFO) ++ (NETIF_F_SG | NETIF_F_ALL_CSUM | NETIF_F_TSO | NETIF_F_UFO) + + /* + * Compute the common dev->feature set available to all slaves. Some +@@ -1164,9 +1163,7 @@ static int bond_compute_features(struct + features &= (slave->dev->features & BOND_INTERSECT_FEATURES); + + if ((features & NETIF_F_SG) && +- !(features & (NETIF_F_IP_CSUM | +- NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM))) ++ !(features & NETIF_F_ALL_CSUM)) + features &= ~NETIF_F_SG; + + /* +@@ -4147,7 +4144,7 @@ static int bond_init(struct net_device * + */ + bond_dev->features |= NETIF_F_VLAN_CHALLENGED; + +- /* don''t acquire bond device''s xmit_lock when ++ /* don''t acquire bond device''s netif_tx_lock when + * transmitting */ + bond_dev->features |= NETIF_F_LLTX; + +diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c +index 30ff8ea..7b7d360 100644 +--- a/drivers/net/chelsio/sge.c ++++ b/drivers/net/chelsio/sge.c +@@ -1419,7 +1419,7 @@ int t1_start_xmit(struct sk_buff *skb, s + struct cpl_tx_pkt *cpl; + + #ifdef NETIF_F_TSO +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + int eth_type; + struct cpl_tx_pkt_lso *hdr; + +@@ -1434,7 +1434,7 @@ #ifdef NETIF_F_TSO + hdr->ip_hdr_words = skb->nh.iph->ihl; + hdr->tcp_hdr_words = skb->h.th->doff; + hdr->eth_type_mss = htons(MK_ETH_TYPE_MSS(eth_type, +- skb_shinfo(skb)->tso_size)); ++ skb_shinfo(skb)->gso_size)); + hdr->len = htonl(skb->len - sizeof(*hdr)); + cpl = (struct cpl_tx_pkt *)hdr; + sge->stats.tx_lso_pkts++; +diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c +index fa29402..681d284 100644 +--- a/drivers/net/e1000/e1000_main.c ++++ b/drivers/net/e1000/e1000_main.c +@@ -2526,7 +2526,7 @@ #ifdef NETIF_F_TSO + uint8_t ipcss, ipcso, tucss, tucso, hdr_len; + int err; + +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + if (skb_header_cloned(skb)) { + err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); + if (err) +@@ -2534,7 +2534,7 @@ #ifdef NETIF_F_TSO + } + + hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2)); +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + if (skb->protocol == ntohs(ETH_P_IP)) { + skb->nh.iph->tot_len = 0; + skb->nh.iph->check = 0; +@@ -2651,7 +2651,7 @@ #ifdef NETIF_F_TSO + * tso gets written back prematurely before the data is fully + * DMAd to the controller */ + if (!skb->data_len && tx_ring->last_tx_tso && +- !skb_shinfo(skb)->tso_size) { ++ !skb_shinfo(skb)->gso_size) { + tx_ring->last_tx_tso = 0; + 
size -= 4; + } +@@ -2893,7 +2893,7 @@ #endif + } + + #ifdef NETIF_F_TSO +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + /* The controller does a simple calculation to + * make sure there is enough room in the FIFO before + * initiating the DMA for each buffer. The calc is: +@@ -2935,7 +2935,7 @@ #endif + #ifdef NETIF_F_TSO + /* Controller Erratum workaround */ + if (!skb->data_len && tx_ring->last_tx_tso && +- !skb_shinfo(skb)->tso_size) ++ !skb_shinfo(skb)->gso_size) + count++; + #endif + +diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c +index 3682ec6..c35f16e 100644 +--- a/drivers/net/forcedeth.c ++++ b/drivers/net/forcedeth.c +@@ -482,9 +482,9 @@ #define LPA_1000HALF 0x0400 + * critical parts: + * - rx is (pseudo-) lockless: it relies on the single-threading provided + * by the arch code for interrupts. +- * - tx setup is lockless: it relies on dev->xmit_lock. Actual submission ++ * - tx setup is lockless: it relies on netif_tx_lock. Actual submission + * needs dev->priv->lock :-( +- * - set_multicast_list: preparation lockless, relies on dev->xmit_lock. ++ * - set_multicast_list: preparation lockless, relies on netif_tx_lock. + */ + + /* in dev: base, irq */ +@@ -1016,7 +1016,7 @@ static void drain_ring(struct net_device + + /* + * nv_start_xmit: dev->hard_start_xmit function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. + */ + static int nv_start_xmit(struct sk_buff *skb, struct net_device *dev) + { +@@ -1105,8 +1105,8 @@ static int nv_start_xmit(struct sk_buff + np->tx_skbuff[nr] = skb; + + #ifdef NETIF_F_TSO +- if (skb_shinfo(skb)->tso_size) +- tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->tso_size << NV_TX2_TSO_SHIFT); ++ if (skb_shinfo(skb)->gso_size) ++ tx_flags_extra = NV_TX2_TSO | (skb_shinfo(skb)->gso_size << NV_TX2_TSO_SHIFT); + else + #endif + tx_flags_extra = (skb->ip_summed == CHECKSUM_HW ? (NV_TX2_CHECKSUM_L3|NV_TX2_CHECKSUM_L4) : 0); +@@ -1203,7 +1203,7 @@ static void nv_tx_done(struct net_device + + /* + * nv_tx_timeout: dev->tx_timeout function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. + */ + static void nv_tx_timeout(struct net_device *dev) + { +@@ -1524,7 +1524,7 @@ static int nv_change_mtu(struct net_devi + * Changing the MTU is a rare event, it shouldn''t matter. + */ + disable_irq(dev->irq); +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock(&np->lock); + /* stop engines */ + nv_stop_rx(dev); +@@ -1559,7 +1559,7 @@ static int nv_change_mtu(struct net_devi + nv_start_rx(dev); + nv_start_tx(dev); + spin_unlock(&np->lock); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + enable_irq(dev->irq); + } + return 0; +@@ -1594,7 +1594,7 @@ static int nv_set_mac_address(struct net + memcpy(dev->dev_addr, macaddr->sa_data, ETH_ALEN); + + if (netif_running(dev)) { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock_irq(&np->lock); + + /* stop rx engine */ +@@ -1606,7 +1606,7 @@ static int nv_set_mac_address(struct net + /* restart rx engine */ + nv_start_rx(dev); + spin_unlock_irq(&np->lock); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } else { + nv_copy_mac_to_hw(dev); + } +@@ -1615,7 +1615,7 @@ static int nv_set_mac_address(struct net + + /* + * nv_set_multicast: dev->set_multicast function +- * Called with dev->xmit_lock held. ++ * Called with netif_tx_lock held. 
+ */ + static void nv_set_multicast(struct net_device *dev) + { +diff --git a/drivers/net/hamradio/6pack.c b/drivers/net/hamradio/6pack.c +index 102c1f0..d12605f 100644 +--- a/drivers/net/hamradio/6pack.c ++++ b/drivers/net/hamradio/6pack.c +@@ -308,9 +308,9 @@ static int sp_set_mac_address(struct net + { + struct sockaddr_ax25 *sa = addr; + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &sa->sax25_call, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + return 0; + } +@@ -767,9 +767,9 @@ static int sixpack_ioctl(struct tty_stru + break; + } + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &addr, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + err = 0; + break; +diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c +index dc5e9d5..5c66f5a 100644 +--- a/drivers/net/hamradio/mkiss.c ++++ b/drivers/net/hamradio/mkiss.c +@@ -357,9 +357,9 @@ static int ax_set_mac_address(struct net + { + struct sockaddr_ax25 *sa = addr; + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, &sa->sax25_call, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + return 0; + } +@@ -886,9 +886,9 @@ static int mkiss_ioctl(struct tty_struct + break; + } + +- spin_lock_irq(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + memcpy(dev->dev_addr, addr, AX25_ADDR_LEN); +- spin_unlock_irq(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + + err = 0; + break; +diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c +index 31fb2d7..2e222ef 100644 +--- a/drivers/net/ifb.c ++++ b/drivers/net/ifb.c +@@ -76,13 +76,13 @@ static void ri_tasklet(unsigned long dev + dp->st_task_enter++; + if ((skb = skb_peek(&dp->tq)) == NULL) { + dp->st_txq_refl_try++; +- if (spin_trylock(&_dev->xmit_lock)) { ++ if (netif_tx_trylock(_dev)) { + dp->st_rxq_enter++; + while ((skb = skb_dequeue(&dp->rq)) != NULL) { + skb_queue_tail(&dp->tq, skb); + dp->st_rx2tx_tran++; + } +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + } else { + /* reschedule */ + dp->st_rxq_notenter++; +@@ -110,7 +110,7 @@ static void ri_tasklet(unsigned long dev + } + } + +- if (spin_trylock(&_dev->xmit_lock)) { ++ if (netif_tx_trylock(_dev)) { + dp->st_rxq_check++; + if ((skb = skb_peek(&dp->rq)) == NULL) { + dp->tasklet_pending = 0; +@@ -118,10 +118,10 @@ static void ri_tasklet(unsigned long dev + netif_wake_queue(_dev); + } else { + dp->st_rxq_rsch++; +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + goto resched; + } +- spin_unlock(&_dev->xmit_lock); ++ netif_tx_unlock(_dev); + } else { + resched: + dp->tasklet_pending = 1; +diff --git a/drivers/net/irda/vlsi_ir.c b/drivers/net/irda/vlsi_ir.c +index a9f49f0..339d4a7 100644 +--- a/drivers/net/irda/vlsi_ir.c ++++ b/drivers/net/irda/vlsi_ir.c +@@ -959,7 +959,7 @@ static int vlsi_hard_start_xmit(struct s + || (now.tv_sec==ready.tv_sec && now.tv_usec>=ready.tv_usec)) + break; + udelay(100); +- /* must not sleep here - we are called under xmit_lock! */ ++ /* must not sleep here - called under netif_tx_lock! 
*/ + } + } + +diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c +index f9f77e4..bdab369 100644 +--- a/drivers/net/ixgb/ixgb_main.c ++++ b/drivers/net/ixgb/ixgb_main.c +@@ -1163,7 +1163,7 @@ #ifdef NETIF_F_TSO + uint16_t ipcse, tucse, mss; + int err; + +- if(likely(skb_shinfo(skb)->tso_size)) { ++ if(likely(skb_shinfo(skb)->gso_size)) { + if (skb_header_cloned(skb)) { + err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); + if (err) +@@ -1171,7 +1171,7 @@ #ifdef NETIF_F_TSO + } + + hdr_len = ((skb->h.raw - skb->data) + (skb->h.th->doff << 2)); +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + skb->nh.iph->tot_len = 0; + skb->nh.iph->check = 0; + skb->h.th->check = ~csum_tcpudp_magic(skb->nh.iph->saddr, +diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c +index 690a1aa..9bcaa80 100644 +--- a/drivers/net/loopback.c ++++ b/drivers/net/loopback.c +@@ -74,7 +74,7 @@ static void emulate_large_send_offload(s + struct iphdr *iph = skb->nh.iph; + struct tcphdr *th = (struct tcphdr*)(skb->nh.raw + (iph->ihl * 4)); + unsigned int doffset = (iph->ihl + th->doff) * 4; +- unsigned int mtu = skb_shinfo(skb)->tso_size + doffset; ++ unsigned int mtu = skb_shinfo(skb)->gso_size + doffset; + unsigned int offset = 0; + u32 seq = ntohl(th->seq); + u16 id = ntohs(iph->id); +@@ -139,7 +139,7 @@ #ifndef LOOPBACK_MUST_CHECKSUM + #endif + + #ifdef LOOPBACK_TSO +- if (skb_shinfo(skb)->tso_size) { ++ if (skb_shinfo(skb)->gso_size) { + BUG_ON(skb->protocol != htons(ETH_P_IP)); + BUG_ON(skb->nh.iph->protocol != IPPROTO_TCP); + +diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c +index c0998ef..0fac9d5 100644 +--- a/drivers/net/mv643xx_eth.c ++++ b/drivers/net/mv643xx_eth.c +@@ -1107,7 +1107,7 @@ static int mv643xx_eth_start_xmit(struct + + #ifdef MV643XX_CHECKSUM_OFFLOAD_TX + if (has_tiny_unaligned_frags(skb)) { +- if ((skb_linearize(skb, GFP_ATOMIC) != 0)) { ++ if (__skb_linearize(skb)) { + stats->tx_dropped++; + printk(KERN_DEBUG "%s: failed to linearize tiny " + "unaligned fragment\n", dev->name); +diff --git a/drivers/net/natsemi.c b/drivers/net/natsemi.c +index 9d6d254..c9ed624 100644 +--- a/drivers/net/natsemi.c ++++ b/drivers/net/natsemi.c +@@ -323,12 +323,12 @@ performance critical codepaths: + The rx process only runs in the interrupt handler. Access from outside + the interrupt handler is only permitted after disable_irq(). + +-The rx process usually runs under the dev->xmit_lock. If np->intr_tx_reap ++The rx process usually runs under the netif_tx_lock. If np->intr_tx_reap + is set, then access is permitted under spin_lock_irq(&np->lock). + + Thus configuration functions that want to access everything must call + disable_irq(dev->irq); +- spin_lock_bh(dev->xmit_lock); ++ netif_tx_lock_bh(dev); + spin_lock_irq(&np->lock); + + IV. 
Notes +diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c +index 8cc0d0b..e53b313 100644 +--- a/drivers/net/r8169.c ++++ b/drivers/net/r8169.c +@@ -2171,7 +2171,7 @@ static int rtl8169_xmit_frags(struct rtl + static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev) + { + if (dev->features & NETIF_F_TSO) { +- u32 mss = skb_shinfo(skb)->tso_size; ++ u32 mss = skb_shinfo(skb)->gso_size; + + if (mss) + return LargeSend | ((mss & MSSMask) << MSSShift); +diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c +index b7f00d6..439f45f 100644 +--- a/drivers/net/s2io.c ++++ b/drivers/net/s2io.c +@@ -3522,8 +3522,8 @@ #endif + txdp->Control_1 = 0; + txdp->Control_2 = 0; + #ifdef NETIF_F_TSO +- mss = skb_shinfo(skb)->tso_size; +- if (mss) { ++ mss = skb_shinfo(skb)->gso_size; ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_TCPV4) { + txdp->Control_1 |= TXD_TCP_LSO_EN; + txdp->Control_1 |= TXD_TCP_LSO_MSS(mss); + } +@@ -3543,10 +3543,10 @@ #endif + } + + frg_len = skb->len - skb->data_len; +- if (skb_shinfo(skb)->ufo_size) { ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) { + int ufo_size; + +- ufo_size = skb_shinfo(skb)->ufo_size; ++ ufo_size = skb_shinfo(skb)->gso_size; + ufo_size &= ~7; + txdp->Control_1 |= TXD_UFO_EN; + txdp->Control_1 |= TXD_UFO_MSS(ufo_size); +@@ -3572,7 +3572,7 @@ #endif + txdp->Host_Control = (unsigned long) skb; + txdp->Control_1 |= TXD_BUFFER0_SIZE(frg_len); + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + txdp->Control_1 |= TXD_UFO_EN; + + frg_cnt = skb_shinfo(skb)->nr_frags; +@@ -3587,12 +3587,12 @@ #endif + (sp->pdev, frag->page, frag->page_offset, + frag->size, PCI_DMA_TODEVICE); + txdp->Control_1 = TXD_BUFFER0_SIZE(frag->size); +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + txdp->Control_1 |= TXD_UFO_EN; + } + txdp->Control_1 |= TXD_GATHER_CODE_LAST; + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + frg_cnt++; /* as Txd0 was used for inband header */ + + tx_fifo = mac_control->tx_FIFO_start[queue]; +@@ -3606,7 +3606,7 @@ #ifdef NETIF_F_TSO + if (mss) + val64 |= TX_FIFO_SPECIAL_FUNC; + #endif +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_type == SKB_GSO_UDPV4) + val64 |= TX_FIFO_SPECIAL_FUNC; + writeq(val64, &tx_fifo->List_Control); + +diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c +index 0618cd5..2a55eb3 100644 +--- a/drivers/net/sky2.c ++++ b/drivers/net/sky2.c +@@ -1125,7 +1125,7 @@ static unsigned tx_le_req(const struct s + count = sizeof(dma_addr_t) / sizeof(u32); + count += skb_shinfo(skb)->nr_frags * count; + +- if (skb_shinfo(skb)->tso_size) ++ if (skb_shinfo(skb)->gso_size) + ++count; + + if (skb->ip_summed == CHECKSUM_HW) +@@ -1197,7 +1197,7 @@ static int sky2_xmit_frame(struct sk_buf + } + + /* Check for TCP Segmentation Offload */ +- mss = skb_shinfo(skb)->tso_size; ++ mss = skb_shinfo(skb)->gso_size; + if (mss != 0) { + /* just drop the packet if non-linear expansion fails */ + if (skb_header_cloned(skb) && +diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c +index caf4102..fc9164a 100644 +--- a/drivers/net/tg3.c ++++ b/drivers/net/tg3.c +@@ -3664,7 +3664,7 @@ static int tg3_start_xmit(struct sk_buff + #if TG3_TSO_SUPPORT != 0 + mss = 0; + if (skb->len > (tp->dev->mtu + ETH_HLEN) && +- (mss = skb_shinfo(skb)->tso_size) != 0) { ++ (mss = skb_shinfo(skb)->gso_size) != 0) { + int tcp_opt_len, ip_tcp_len; + + if (skb_header_cloned(skb) && +diff --git a/drivers/net/tulip/winbond-840.c 
b/drivers/net/tulip/winbond-840.c +index 5b1af39..11de5af 100644 +--- a/drivers/net/tulip/winbond-840.c ++++ b/drivers/net/tulip/winbond-840.c +@@ -1605,11 +1605,11 @@ #ifdef CONFIG_PM + * - get_stats: + * spin_lock_irq(np->lock), doesn''t touch hw if not present + * - hard_start_xmit: +- * netif_stop_queue + spin_unlock_wait(&dev->xmit_lock); ++ * synchronize_irq + netif_tx_disable; + * - tx_timeout: +- * netif_device_detach + spin_unlock_wait(&dev->xmit_lock); ++ * netif_device_detach + netif_tx_disable; + * - set_multicast_list +- * netif_device_detach + spin_unlock_wait(&dev->xmit_lock); ++ * netif_device_detach + netif_tx_disable; + * - interrupt handler + * doesn''t touch hw if not present, synchronize_irq waits for + * running instances of the interrupt handler. +@@ -1635,11 +1635,10 @@ static int w840_suspend (struct pci_dev + netif_device_detach(dev); + update_csr6(dev, 0); + iowrite32(0, ioaddr + IntrEnable); +- netif_stop_queue(dev); + spin_unlock_irq(&np->lock); + +- spin_unlock_wait(&dev->xmit_lock); + synchronize_irq(dev->irq); ++ netif_tx_disable(dev); + + np->stats.rx_missed_errors += ioread32(ioaddr + RxMissed) & 0xffff; + +diff --git a/drivers/net/typhoon.c b/drivers/net/typhoon.c +index 4c76cb7..30c48c9 100644 +--- a/drivers/net/typhoon.c ++++ b/drivers/net/typhoon.c +@@ -340,7 +340,7 @@ #define typhoon_synchronize_irq(x) synch + #endif + + #if defined(NETIF_F_TSO) +-#define skb_tso_size(x) (skb_shinfo(x)->tso_size) ++#define skb_tso_size(x) (skb_shinfo(x)->gso_size) + #define TSO_NUM_DESCRIPTORS 2 + #define TSO_OFFLOAD_ON TYPHOON_OFFLOAD_TCP_SEGMENT + #else +diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c +index ed1f837..2eb6b5f 100644 +--- a/drivers/net/via-velocity.c ++++ b/drivers/net/via-velocity.c +@@ -1899,6 +1899,13 @@ static int velocity_xmit(struct sk_buff + + int pktlen = skb->len; + ++#ifdef VELOCITY_ZERO_COPY_SUPPORT ++ if (skb_shinfo(skb)->nr_frags > 6 && __skb_linearize(skb)) { ++ kfree_skb(skb); ++ return 0; ++ } ++#endif ++ + spin_lock_irqsave(&vptr->lock, flags); + + index = vptr->td_curr[qnum]; +@@ -1914,8 +1921,6 @@ static int velocity_xmit(struct sk_buff + */ + if (pktlen < ETH_ZLEN) { + /* Cannot occur until ZC support */ +- if(skb_linearize(skb, GFP_ATOMIC)) +- return 0; + pktlen = ETH_ZLEN; + memcpy(tdinfo->buf, skb->data, skb->len); + memset(tdinfo->buf + skb->len, 0, ETH_ZLEN - skb->len); +@@ -1933,7 +1938,6 @@ #ifdef VELOCITY_ZERO_COPY_SUPPORT + int nfrags = skb_shinfo(skb)->nr_frags; + tdinfo->skb = skb; + if (nfrags > 6) { +- skb_linearize(skb, GFP_ATOMIC); + memcpy(tdinfo->buf, skb->data, skb->len); + tdinfo->skb_dma[0] = tdinfo->buf_dma; + td_ptr->tdesc0.pktsize = +diff --git a/drivers/net/wireless/orinoco.c b/drivers/net/wireless/orinoco.c +index 6fd0bf7..75237c1 100644 +--- a/drivers/net/wireless/orinoco.c ++++ b/drivers/net/wireless/orinoco.c +@@ -1835,7 +1835,9 @@ static int __orinoco_program_rids(struct + /* Set promiscuity / multicast*/ + priv->promiscuous = 0; + priv->mc_count = 0; +- __orinoco_set_multicast_list(dev); /* FIXME: what about the xmit_lock */ ++ ++ /* FIXME: what about netif_tx_lock */ ++ __orinoco_set_multicast_list(dev); + + return 0; + } +diff --git a/drivers/s390/net/qeth_eddp.c b/drivers/s390/net/qeth_eddp.c +index 82cb4af..57cec40 100644 +--- a/drivers/s390/net/qeth_eddp.c ++++ b/drivers/s390/net/qeth_eddp.c +@@ -421,7 +421,7 @@ #endif /* CONFIG_QETH_VLAN */ + } + tcph = eddp->skb->h.th; + while (eddp->skb_offset < eddp->skb->len) { +- data_len = min((int)skb_shinfo(eddp->skb)->tso_size, ++ 
data_len = min((int)skb_shinfo(eddp->skb)->gso_size, + (int)(eddp->skb->len - eddp->skb_offset)); + /* prepare qdio hdr */ + if (eddp->qh.hdr.l2.id == QETH_HEADER_TYPE_LAYER2){ +@@ -516,20 +516,20 @@ qeth_eddp_calc_num_pages(struct qeth_edd + + QETH_DBF_TEXT(trace, 5, "eddpcanp"); + /* can we put multiple skbs in one page? */ +- skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->tso_size + hdr_len); ++ skbs_per_page = PAGE_SIZE / (skb_shinfo(skb)->gso_size + hdr_len); + if (skbs_per_page > 1){ +- ctx->num_pages = (skb_shinfo(skb)->tso_segs + 1) / ++ ctx->num_pages = (skb_shinfo(skb)->gso_segs + 1) / + skbs_per_page + 1; + ctx->elements_per_skb = 1; + } else { + /* no -> how many elements per skb? */ +- ctx->elements_per_skb = (skb_shinfo(skb)->tso_size + hdr_len + ++ ctx->elements_per_skb = (skb_shinfo(skb)->gso_size + hdr_len + + PAGE_SIZE) >> PAGE_SHIFT; + ctx->num_pages = ctx->elements_per_skb * +- (skb_shinfo(skb)->tso_segs + 1); ++ (skb_shinfo(skb)->gso_segs + 1); + } + ctx->num_elements = ctx->elements_per_skb * +- (skb_shinfo(skb)->tso_segs + 1); ++ (skb_shinfo(skb)->gso_segs + 1); + } + + static inline struct qeth_eddp_context * +diff --git a/drivers/s390/net/qeth_main.c b/drivers/s390/net/qeth_main.c +index dba7f7f..d9cc997 100644 +--- a/drivers/s390/net/qeth_main.c ++++ b/drivers/s390/net/qeth_main.c +@@ -4454,7 +4454,7 @@ qeth_send_packet(struct qeth_card *card, + queue = card->qdio.out_qs + [qeth_get_priority_queue(card, skb, ipv, cast_type)]; + +- if (skb_shinfo(skb)->tso_size) ++ if (skb_shinfo(skb)->gso_size) + large_send = card->options.large_send; + + /*are we able to do TSO ? If so ,prepare and send it from here */ +@@ -4501,7 +4501,7 @@ qeth_send_packet(struct qeth_card *card, + card->stats.tx_packets++; + card->stats.tx_bytes += skb->len; + #ifdef CONFIG_QETH_PERF_STATS +- if (skb_shinfo(skb)->tso_size && ++ if (skb_shinfo(skb)->gso_size && + !(large_send == QETH_LARGE_SEND_NO)) { + card->perf_stats.large_send_bytes += skb->len; + card->perf_stats.large_send_cnt++; +diff --git a/drivers/s390/net/qeth_tso.h b/drivers/s390/net/qeth_tso.h +index 1286dde..89cbf34 100644 +--- a/drivers/s390/net/qeth_tso.h ++++ b/drivers/s390/net/qeth_tso.h +@@ -51,7 +51,7 @@ qeth_tso_fill_header(struct qeth_card *c + hdr->ext.hdr_version = 1; + hdr->ext.hdr_len = 28; + /*insert non-fix values */ +- hdr->ext.mss = skb_shinfo(skb)->tso_size; ++ hdr->ext.mss = skb_shinfo(skb)->gso_size; + hdr->ext.dg_hdr_len = (__u16)(iph->ihl*4 + tcph->doff*4); + hdr->ext.payload_len = (__u16)(skb->len - hdr->ext.dg_hdr_len - + sizeof(struct qeth_hdr_tso)); +diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h +index 93535f0..9269df7 100644 +--- a/include/linux/ethtool.h ++++ b/include/linux/ethtool.h +@@ -408,6 +408,8 @@ #define ETHTOOL_STSO 0x0000001f /* Set + #define ETHTOOL_GPERMADDR 0x00000020 /* Get permanent hardware address */ + #define ETHTOOL_GUFO 0x00000021 /* Get UFO enable (ethtool_value) */ + #define ETHTOOL_SUFO 0x00000022 /* Set UFO enable (ethtool_value) */ ++#define ETHTOOL_GGSO 0x00000023 /* Get GSO enable (ethtool_value) */ ++#define ETHTOOL_SGSO 0x00000024 /* Set GSO enable (ethtool_value) */ + + /* compatibility with older code */ + #define SPARC_ETH_GSET ETHTOOL_GSET +diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h +index 7fda03d..47b0965 100644 +--- a/include/linux/netdevice.h ++++ b/include/linux/netdevice.h +@@ -230,7 +230,8 @@ enum netdev_state_t + __LINK_STATE_SCHED, + __LINK_STATE_NOCARRIER, + __LINK_STATE_RX_SCHED, +- __LINK_STATE_LINKWATCH_PENDING ++ 
__LINK_STATE_LINKWATCH_PENDING, ++ __LINK_STATE_QDISC_RUNNING, + }; + + +@@ -306,9 +307,17 @@ #define NETIF_F_HW_VLAN_TX 128 /* Transm + #define NETIF_F_HW_VLAN_RX 256 /* Receive VLAN hw acceleration */ + #define NETIF_F_HW_VLAN_FILTER 512 /* Receive filtering on VLAN */ + #define NETIF_F_VLAN_CHALLENGED 1024 /* Device cannot handle VLAN packets */ +-#define NETIF_F_TSO 2048 /* Can offload TCP/IP segmentation */ ++#define NETIF_F_GSO 2048 /* Enable software GSO. */ + #define NETIF_F_LLTX 4096 /* LockLess TX */ +-#define NETIF_F_UFO 8192 /* Can offload UDP Large Send*/ ++ ++ /* Segmentation offload features */ ++#define NETIF_F_GSO_SHIFT 16 ++#define NETIF_F_TSO (SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT) ++#define NETIF_F_UFO (SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT) ++#define NETIF_F_GSO_ROBUST (SKB_GSO_DODGY << NETIF_F_GSO_SHIFT) ++ ++#define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) ++#define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM) + + struct net_device *next_sched; + +@@ -394,6 +403,9 @@ #define NETIF_F_UFO 8192 + struct list_head qdisc_list; + unsigned long tx_queue_len; /* Max frames per queue allowed */ + ++ /* Partially transmitted GSO packet. */ ++ struct sk_buff *gso_skb; ++ + /* ingress path synchronizer */ + spinlock_t ingress_lock; + struct Qdisc *qdisc_ingress; +@@ -402,7 +414,7 @@ #define NETIF_F_UFO 8192 + * One part is mostly used on xmit path (device) + */ + /* hard_start_xmit synchronizer */ +- spinlock_t xmit_lock ____cacheline_aligned_in_smp; ++ spinlock_t _xmit_lock ____cacheline_aligned_in_smp; + /* cpu id of processor entered to hard_start_xmit or -1, + if nobody entered there. + */ +@@ -527,6 +539,8 @@ struct packet_type { + struct net_device *, + struct packet_type *, + struct net_device *); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, ++ int features); + void *af_packet_priv; + struct list_head list; + }; +@@ -693,7 +707,8 @@ extern int dev_change_name(struct net_d + extern int dev_set_mtu(struct net_device *, int); + extern int dev_set_mac_address(struct net_device *, + struct sockaddr *); +-extern void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev); ++extern int dev_hard_start_xmit(struct sk_buff *skb, ++ struct net_device *dev); + + extern void dev_init(void); + +@@ -900,11 +915,43 @@ static inline void __netif_rx_complete(s + clear_bit(__LINK_STATE_RX_SCHED, &dev->state); + } + ++static inline void netif_tx_lock(struct net_device *dev) ++{ ++ spin_lock(&dev->_xmit_lock); ++ dev->xmit_lock_owner = smp_processor_id(); ++} ++ ++static inline void netif_tx_lock_bh(struct net_device *dev) ++{ ++ spin_lock_bh(&dev->_xmit_lock); ++ dev->xmit_lock_owner = smp_processor_id(); ++} ++ ++static inline int netif_tx_trylock(struct net_device *dev) ++{ ++ int err = spin_trylock(&dev->_xmit_lock); ++ if (!err) ++ dev->xmit_lock_owner = smp_processor_id(); ++ return err; ++} ++ ++static inline void netif_tx_unlock(struct net_device *dev) ++{ ++ dev->xmit_lock_owner = -1; ++ spin_unlock(&dev->_xmit_lock); ++} ++ ++static inline void netif_tx_unlock_bh(struct net_device *dev) ++{ ++ dev->xmit_lock_owner = -1; ++ spin_unlock_bh(&dev->_xmit_lock); ++} ++ + static inline void netif_tx_disable(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + netif_stop_queue(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + /* These functions live elsewhere (drivers/net/net_init.c, but related) */ +@@ -932,6 +979,7 @@ extern int netdev_max_backlog; + extern int weight_p; + extern int 
netdev_set_master(struct net_device *dev, struct net_device *master); + extern int skb_checksum_help(struct sk_buff *skb, int inward); ++extern struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features); + #ifdef CONFIG_BUG + extern void netdev_rx_csum_fault(struct net_device *dev); + #else +@@ -951,6 +999,18 @@ #endif + + extern void linkwatch_run_queue(void); + ++static inline int skb_gso_ok(struct sk_buff *skb, int features) ++{ ++ int feature = skb_shinfo(skb)->gso_size ? ++ skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT : 0; ++ return (features & feature) == feature; ++} ++ ++static inline int netif_needs_gso(struct net_device *dev, struct sk_buff *skb) ++{ ++ return !skb_gso_ok(skb, dev->features); ++} ++ + #endif /* __KERNEL__ */ + + #endif /* _LINUX_DEV_H */ +diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h +index ad7cc22..b19d45d 100644 +--- a/include/linux/skbuff.h ++++ b/include/linux/skbuff.h +@@ -134,9 +134,10 @@ struct skb_frag_struct { + struct skb_shared_info { + atomic_t dataref; + unsigned short nr_frags; +- unsigned short tso_size; +- unsigned short tso_segs; +- unsigned short ufo_size; ++ unsigned short gso_size; ++ /* Warning: this field is not always filled in (UFO)! */ ++ unsigned short gso_segs; ++ unsigned short gso_type; + unsigned int ip6_frag_id; + struct sk_buff *frag_list; + skb_frag_t frags[MAX_SKB_FRAGS]; +@@ -168,6 +169,14 @@ enum { + SKB_FCLONE_CLONE, + }; + ++enum { ++ SKB_GSO_TCPV4 = 1 << 0, ++ SKB_GSO_UDPV4 = 1 << 1, ++ ++ /* This indicates the skb is from an untrusted source. */ ++ SKB_GSO_DODGY = 1 << 2, ++}; ++ + /** + * struct sk_buff - socket buffer + * @next: Next buffer in list +@@ -1148,18 +1157,34 @@ static inline int skb_can_coalesce(struc + return 0; + } + ++static inline int __skb_linearize(struct sk_buff *skb) ++{ ++ return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM; ++} ++ + /** + * skb_linearize - convert paged skb to linear one + * @skb: buffer to linarize +- * @gfp: allocation mode + * + * If there is no free memory -ENOMEM is returned, otherwise zero + * is returned and the old skb data released. + */ +-extern int __skb_linearize(struct sk_buff *skb, gfp_t gfp); +-static inline int skb_linearize(struct sk_buff *skb, gfp_t gfp) ++static inline int skb_linearize(struct sk_buff *skb) ++{ ++ return skb_is_nonlinear(skb) ? __skb_linearize(skb) : 0; ++} ++ ++/** ++ * skb_linearize_cow - make sure skb is linear and writable ++ * @skb: buffer to process ++ * ++ * If there is no free memory -ENOMEM is returned, otherwise zero ++ * is returned and the old skb data released. ++ */ ++static inline int skb_linearize_cow(struct sk_buff *skb) + { +- return __skb_linearize(skb, gfp); ++ return skb_is_nonlinear(skb) || skb_cloned(skb) ? 
++ __skb_linearize(skb) : 0; + } + + /** +@@ -1254,6 +1279,7 @@ extern void skb_split(struct sk_b + struct sk_buff *skb1, const u32 len); + + extern void skb_release_data(struct sk_buff *skb); ++extern struct sk_buff *skb_segment(struct sk_buff *skb, int features); + + static inline void *skb_header_pointer(const struct sk_buff *skb, int offset, + int len, void *buffer) +diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h +index b94d1ad..75b5b93 100644 +--- a/include/net/pkt_sched.h ++++ b/include/net/pkt_sched.h +@@ -218,12 +218,13 @@ extern struct qdisc_rate_table *qdisc_ge + struct rtattr *tab); + extern void qdisc_put_rtab(struct qdisc_rate_table *tab); + +-extern int qdisc_restart(struct net_device *dev); ++extern void __qdisc_run(struct net_device *dev); + + static inline void qdisc_run(struct net_device *dev) + { +- while (!netif_queue_stopped(dev) && qdisc_restart(dev) < 0) +- /* NOTHING */; ++ if (!netif_queue_stopped(dev) && ++ !test_and_set_bit(__LINK_STATE_QDISC_RUNNING, &dev->state)) ++ __qdisc_run(dev); + } + + extern int tc_classify(struct sk_buff *skb, struct tcf_proto *tp, +diff --git a/include/net/protocol.h b/include/net/protocol.h +index 6dc5970..0d2dcdb 100644 +--- a/include/net/protocol.h ++++ b/include/net/protocol.h +@@ -37,6 +37,8 @@ #define MAX_INET_PROTOS 256 /* Must be + struct net_protocol { + int (*handler)(struct sk_buff *skb); + void (*err_handler)(struct sk_buff *skb, u32 info); ++ struct sk_buff *(*gso_segment)(struct sk_buff *skb, ++ int features); + int no_policy; + }; + +diff --git a/include/net/sock.h b/include/net/sock.h +index f63d0d5..a8e8d21 100644 +--- a/include/net/sock.h ++++ b/include/net/sock.h +@@ -1064,9 +1064,13 @@ static inline void sk_setup_caps(struct + { + __sk_dst_set(sk, dst); + sk->sk_route_caps = dst->dev->features; ++ if (sk->sk_route_caps & NETIF_F_GSO) ++ sk->sk_route_caps |= NETIF_F_TSO; + if (sk->sk_route_caps & NETIF_F_TSO) { + if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len) + sk->sk_route_caps &= ~NETIF_F_TSO; ++ else ++ sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM; + } + } + +diff --git a/include/net/tcp.h b/include/net/tcp.h +index 77f21c6..70e1d5f 100644 +--- a/include/net/tcp.h ++++ b/include/net/tcp.h +@@ -552,13 +552,13 @@ #include <net/tcp_ecn.h> + */ + static inline int tcp_skb_pcount(const struct sk_buff *skb) + { +- return skb_shinfo(skb)->tso_segs; ++ return skb_shinfo(skb)->gso_segs; + } + + /* This is valid iff tcp_skb_pcount() > 1. 
*/ + static inline int tcp_skb_mss(const struct sk_buff *skb) + { +- return skb_shinfo(skb)->tso_size; ++ return skb_shinfo(skb)->gso_size; + } + + static inline void tcp_dec_pcount_approx(__u32 *count, +@@ -1063,6 +1063,8 @@ extern struct request_sock_ops tcp_reque + + extern int tcp_v4_destroy_sock(struct sock *sk); + ++extern struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features); ++ + #ifdef CONFIG_PROC_FS + extern int tcp4_proc_init(void); + extern void tcp4_proc_exit(void); +diff --git a/net/atm/clip.c b/net/atm/clip.c +index 1842a4e..6dc21a7 100644 +--- a/net/atm/clip.c ++++ b/net/atm/clip.c +@@ -101,7 +101,7 @@ static void unlink_clip_vcc(struct clip_ + printk(KERN_CRIT "!clip_vcc->entry (clip_vcc %p)\n",clip_vcc); + return; + } +- spin_lock_bh(&entry->neigh->dev->xmit_lock); /* block clip_start_xmit() */ ++ netif_tx_lock_bh(entry->neigh->dev); /* block clip_start_xmit() */ + entry->neigh->used = jiffies; + for (walk = &entry->vccs; *walk; walk = &(*walk)->next) + if (*walk == clip_vcc) { +@@ -125,7 +125,7 @@ static void unlink_clip_vcc(struct clip_ + printk(KERN_CRIT "ATMARP: unlink_clip_vcc failed (entry %p, vcc " + "0x%p)\n",entry,clip_vcc); + out: +- spin_unlock_bh(&entry->neigh->dev->xmit_lock); ++ netif_tx_unlock_bh(entry->neigh->dev); + } + + /* The neighbour entry n->lock is held. */ +diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c +index 0b33a7b..180e79b 100644 +--- a/net/bridge/br_device.c ++++ b/net/bridge/br_device.c +@@ -146,9 +146,9 @@ static int br_set_tx_csum(struct net_dev + struct net_bridge *br = netdev_priv(dev); + + if (data) +- br->feature_mask |= NETIF_F_IP_CSUM; ++ br->feature_mask |= NETIF_F_NO_CSUM; + else +- br->feature_mask &= ~NETIF_F_IP_CSUM; ++ br->feature_mask &= ~NETIF_F_ALL_CSUM; + + br_features_recompute(br); + return 0; +@@ -185,6 +185,6 @@ void br_dev_setup(struct net_device *dev + dev->set_mac_address = br_set_mac_address; + dev->priv_flags = IFF_EBRIDGE; + +- dev->features = NETIF_F_SG | NETIF_F_FRAGLIST +- | NETIF_F_HIGHDMA | NETIF_F_TSO | NETIF_F_IP_CSUM; ++ dev->features = NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HIGHDMA | ++ NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_GSO_ROBUST; + } +diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c +index 2d24fb4..00b1128 100644 +--- a/net/bridge/br_forward.c ++++ b/net/bridge/br_forward.c +@@ -32,7 +32,7 @@ static inline int should_deliver(const s + int br_dev_queue_push_xmit(struct sk_buff *skb) + { + /* drop mtu oversized packets except tso */ +- if (skb->len > skb->dev->mtu && !skb_shinfo(skb)->tso_size) ++ if (skb->len > skb->dev->mtu && !skb_shinfo(skb)->gso_size) + kfree_skb(skb); + else { + #ifdef CONFIG_BRIDGE_NETFILTER +diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c +index f36b35e..0617146 100644 +--- a/net/bridge/br_if.c ++++ b/net/bridge/br_if.c +@@ -385,17 +385,28 @@ void br_features_recompute(struct net_br + struct net_bridge_port *p; + unsigned long features, checksum; + +- features = br->feature_mask &~ NETIF_F_IP_CSUM; +- checksum = br->feature_mask & NETIF_F_IP_CSUM; ++ checksum = br->feature_mask & NETIF_F_ALL_CSUM ? 
NETIF_F_NO_CSUM : 0; ++ features = br->feature_mask & ~NETIF_F_ALL_CSUM; + + list_for_each_entry(p, &br->port_list, list) { +- if (!(p->dev->features +- & (NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM))) ++ unsigned long feature = p->dev->features; ++ ++ if (checksum & NETIF_F_NO_CSUM && !(feature & NETIF_F_NO_CSUM)) ++ checksum ^= NETIF_F_NO_CSUM | NETIF_F_HW_CSUM; ++ if (checksum & NETIF_F_HW_CSUM && !(feature & NETIF_F_HW_CSUM)) ++ checksum ^= NETIF_F_HW_CSUM | NETIF_F_IP_CSUM; ++ if (!(feature & NETIF_F_IP_CSUM)) + checksum = 0; +- features &= p->dev->features; ++ ++ if (feature & NETIF_F_GSO) ++ feature |= NETIF_F_TSO; ++ feature |= NETIF_F_GSO; ++ ++ features &= feature; + } + +- br->dev->features = features | checksum | NETIF_F_LLTX; ++ br->dev->features = features | checksum | NETIF_F_LLTX | ++ NETIF_F_GSO_ROBUST; + } + + /* called with RTNL */ +diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c +index 9e27373..588207f 100644 +--- a/net/bridge/br_netfilter.c ++++ b/net/bridge/br_netfilter.c +@@ -743,7 +743,7 @@ static int br_nf_dev_queue_xmit(struct s + { + if (skb->protocol == htons(ETH_P_IP) && + skb->len > skb->dev->mtu && +- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size)) ++ !skb_shinfo(skb)->gso_size) + return ip_fragment(skb, br_dev_queue_push_xmit); + else + return br_dev_queue_push_xmit(skb); +diff --git a/net/core/dev.c b/net/core/dev.c +index 12a214c..32e1056 100644 +--- a/net/core/dev.c ++++ b/net/core/dev.c +@@ -115,6 +115,7 @@ #include <linux/wireless.h> /* Note : w + #include <net/iw_handler.h> + #endif /* CONFIG_NET_RADIO */ + #include <asm/current.h> ++#include <linux/err.h> + + /* + * The list of packet types we will receive (as opposed to discard) +@@ -1032,7 +1033,7 @@ static inline void net_timestamp(struct + * taps currently in use. + */ + +-void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) ++static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev) + { + struct packet_type *ptype; + +@@ -1106,6 +1107,45 @@ out: + return ret; + } + ++/** ++ * skb_gso_segment - Perform segmentation on skb. ++ * @skb: buffer to segment ++ * @features: features for the output path (see dev->features) ++ * ++ * This function segments the given skb and returns a list of segments. ++ * ++ * It may return NULL if the skb requires no segmentation. This is ++ * only possible when GSO is used for verifying header integrity. ++ */ ++struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features) ++{ ++ struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT); ++ struct packet_type *ptype; ++ int type = skb->protocol; ++ ++ BUG_ON(skb_shinfo(skb)->frag_list); ++ BUG_ON(skb->ip_summed != CHECKSUM_HW); ++ ++ skb->mac.raw = skb->data; ++ skb->mac_len = skb->nh.raw - skb->data; ++ __skb_pull(skb, skb->mac_len); ++ ++ rcu_read_lock(); ++ list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type) & 15], list) { ++ if (ptype->type == type && !ptype->dev && ptype->gso_segment) { ++ segs = ptype->gso_segment(skb, features); ++ break; ++ } ++ } ++ rcu_read_unlock(); ++ ++ __skb_push(skb, skb->data - skb->mac.raw); ++ ++ return segs; ++} ++ ++EXPORT_SYMBOL(skb_gso_segment); ++ + /* Take action when hardware reception checksum errors are detected. 
*/ + #ifdef CONFIG_BUG + void netdev_rx_csum_fault(struct net_device *dev) +@@ -1142,75 +1182,108 @@ #else + #define illegal_highdma(dev, skb) (0) + #endif + +-/* Keep head the same: replace data */ +-int __skb_linearize(struct sk_buff *skb, gfp_t gfp_mask) +-{ +- unsigned int size; +- u8 *data; +- long offset; +- struct skb_shared_info *ninfo; +- int headerlen = skb->data - skb->head; +- int expand = (skb->tail + skb->data_len) - skb->end; +- +- if (skb_shared(skb)) +- BUG(); +- +- if (expand <= 0) +- expand = 0; +- +- size = skb->end - skb->head + expand; +- size = SKB_DATA_ALIGN(size); +- data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask); +- if (!data) +- return -ENOMEM; +- +- /* Copy entire thing */ +- if (skb_copy_bits(skb, -headerlen, data, headerlen + skb->len)) +- BUG(); +- +- /* Set up shinfo */ +- ninfo = (struct skb_shared_info*)(data + size); +- atomic_set(&ninfo->dataref, 1); +- ninfo->tso_size = skb_shinfo(skb)->tso_size; +- ninfo->tso_segs = skb_shinfo(skb)->tso_segs; +- ninfo->nr_frags = 0; +- ninfo->frag_list = NULL; +- +- /* Offset between the two in bytes */ +- offset = data - skb->head; +- +- /* Free old data. */ +- skb_release_data(skb); +- +- skb->head = data; +- skb->end = data + size; +- +- /* Set up new pointers */ +- skb->h.raw += offset; +- skb->nh.raw += offset; +- skb->mac.raw += offset; +- skb->tail += offset; +- skb->data += offset; +- +- /* We are no longer a clone, even if we were. */ +- skb->cloned = 0; +- +- skb->tail += skb->data_len; +- skb->data_len = 0; ++struct dev_gso_cb { ++ void (*destructor)(struct sk_buff *skb); ++}; ++ ++#define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb) ++ ++static void dev_gso_skb_destructor(struct sk_buff *skb) ++{ ++ struct dev_gso_cb *cb; ++ ++ do { ++ struct sk_buff *nskb = skb->next; ++ ++ skb->next = nskb->next; ++ nskb->next = NULL; ++ kfree_skb(nskb); ++ } while (skb->next); ++ ++ cb = DEV_GSO_CB(skb); ++ if (cb->destructor) ++ cb->destructor(skb); ++} ++ ++/** ++ * dev_gso_segment - Perform emulated hardware segmentation on skb. ++ * @skb: buffer to segment ++ * ++ * This function segments the given skb and stores the list of segments ++ * in skb->next. ++ */ ++static int dev_gso_segment(struct sk_buff *skb) ++{ ++ struct net_device *dev = skb->dev; ++ struct sk_buff *segs; ++ int features = dev->features & ~(illegal_highdma(dev, skb) ? ++ NETIF_F_SG : 0); ++ ++ segs = skb_gso_segment(skb, features); ++ ++ /* Verifying header integrity only. 
*/ ++ if (!segs) ++ return 0; ++ ++ if (unlikely(IS_ERR(segs))) ++ return PTR_ERR(segs); ++ ++ skb->next = segs; ++ DEV_GSO_CB(skb)->destructor = skb->destructor; ++ skb->destructor = dev_gso_skb_destructor; ++ ++ return 0; ++} ++ ++int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev) ++{ ++ if (likely(!skb->next)) { ++ if (netdev_nit) ++ dev_queue_xmit_nit(skb, dev); ++ ++ if (netif_needs_gso(dev, skb)) { ++ if (unlikely(dev_gso_segment(skb))) ++ goto out_kfree_skb; ++ if (skb->next) ++ goto gso; ++ } ++ ++ return dev->hard_start_xmit(skb, dev); ++ } ++ ++gso: ++ do { ++ struct sk_buff *nskb = skb->next; ++ int rc; ++ ++ skb->next = nskb->next; ++ nskb->next = NULL; ++ rc = dev->hard_start_xmit(nskb, dev); ++ if (unlikely(rc)) { ++ nskb->next = skb->next; ++ skb->next = nskb; ++ return rc; ++ } ++ if (unlikely(netif_queue_stopped(dev) && skb->next)) ++ return NETDEV_TX_BUSY; ++ } while (skb->next); ++ ++ skb->destructor = DEV_GSO_CB(skb)->destructor; ++ ++out_kfree_skb: ++ kfree_skb(skb); + return 0; + } + + #define HARD_TX_LOCK(dev, cpu) { \ + if ((dev->features & NETIF_F_LLTX) == 0) { \ +- spin_lock(&dev->xmit_lock); \ +- dev->xmit_lock_owner = cpu; \ ++ netif_tx_lock(dev); \ + } \ + } + + #define HARD_TX_UNLOCK(dev) { \ + if ((dev->features & NETIF_F_LLTX) == 0) { \ +- dev->xmit_lock_owner = -1; \ +- spin_unlock(&dev->xmit_lock); \ ++ netif_tx_unlock(dev); \ + } \ + } + +@@ -1246,9 +1319,13 @@ int dev_queue_xmit(struct sk_buff *skb) + struct Qdisc *q; + int rc = -ENOMEM; + ++ /* GSO will handle the following emulations directly. */ ++ if (netif_needs_gso(dev, skb)) ++ goto gso; ++ + if (skb_shinfo(skb)->frag_list && + !(dev->features & NETIF_F_FRAGLIST) && +- __skb_linearize(skb, GFP_ATOMIC)) ++ __skb_linearize(skb)) + goto out_kfree_skb; + + /* Fragmented skb is linearized if device does not support SG, +@@ -1257,25 +1334,26 @@ int dev_queue_xmit(struct sk_buff *skb) + */ + if (skb_shinfo(skb)->nr_frags && + (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) && +- __skb_linearize(skb, GFP_ATOMIC)) ++ __skb_linearize(skb)) + goto out_kfree_skb; + + /* If packet is not checksummed and device does not support + * checksumming for this protocol, complete checksumming here. + */ + if (skb->ip_summed == CHECKSUM_HW && +- (!(dev->features & (NETIF_F_HW_CSUM | NETIF_F_NO_CSUM)) && ++ (!(dev->features & NETIF_F_GEN_CSUM) && + (!(dev->features & NETIF_F_IP_CSUM) || + skb->protocol != htons(ETH_P_IP)))) + if (skb_checksum_help(skb, 0)) + goto out_kfree_skb; + ++gso: + spin_lock_prefetch(&dev->queue_lock); + + /* Disable soft irqs for various locks below. Also + * stops preemption for RCU. + */ +- local_bh_disable(); ++ rcu_read_lock_bh(); + + /* Updates of qdisc are serialized by queue_lock. + * The struct Qdisc which is pointed to by qdisc is now a +@@ -1309,8 +1387,8 @@ #endif + /* The device has no queue. Common case for software devices: + loopback, all the sorts of tunnels... + +- Really, it is unlikely that xmit_lock protection is necessary here. +- (f.e. loopback and IP tunnels are clean ignoring statistics ++ Really, it is unlikely that netif_tx_lock protection is necessary ++ here. (f.e. loopback and IP tunnels are clean ignoring statistics + counters.) + However, it is possible, that they rely on protection + made by us here. 
+@@ -1326,11 +1404,8 @@ #endif + HARD_TX_LOCK(dev, cpu); + + if (!netif_queue_stopped(dev)) { +- if (netdev_nit) +- dev_queue_xmit_nit(skb, dev); +- + rc = 0; +- if (!dev->hard_start_xmit(skb, dev)) { ++ if (!dev_hard_start_xmit(skb, dev)) { + HARD_TX_UNLOCK(dev); + goto out; + } +@@ -1349,13 +1424,13 @@ #endif + } + + rc = -ENETDOWN; +- local_bh_enable(); ++ rcu_read_unlock_bh(); + + out_kfree_skb: + kfree_skb(skb); + return rc; + out: +- local_bh_enable(); ++ rcu_read_unlock_bh(); + return rc; + } + +@@ -2670,7 +2745,7 @@ int register_netdevice(struct net_device + BUG_ON(dev->reg_state != NETREG_UNINITIALIZED); + + spin_lock_init(&dev->queue_lock); +- spin_lock_init(&dev->xmit_lock); ++ spin_lock_init(&dev->_xmit_lock); + dev->xmit_lock_owner = -1; + #ifdef CONFIG_NET_CLS_ACT + spin_lock_init(&dev->ingress_lock); +@@ -2714,9 +2789,7 @@ #endif + + /* Fix illegal SG+CSUM combinations. */ + if ((dev->features & NETIF_F_SG) && +- !(dev->features & (NETIF_F_IP_CSUM | +- NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM))) { ++ !(dev->features & NETIF_F_ALL_CSUM)) { + printk("%s: Dropping NETIF_F_SG since no checksum feature.\n", + dev->name); + dev->features &= ~NETIF_F_SG; +@@ -3268,7 +3341,6 @@ subsys_initcall(net_dev_init); + EXPORT_SYMBOL(__dev_get_by_index); + EXPORT_SYMBOL(__dev_get_by_name); + EXPORT_SYMBOL(__dev_remove_pack); +-EXPORT_SYMBOL(__skb_linearize); + EXPORT_SYMBOL(dev_valid_name); + EXPORT_SYMBOL(dev_add_pack); + EXPORT_SYMBOL(dev_alloc_name); +diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c +index 05d6085..c57d887 100644 +--- a/net/core/dev_mcast.c ++++ b/net/core/dev_mcast.c +@@ -62,7 +62,7 @@ #include <net/arp.h> + * Device mc lists are changed by bh at least if IPv6 is enabled, + * so that it must be bh protected. + * +- * We block accesses to device mc filters with dev->xmit_lock. ++ * We block accesses to device mc filters with netif_tx_lock. 
+ */ + + /* +@@ -93,9 +93,9 @@ static void __dev_mc_upload(struct net_d + + void dev_mc_upload(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + __dev_mc_upload(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + /* +@@ -107,7 +107,7 @@ int dev_mc_delete(struct net_device *dev + int err = 0; + struct dev_mc_list *dmi, **dmip; + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + for (dmip = &dev->mc_list; (dmi = *dmip) != NULL; dmip = &dmi->next) { + /* +@@ -139,13 +139,13 @@ int dev_mc_delete(struct net_device *dev + */ + __dev_mc_upload(dev); + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + } + } + err = -ENOENT; + done: +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return err; + } + +@@ -160,7 +160,7 @@ int dev_mc_add(struct net_device *dev, v + + dmi1 = kmalloc(sizeof(*dmi), GFP_ATOMIC); + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + for (dmi = dev->mc_list; dmi != NULL; dmi = dmi->next) { + if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 && + dmi->dmi_addrlen == alen) { +@@ -176,7 +176,7 @@ int dev_mc_add(struct net_device *dev, v + } + + if ((dmi = dmi1) == NULL) { +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return -ENOMEM; + } + memcpy(dmi->dmi_addr, addr, alen); +@@ -189,11 +189,11 @@ int dev_mc_add(struct net_device *dev, v + + __dev_mc_upload(dev); + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + + done: +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + kfree(dmi1); + return err; + } +@@ -204,7 +204,7 @@ done: + + void dev_mc_discard(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + + while (dev->mc_list != NULL) { + struct dev_mc_list *tmp = dev->mc_list; +@@ -215,7 +215,7 @@ void dev_mc_discard(struct net_device *d + } + dev->mc_count = 0; + +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + #ifdef CONFIG_PROC_FS +@@ -250,7 +250,7 @@ static int dev_mc_seq_show(struct seq_fi + struct dev_mc_list *m; + struct net_device *dev = v; + +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + for (m = dev->mc_list; m; m = m->next) { + int i; + +@@ -262,7 +262,7 @@ static int dev_mc_seq_show(struct seq_fi + + seq_putc(seq, ''\n''); + } +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + return 0; + } + +diff --git a/net/core/ethtool.c b/net/core/ethtool.c +index e6f7610..27ce168 100644 +--- a/net/core/ethtool.c ++++ b/net/core/ethtool.c +@@ -30,7 +30,7 @@ u32 ethtool_op_get_link(struct net_devic + + u32 ethtool_op_get_tx_csum(struct net_device *dev) + { +- return (dev->features & (NETIF_F_IP_CSUM | NETIF_F_HW_CSUM)) != 0; ++ return (dev->features & NETIF_F_ALL_CSUM) != 0; + } + + int ethtool_op_set_tx_csum(struct net_device *dev, u32 data) +@@ -551,9 +551,7 @@ static int ethtool_set_sg(struct net_dev + return -EFAULT; + + if (edata.data && +- !(dev->features & (NETIF_F_IP_CSUM | +- NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM))) ++ !(dev->features & NETIF_F_ALL_CSUM)) + return -EINVAL; + + return __ethtool_set_sg(dev, edata.data); +@@ -591,7 +589,7 @@ static int ethtool_set_tso(struct net_de + + static int ethtool_get_ufo(struct net_device *dev, char __user *useraddr) + { +- struct ethtool_value edata = { ETHTOOL_GTSO }; ++ struct ethtool_value edata = { ETHTOOL_GUFO }; + + if (!dev->ethtool_ops->get_ufo) + return -EOPNOTSUPP; +@@ -600,6 +598,7 @@ static int ethtool_get_ufo(struct net_de + return 
-EFAULT; + return 0; + } ++ + static int ethtool_set_ufo(struct net_device *dev, char __user *useraddr) + { + struct ethtool_value edata; +@@ -615,6 +614,29 @@ static int ethtool_set_ufo(struct net_de + return dev->ethtool_ops->set_ufo(dev, edata.data); + } + ++static int ethtool_get_gso(struct net_device *dev, char __user *useraddr) ++{ ++ struct ethtool_value edata = { ETHTOOL_GGSO }; ++ ++ edata.data = dev->features & NETIF_F_GSO; ++ if (copy_to_user(useraddr, &edata, sizeof(edata))) ++ return -EFAULT; ++ return 0; ++} ++ ++static int ethtool_set_gso(struct net_device *dev, char __user *useraddr) ++{ ++ struct ethtool_value edata; ++ ++ if (copy_from_user(&edata, useraddr, sizeof(edata))) ++ return -EFAULT; ++ if (edata.data) ++ dev->features |= NETIF_F_GSO; ++ else ++ dev->features &= ~NETIF_F_GSO; ++ return 0; ++} ++ + static int ethtool_self_test(struct net_device *dev, char __user *useraddr) + { + struct ethtool_test test; +@@ -906,6 +928,12 @@ int dev_ethtool(struct ifreq *ifr) + case ETHTOOL_SUFO: + rc = ethtool_set_ufo(dev, useraddr); + break; ++ case ETHTOOL_GGSO: ++ rc = ethtool_get_gso(dev, useraddr); ++ break; ++ case ETHTOOL_SGSO: ++ rc = ethtool_set_gso(dev, useraddr); ++ break; + default: + rc = -EOPNOTSUPP; + } +diff --git a/net/core/netpoll.c b/net/core/netpoll.c +index ea51f8d..ec28d3b 100644 +--- a/net/core/netpoll.c ++++ b/net/core/netpoll.c +@@ -273,24 +273,21 @@ static void netpoll_send_skb(struct netp + + do { + npinfo->tries--; +- spin_lock(&np->dev->xmit_lock); +- np->dev->xmit_lock_owner = smp_processor_id(); ++ netif_tx_lock(np->dev); + + /* + * network drivers do not expect to be called if the queue is + * stopped. + */ + if (netif_queue_stopped(np->dev)) { +- np->dev->xmit_lock_owner = -1; +- spin_unlock(&np->dev->xmit_lock); ++ netif_tx_unlock(np->dev); + netpoll_poll(np); + udelay(50); + continue; + } + + status = np->dev->hard_start_xmit(skb, np->dev); +- np->dev->xmit_lock_owner = -1; +- spin_unlock(&np->dev->xmit_lock); ++ netif_tx_unlock(np->dev); + + /* success */ + if(!status) { +diff --git a/net/core/pktgen.c b/net/core/pktgen.c +index da16f8f..2380347 100644 +--- a/net/core/pktgen.c ++++ b/net/core/pktgen.c +@@ -2582,7 +2582,7 @@ static __inline__ void pktgen_xmit(struc + } + } + +- spin_lock_bh(&odev->xmit_lock); ++ netif_tx_lock_bh(odev); + if (!netif_queue_stopped(odev)) { + + atomic_inc(&(pkt_dev->skb->users)); +@@ -2627,7 +2627,7 @@ retry_now: + pkt_dev->next_tx_ns = 0; + } + +- spin_unlock_bh(&odev->xmit_lock); ++ netif_tx_unlock_bh(odev); + + /* If pkt_dev->count is zero, then run forever */ + if ((pkt_dev->count != 0) && (pkt_dev->sofar >= pkt_dev->count)) { +diff --git a/net/core/skbuff.c b/net/core/skbuff.c +index 2144952..46f56af 100644 +--- a/net/core/skbuff.c ++++ b/net/core/skbuff.c +@@ -164,9 +164,9 @@ struct sk_buff *__alloc_skb(unsigned int + shinfo = skb_shinfo(skb); + atomic_set(&shinfo->dataref, 1); + shinfo->nr_frags = 0; +- shinfo->tso_size = 0; +- shinfo->tso_segs = 0; +- shinfo->ufo_size = 0; ++ shinfo->gso_size = 0; ++ shinfo->gso_segs = 0; ++ shinfo->gso_type = 0; + shinfo->ip6_frag_id = 0; + shinfo->frag_list = NULL; + +@@ -230,8 +230,9 @@ struct sk_buff *alloc_skb_from_cache(kme + + atomic_set(&(skb_shinfo(skb)->dataref), 1); + skb_shinfo(skb)->nr_frags = 0; +- skb_shinfo(skb)->tso_size = 0; +- skb_shinfo(skb)->tso_segs = 0; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_segs = 0; ++ skb_shinfo(skb)->gso_type = 0; + skb_shinfo(skb)->frag_list = NULL; + out: + return skb; +@@ -501,8 +502,9 @@ #endif + 
new->tc_index = old->tc_index; + #endif + atomic_set(&new->users, 1); +- skb_shinfo(new)->tso_size = skb_shinfo(old)->tso_size; +- skb_shinfo(new)->tso_segs = skb_shinfo(old)->tso_segs; ++ skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size; ++ skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs; ++ skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type; + } + + /** +@@ -1777,6 +1779,133 @@ int skb_append_datato_frags(struct sock + return 0; + } + ++/** ++ * skb_segment - Perform protocol segmentation on skb. ++ * @skb: buffer to segment ++ * @features: features for the output path (see dev->features) ++ * ++ * This function performs segmentation on the given skb. It returns ++ * the segment at the given position. It returns NULL if there are ++ * no more segments to generate, or when an error is encountered. ++ */ ++struct sk_buff *skb_segment(struct sk_buff *skb, int features) ++{ ++ struct sk_buff *segs = NULL; ++ struct sk_buff *tail = NULL; ++ unsigned int mss = skb_shinfo(skb)->gso_size; ++ unsigned int doffset = skb->data - skb->mac.raw; ++ unsigned int offset = doffset; ++ unsigned int headroom; ++ unsigned int len; ++ int sg = features & NETIF_F_SG; ++ int nfrags = skb_shinfo(skb)->nr_frags; ++ int err = -ENOMEM; ++ int i = 0; ++ int pos; ++ ++ __skb_push(skb, doffset); ++ headroom = skb_headroom(skb); ++ pos = skb_headlen(skb); ++ ++ do { ++ struct sk_buff *nskb; ++ skb_frag_t *frag; ++ int hsize, nsize; ++ int k; ++ int size; ++ ++ len = skb->len - offset; ++ if (len > mss) ++ len = mss; ++ ++ hsize = skb_headlen(skb) - offset; ++ if (hsize < 0) ++ hsize = 0; ++ nsize = hsize + doffset; ++ if (nsize > len + doffset || !sg) ++ nsize = len + doffset; ++ ++ nskb = alloc_skb(nsize + headroom, GFP_ATOMIC); ++ if (unlikely(!nskb)) ++ goto err; ++ ++ if (segs) ++ tail->next = nskb; ++ else ++ segs = nskb; ++ tail = nskb; ++ ++ nskb->dev = skb->dev; ++ nskb->priority = skb->priority; ++ nskb->protocol = skb->protocol; ++ nskb->dst = dst_clone(skb->dst); ++ memcpy(nskb->cb, skb->cb, sizeof(skb->cb)); ++ nskb->pkt_type = skb->pkt_type; ++ nskb->mac_len = skb->mac_len; ++ ++ skb_reserve(nskb, headroom); ++ nskb->mac.raw = nskb->data; ++ nskb->nh.raw = nskb->data + skb->mac_len; ++ nskb->h.raw = nskb->nh.raw + (skb->h.raw - skb->nh.raw); ++ memcpy(skb_put(nskb, doffset), skb->data, doffset); ++ ++ if (!sg) { ++ nskb->csum = skb_copy_and_csum_bits(skb, offset, ++ skb_put(nskb, len), ++ len, 0); ++ continue; ++ } ++ ++ frag = skb_shinfo(nskb)->frags; ++ k = 0; ++ ++ nskb->ip_summed = CHECKSUM_HW; ++ nskb->csum = skb->csum; ++ memcpy(skb_put(nskb, hsize), skb->data + offset, hsize); ++ ++ while (pos < offset + len) { ++ BUG_ON(i >= nfrags); ++ ++ *frag = skb_shinfo(skb)->frags[i]; ++ get_page(frag->page); ++ size = frag->size; ++ ++ if (pos < offset) { ++ frag->page_offset += offset - pos; ++ frag->size -= offset - pos; ++ } ++ ++ k++; ++ ++ if (pos + size <= offset + len) { ++ i++; ++ pos += size; ++ } else { ++ frag->size -= pos + size - (offset + len); ++ break; ++ } ++ ++ frag++; ++ } ++ ++ skb_shinfo(nskb)->nr_frags = k; ++ nskb->data_len = len - hsize; ++ nskb->len += nskb->data_len; ++ nskb->truesize += nskb->data_len; ++ } while ((offset += len) < skb->len); ++ ++ return segs; ++ ++err: ++ while ((skb = segs)) { ++ segs = skb->next; ++ kfree(skb); ++ } ++ return ERR_PTR(err); ++} ++ ++EXPORT_SYMBOL_GPL(skb_segment); ++ + void __init skb_init(void) + { + skbuff_head_cache = kmem_cache_create("skbuff_head_cache", +diff --git a/net/decnet/dn_nsp_in.c b/net/decnet/dn_nsp_in.c +index 
44bda85..2e3323a 100644 +--- a/net/decnet/dn_nsp_in.c ++++ b/net/decnet/dn_nsp_in.c +@@ -801,8 +801,7 @@ got_it: + * We linearize everything except data segments here. + */ + if (cb->nsp_flags & ~0x60) { +- if (unlikely(skb_is_nonlinear(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) ++ if (unlikely(skb_linearize(skb))) + goto free_out; + } + +diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c +index 3407f19..a0a25e0 100644 +--- a/net/decnet/dn_route.c ++++ b/net/decnet/dn_route.c +@@ -629,8 +629,7 @@ int dn_route_rcv(struct sk_buff *skb, st + padlen); + + if (flags & DN_RT_PKT_CNTL) { +- if (unlikely(skb_is_nonlinear(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) ++ if (unlikely(skb_linearize(skb))) + goto dump_it; + + switch(flags & DN_RT_CNTL_MSK) { +diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c +index 97c276f..5ba719e 100644 +--- a/net/ipv4/af_inet.c ++++ b/net/ipv4/af_inet.c +@@ -68,6 +68,7 @@ + */ + + #include <linux/config.h> ++#include <linux/err.h> + #include <linux/errno.h> + #include <linux/types.h> + #include <linux/socket.h> +@@ -1084,6 +1085,54 @@ int inet_sk_rebuild_header(struct sock * + + EXPORT_SYMBOL(inet_sk_rebuild_header); + ++static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int features) ++{ ++ struct sk_buff *segs = ERR_PTR(-EINVAL); ++ struct iphdr *iph; ++ struct net_protocol *ops; ++ int proto; ++ int ihl; ++ int id; ++ ++ if (!pskb_may_pull(skb, sizeof(*iph))) ++ goto out; ++ ++ iph = skb->nh.iph; ++ ihl = iph->ihl * 4; ++ if (ihl < sizeof(*iph)) ++ goto out; ++ ++ if (!pskb_may_pull(skb, ihl)) ++ goto out; ++ ++ skb->h.raw = __skb_pull(skb, ihl); ++ iph = skb->nh.iph; ++ id = ntohs(iph->id); ++ proto = iph->protocol & (MAX_INET_PROTOS - 1); ++ segs = ERR_PTR(-EPROTONOSUPPORT); ++ ++ rcu_read_lock(); ++ ops = rcu_dereference(inet_protos[proto]); ++ if (ops && ops->gso_segment) ++ segs = ops->gso_segment(skb, features); ++ rcu_read_unlock(); ++ ++ if (!segs || unlikely(IS_ERR(segs))) ++ goto out; ++ ++ skb = segs; ++ do { ++ iph = skb->nh.iph; ++ iph->id = htons(id++); ++ iph->tot_len = htons(skb->len - skb->mac_len); ++ iph->check = 0; ++ iph->check = ip_fast_csum(skb->nh.raw, iph->ihl); ++ } while ((skb = skb->next)); ++ ++out: ++ return segs; ++} ++ + #ifdef CONFIG_IP_MULTICAST + static struct net_protocol igmp_protocol = { + .handler = igmp_rcv, +@@ -1093,6 +1142,7 @@ #endif + static struct net_protocol tcp_protocol = { + .handler = tcp_v4_rcv, + .err_handler = tcp_v4_err, ++ .gso_segment = tcp_tso_segment, + .no_policy = 1, + }; + +@@ -1138,6 +1188,7 @@ static int ipv4_proc_init(void); + static struct packet_type ip_packet_type = { + .type = __constant_htons(ETH_P_IP), + .func = ip_rcv, ++ .gso_segment = inet_gso_segment, + }; + + static int __init inet_init(void) +diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c +index 8dcba38..19c3c73 100644 +--- a/net/ipv4/ip_output.c ++++ b/net/ipv4/ip_output.c +@@ -210,8 +210,7 @@ #if defined(CONFIG_NETFILTER) && defined + return dst_output(skb); + } + #endif +- if (skb->len > dst_mtu(skb->dst) && +- !(skb_shinfo(skb)->ufo_size || skb_shinfo(skb)->tso_size)) ++ if (skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size) + return ip_fragment(skb, ip_finish_output2); + else + return ip_finish_output2(skb); +@@ -362,7 +361,7 @@ packet_routed: + } + + ip_select_ident_more(iph, &rt->u.dst, sk, +- (skb_shinfo(skb)->tso_segs ?: 1) - 1); ++ (skb_shinfo(skb)->gso_segs ?: 1) - 1); + + /* Add an IP checksum. 
*/ + ip_send_check(iph); +@@ -743,7 +742,8 @@ static inline int ip_ufo_append_data(str + (length - transhdrlen)); + if (!err) { + /* specify the length of each IP datagram fragment*/ +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen); ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen; ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; + __skb_queue_tail(&sk->sk_write_queue, skb); + + return 0; +@@ -839,7 +839,7 @@ int ip_append_data(struct sock *sk, + */ + if (transhdrlen && + length + fragheaderlen <= mtu && +- rt->u.dst.dev->features&(NETIF_F_IP_CSUM|NETIF_F_NO_CSUM|NETIF_F_HW_CSUM) && ++ rt->u.dst.dev->features & NETIF_F_ALL_CSUM && + !exthdrlen) + csummode = CHECKSUM_HW; + +@@ -1086,14 +1086,16 @@ ssize_t ip_append_page(struct sock *sk, + + inet->cork.length += size; + if ((sk->sk_protocol == IPPROTO_UDP) && +- (rt->u.dst.dev->features & NETIF_F_UFO)) +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen); ++ (rt->u.dst.dev->features & NETIF_F_UFO)) { ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen; ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; ++ } + + + while (size > 0) { + int i; + +- if (skb_shinfo(skb)->ufo_size) ++ if (skb_shinfo(skb)->gso_size) + len = size; + else { + +diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c +index d64e2ec..7494823 100644 +--- a/net/ipv4/ipcomp.c ++++ b/net/ipv4/ipcomp.c +@@ -84,7 +84,7 @@ static int ipcomp_input(struct xfrm_stat + struct xfrm_decap_state *decap, struct sk_buff *skb) + { + u8 nexthdr; +- int err = 0; ++ int err = -ENOMEM; + struct iphdr *iph; + union { + struct iphdr iph; +@@ -92,11 +92,8 @@ static int ipcomp_input(struct xfrm_stat + } tmp_iph; + + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { +- err = -ENOMEM; ++ if (skb_linearize_cow(skb)) + goto out; +- } + + skb->ip_summed = CHECKSUM_NONE; + +@@ -171,10 +168,8 @@ static int ipcomp_output(struct xfrm_sta + goto out_ok; + } + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { ++ if (skb_linearize_cow(skb)) + goto out_ok; +- } + + err = ipcomp_compress(x, skb); + iph = skb->nh.iph; +diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c +index 00aa80e..84130c9 100644 +--- a/net/ipv4/tcp.c ++++ b/net/ipv4/tcp.c +@@ -257,6 +257,7 @@ #include <linux/smp_lock.h> + #include <linux/fs.h> + #include <linux/random.h> + #include <linux/bootmem.h> ++#include <linux/err.h> + + #include <net/icmp.h> + #include <net/tcp.h> +@@ -570,7 +571,7 @@ new_segment: + skb->ip_summed = CHECKSUM_HW; + tp->write_seq += copy; + TCP_SKB_CB(skb)->end_seq += copy; +- skb_shinfo(skb)->tso_segs = 0; ++ skb_shinfo(skb)->gso_segs = 0; + + if (!copied) + TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_PSH; +@@ -621,14 +622,10 @@ ssize_t tcp_sendpage(struct socket *sock + ssize_t res; + struct sock *sk = sock->sk; + +-#define TCP_ZC_CSUM_FLAGS (NETIF_F_IP_CSUM | NETIF_F_NO_CSUM | NETIF_F_HW_CSUM) +- + if (!(sk->sk_route_caps & NETIF_F_SG) || +- !(sk->sk_route_caps & TCP_ZC_CSUM_FLAGS)) ++ !(sk->sk_route_caps & NETIF_F_ALL_CSUM)) + return sock_no_sendpage(sock, page, offset, size, flags); + +-#undef TCP_ZC_CSUM_FLAGS +- + lock_sock(sk); + TCP_CHECK_TIMER(sk); + res = do_tcp_sendpages(sk, &page, offset, size, flags); +@@ -725,9 +722,7 @@ new_segment: + /* + * Check whether we can use HW checksum. 
+ */ +- if (sk->sk_route_caps & +- (NETIF_F_IP_CSUM | NETIF_F_NO_CSUM | +- NETIF_F_HW_CSUM)) ++ if (sk->sk_route_caps & NETIF_F_ALL_CSUM) + skb->ip_summed = CHECKSUM_HW; + + skb_entail(sk, tp, skb); +@@ -823,7 +818,7 @@ new_segment: + + tp->write_seq += copy; + TCP_SKB_CB(skb)->end_seq += copy; +- skb_shinfo(skb)->tso_segs = 0; ++ skb_shinfo(skb)->gso_segs = 0; + + from += copy; + copied += copy; +@@ -2026,6 +2021,71 @@ int tcp_getsockopt(struct sock *sk, int + } + + ++struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features) ++{ ++ struct sk_buff *segs = ERR_PTR(-EINVAL); ++ struct tcphdr *th; ++ unsigned thlen; ++ unsigned int seq; ++ unsigned int delta; ++ unsigned int oldlen; ++ unsigned int len; ++ ++ if (!pskb_may_pull(skb, sizeof(*th))) ++ goto out; ++ ++ th = skb->h.th; ++ thlen = th->doff * 4; ++ if (thlen < sizeof(*th)) ++ goto out; ++ ++ if (!pskb_may_pull(skb, thlen)) ++ goto out; ++ ++ segs = NULL; ++ if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) ++ goto out; ++ ++ oldlen = (u16)~skb->len; ++ __skb_pull(skb, thlen); ++ ++ segs = skb_segment(skb, features); ++ if (IS_ERR(segs)) ++ goto out; ++ ++ len = skb_shinfo(skb)->gso_size; ++ delta = htonl(oldlen + (thlen + len)); ++ ++ skb = segs; ++ th = skb->h.th; ++ seq = ntohl(th->seq); ++ ++ do { ++ th->fin = th->psh = 0; ++ ++ th->check = ~csum_fold(th->check + delta); ++ if (skb->ip_summed != CHECKSUM_HW) ++ th->check = csum_fold(csum_partial(skb->h.raw, thlen, ++ skb->csum)); ++ ++ seq += len; ++ skb = skb->next; ++ th = skb->h.th; ++ ++ th->seq = htonl(seq); ++ th->cwr = 0; ++ } while (skb->next); ++ ++ delta = htonl(oldlen + (skb->tail - skb->h.raw) + skb->data_len); ++ th->check = ~csum_fold(th->check + delta); ++ if (skb->ip_summed != CHECKSUM_HW) ++ th->check = csum_fold(csum_partial(skb->h.raw, thlen, ++ skb->csum)); ++ ++out: ++ return segs; ++} ++ + extern void __skb_cb_too_small_for_tcp(int, int); + extern struct tcp_congestion_ops tcp_reno; + +diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c +index e9a54ae..defe77a 100644 +--- a/net/ipv4/tcp_input.c ++++ b/net/ipv4/tcp_input.c +@@ -1072,7 +1072,7 @@ tcp_sacktag_write_queue(struct sock *sk, + else + pkt_len = (end_seq - + TCP_SKB_CB(skb)->seq); +- if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->tso_size)) ++ if (tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)->gso_size)) + break; + pcount = tcp_skb_pcount(skb); + } +diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c +index 310f2e6..ee01f69 100644 +--- a/net/ipv4/tcp_output.c ++++ b/net/ipv4/tcp_output.c +@@ -497,15 +497,17 @@ static void tcp_set_skb_tso_segs(struct + /* Avoid the costly divide in the normal + * non-TSO case. 
+ */ +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + } else { + unsigned int factor; + + factor = skb->len + (mss_now - 1); + factor /= mss_now; +- skb_shinfo(skb)->tso_segs = factor; +- skb_shinfo(skb)->tso_size = mss_now; ++ skb_shinfo(skb)->gso_segs = factor; ++ skb_shinfo(skb)->gso_size = mss_now; ++ skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + } + } + +@@ -850,7 +852,7 @@ static int tcp_init_tso_segs(struct sock + + if (!tso_segs || + (tso_segs > 1 && +- skb_shinfo(skb)->tso_size != mss_now)) { ++ tcp_skb_mss(skb) != mss_now)) { + tcp_set_skb_tso_segs(sk, skb, mss_now); + tso_segs = tcp_skb_pcount(skb); + } +@@ -1510,8 +1512,9 @@ int tcp_retransmit_skb(struct sock *sk, + tp->snd_una == (TCP_SKB_CB(skb)->end_seq - 1)) { + if (!pskb_trim(skb, 0)) { + TCP_SKB_CB(skb)->seq = TCP_SKB_CB(skb)->end_seq - 1; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + skb->ip_summed = CHECKSUM_NONE; + skb->csum = 0; + } +@@ -1716,8 +1719,9 @@ void tcp_send_fin(struct sock *sk) + skb->csum = 0; + TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_FIN); + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* FIN eats a sequence byte, write_seq advanced by tcp_queue_skb(). */ + TCP_SKB_CB(skb)->seq = tp->write_seq; +@@ -1749,8 +1753,9 @@ void tcp_send_active_reset(struct sock * + skb->csum = 0; + TCP_SKB_CB(skb)->flags = (TCPCB_FLAG_ACK | TCPCB_FLAG_RST); + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* Send it off. */ + TCP_SKB_CB(skb)->seq = tcp_acceptable_seq(sk, tp); +@@ -1833,8 +1838,9 @@ struct sk_buff * tcp_make_synack(struct + TCP_SKB_CB(skb)->seq = tcp_rsk(req)->snt_isn; + TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + 1; + TCP_SKB_CB(skb)->sacked = 0; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + th->seq = htonl(TCP_SKB_CB(skb)->seq); + th->ack_seq = htonl(tcp_rsk(req)->rcv_isn + 1); + if (req->rcv_wnd == 0) { /* ignored for retransmitted syns */ +@@ -1937,8 +1943,9 @@ int tcp_connect(struct sock *sk) + TCP_SKB_CB(buff)->flags = TCPCB_FLAG_SYN; + TCP_ECN_send_syn(sk, tp, buff); + TCP_SKB_CB(buff)->sacked = 0; +- skb_shinfo(buff)->tso_segs = 1; +- skb_shinfo(buff)->tso_size = 0; ++ skb_shinfo(buff)->gso_segs = 1; ++ skb_shinfo(buff)->gso_size = 0; ++ skb_shinfo(buff)->gso_type = 0; + buff->csum = 0; + TCP_SKB_CB(buff)->seq = tp->write_seq++; + TCP_SKB_CB(buff)->end_seq = tp->write_seq; +@@ -2042,8 +2049,9 @@ void tcp_send_ack(struct sock *sk) + buff->csum = 0; + TCP_SKB_CB(buff)->flags = TCPCB_FLAG_ACK; + TCP_SKB_CB(buff)->sacked = 0; +- skb_shinfo(buff)->tso_segs = 1; +- skb_shinfo(buff)->tso_size = 0; ++ skb_shinfo(buff)->gso_segs = 1; ++ skb_shinfo(buff)->gso_size = 0; ++ skb_shinfo(buff)->gso_type = 0; + + /* Send it off, this clears delayed acks for us. 
*/ + TCP_SKB_CB(buff)->seq = TCP_SKB_CB(buff)->end_seq = tcp_acceptable_seq(sk, tp); +@@ -2078,8 +2086,9 @@ static int tcp_xmit_probe_skb(struct soc + skb->csum = 0; + TCP_SKB_CB(skb)->flags = TCPCB_FLAG_ACK; + TCP_SKB_CB(skb)->sacked = urgent; +- skb_shinfo(skb)->tso_segs = 1; +- skb_shinfo(skb)->tso_size = 0; ++ skb_shinfo(skb)->gso_segs = 1; ++ skb_shinfo(skb)->gso_size = 0; ++ skb_shinfo(skb)->gso_type = 0; + + /* Use a previous sequence. This should cause the other + * end to send an ack. Don''t queue or clone SKB, just +diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c +index 32ad229..737c1db 100644 +--- a/net/ipv4/xfrm4_output.c ++++ b/net/ipv4/xfrm4_output.c +@@ -9,6 +9,8 @@ + */ + + #include <linux/compiler.h> ++#include <linux/if_ether.h> ++#include <linux/kernel.h> + #include <linux/skbuff.h> + #include <linux/spinlock.h> + #include <linux/netfilter_ipv4.h> +@@ -152,16 +154,10 @@ error_nolock: + goto out_exit; + } + +-static int xfrm4_output_finish(struct sk_buff *skb) ++static int xfrm4_output_finish2(struct sk_buff *skb) + { + int err; + +-#ifdef CONFIG_NETFILTER +- if (!skb->dst->xfrm) { +- IPCB(skb)->flags |= IPSKB_REROUTED; +- return dst_output(skb); +- } +-#endif + while (likely((err = xfrm4_output_one(skb)) == 0)) { + nf_reset(skb); + +@@ -174,7 +170,7 @@ #endif + return dst_output(skb); + + err = nf_hook(PF_INET, NF_IP_POST_ROUTING, &skb, NULL, +- skb->dst->dev, xfrm4_output_finish); ++ skb->dst->dev, xfrm4_output_finish2); + if (unlikely(err != 1)) + break; + } +@@ -182,6 +178,48 @@ #endif + return err; + } + ++static int xfrm4_output_finish(struct sk_buff *skb) ++{ ++ struct sk_buff *segs; ++ ++#ifdef CONFIG_NETFILTER ++ if (!skb->dst->xfrm) { ++ IPCB(skb)->flags |= IPSKB_REROUTED; ++ return dst_output(skb); ++ } ++#endif ++ ++ if (!skb_shinfo(skb)->gso_size) ++ return xfrm4_output_finish2(skb); ++ ++ skb->protocol = htons(ETH_P_IP); ++ segs = skb_gso_segment(skb, 0); ++ kfree_skb(skb); ++ if (unlikely(IS_ERR(segs))) ++ return PTR_ERR(segs); ++ ++ do { ++ struct sk_buff *nskb = segs->next; ++ int err; ++ ++ segs->next = NULL; ++ err = xfrm4_output_finish2(segs); ++ ++ if (unlikely(err)) { ++ while ((segs = nskb)) { ++ nskb = segs->next; ++ segs->next = NULL; ++ kfree_skb(segs); ++ } ++ return err; ++ } ++ ++ segs = nskb; ++ } while (segs); ++ ++ return 0; ++} ++ + int xfrm4_output(struct sk_buff *skb) + { + return NF_HOOK_COND(PF_INET, NF_IP_POST_ROUTING, skb, NULL, skb->dst->dev, +diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c +index 5bf70b1..cf5d17e 100644 +--- a/net/ipv6/ip6_output.c ++++ b/net/ipv6/ip6_output.c +@@ -147,7 +147,7 @@ static int ip6_output2(struct sk_buff *s + + int ip6_output(struct sk_buff *skb) + { +- if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->ufo_size) || ++ if ((skb->len > dst_mtu(skb->dst) && !skb_shinfo(skb)->gso_size) || + dst_allfrag(skb->dst)) + return ip6_fragment(skb, ip6_output2); + else +@@ -829,8 +829,9 @@ static inline int ip6_ufo_append_data(st + struct frag_hdr fhdr; + + /* specify the length of each IP datagram fragment*/ +- skb_shinfo(skb)->ufo_size = (mtu - fragheaderlen) - +- sizeof(struct frag_hdr); ++ skb_shinfo(skb)->gso_size = mtu - fragheaderlen - ++ sizeof(struct frag_hdr); ++ skb_shinfo(skb)->gso_type = SKB_GSO_UDPV4; + ipv6_select_ident(skb, &fhdr); + skb_shinfo(skb)->ip6_frag_id = fhdr.identification; + __skb_queue_tail(&sk->sk_write_queue, skb); +diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c +index d511a88..ef56d5d 100644 +--- a/net/ipv6/ipcomp6.c ++++ b/net/ipv6/ipcomp6.c +@@ 
-64,7 +64,7 @@ static LIST_HEAD(ipcomp6_tfms_list); + + static int ipcomp6_input(struct xfrm_state *x, struct xfrm_decap_state *decap, struct sk_buff *skb) + { +- int err = 0; ++ int err = -ENOMEM; + u8 nexthdr = 0; + int hdr_len = skb->h.raw - skb->nh.raw; + unsigned char *tmp_hdr = NULL; +@@ -75,11 +75,8 @@ static int ipcomp6_input(struct xfrm_sta + struct crypto_tfm *tfm; + int cpu; + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { +- err = -ENOMEM; ++ if (skb_linearize_cow(skb)) + goto out; +- } + + skb->ip_summed = CHECKSUM_NONE; + +@@ -158,10 +155,8 @@ static int ipcomp6_output(struct xfrm_st + goto out_ok; + } + +- if ((skb_is_nonlinear(skb) || skb_cloned(skb)) && +- skb_linearize(skb, GFP_ATOMIC) != 0) { ++ if (skb_linearize_cow(skb)) + goto out_ok; +- } + + /* compression */ + plen = skb->len - hdr_len; +diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c +index 8024217..39bdeec 100644 +--- a/net/ipv6/xfrm6_output.c ++++ b/net/ipv6/xfrm6_output.c +@@ -151,7 +151,7 @@ error_nolock: + goto out_exit; + } + +-static int xfrm6_output_finish(struct sk_buff *skb) ++static int xfrm6_output_finish2(struct sk_buff *skb) + { + int err; + +@@ -167,7 +167,7 @@ static int xfrm6_output_finish(struct sk + return dst_output(skb); + + err = nf_hook(PF_INET6, NF_IP6_POST_ROUTING, &skb, NULL, +- skb->dst->dev, xfrm6_output_finish); ++ skb->dst->dev, xfrm6_output_finish2); + if (unlikely(err != 1)) + break; + } +@@ -175,6 +175,41 @@ static int xfrm6_output_finish(struct sk + return err; + } + ++static int xfrm6_output_finish(struct sk_buff *skb) ++{ ++ struct sk_buff *segs; ++ ++ if (!skb_shinfo(skb)->gso_size) ++ return xfrm6_output_finish2(skb); ++ ++ skb->protocol = htons(ETH_P_IP); ++ segs = skb_gso_segment(skb, 0); ++ kfree_skb(skb); ++ if (unlikely(IS_ERR(segs))) ++ return PTR_ERR(segs); ++ ++ do { ++ struct sk_buff *nskb = segs->next; ++ int err; ++ ++ segs->next = NULL; ++ err = xfrm6_output_finish2(segs); ++ ++ if (unlikely(err)) { ++ while ((segs = nskb)) { ++ nskb = segs->next; ++ segs->next = NULL; ++ kfree_skb(segs); ++ } ++ return err; ++ } ++ ++ segs = nskb; ++ } while (segs); ++ ++ return 0; ++} ++ + int xfrm6_output(struct sk_buff *skb) + { + return NF_HOOK(PF_INET6, NF_IP6_POST_ROUTING, skb, NULL, skb->dst->dev, +diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c +index 99ceb91..28c9efd 100644 +--- a/net/sched/sch_generic.c ++++ b/net/sched/sch_generic.c +@@ -72,9 +72,9 @@ void qdisc_unlock_tree(struct net_device + dev->queue_lock serializes queue accesses for this device + AND dev->qdisc pointer itself. + +- dev->xmit_lock serializes accesses to device driver. ++ netif_tx_lock serializes accesses to device driver. + +- dev->queue_lock and dev->xmit_lock are mutually exclusive, ++ dev->queue_lock and netif_tx_lock are mutually exclusive, + if one is grabbed, another must be free. + */ + +@@ -90,14 +90,17 @@ void qdisc_unlock_tree(struct net_device + NOTE: Called under dev->queue_lock with locally disabled BH. + */ + +-int qdisc_restart(struct net_device *dev) ++static inline int qdisc_restart(struct net_device *dev) + { + struct Qdisc *q = dev->qdisc; + struct sk_buff *skb; + + /* Dequeue packet */ +- if ((skb = q->dequeue(q)) != NULL) { ++ if (((skb = dev->gso_skb)) || ((skb = q->dequeue(q)))) { + unsigned nolock = (dev->features & NETIF_F_LLTX); ++ ++ dev->gso_skb = NULL; ++ + /* + * When the driver has LLTX set it does its own locking + * in start_xmit. 
No need to add additional overhead by +@@ -108,7 +111,7 @@ int qdisc_restart(struct net_device *dev + * will be requeued. + */ + if (!nolock) { +- if (!spin_trylock(&dev->xmit_lock)) { ++ if (!netif_tx_trylock(dev)) { + collision: + /* So, someone grabbed the driver. */ + +@@ -126,8 +129,6 @@ int qdisc_restart(struct net_device *dev + __get_cpu_var(netdev_rx_stat).cpu_collision++; + goto requeue; + } +- /* Remember that the driver is grabbed by us. */ +- dev->xmit_lock_owner = smp_processor_id(); + } + + { +@@ -136,14 +137,11 @@ int qdisc_restart(struct net_device *dev + + if (!netif_queue_stopped(dev)) { + int ret; +- if (netdev_nit) +- dev_queue_xmit_nit(skb, dev); + +- ret = dev->hard_start_xmit(skb, dev); ++ ret = dev_hard_start_xmit(skb, dev); + if (ret == NETDEV_TX_OK) { + if (!nolock) { +- dev->xmit_lock_owner = -1; +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + } + spin_lock(&dev->queue_lock); + return -1; +@@ -157,8 +155,7 @@ int qdisc_restart(struct net_device *dev + /* NETDEV_TX_BUSY - we need to requeue */ + /* Release the driver */ + if (!nolock) { +- dev->xmit_lock_owner = -1; +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + } + spin_lock(&dev->queue_lock); + q = dev->qdisc; +@@ -175,7 +172,10 @@ int qdisc_restart(struct net_device *dev + */ + + requeue: +- q->ops->requeue(skb, q); ++ if (skb->next) ++ dev->gso_skb = skb; ++ else ++ q->ops->requeue(skb, q); + netif_schedule(dev); + return 1; + } +@@ -183,11 +183,23 @@ requeue: + return q->q.qlen; + } + ++void __qdisc_run(struct net_device *dev) ++{ ++ if (unlikely(dev->qdisc == &noop_qdisc)) ++ goto out; ++ ++ while (qdisc_restart(dev) < 0 && !netif_queue_stopped(dev)) ++ /* NOTHING */; ++ ++out: ++ clear_bit(__LINK_STATE_QDISC_RUNNING, &dev->state); ++} ++ + static void dev_watchdog(unsigned long arg) + { + struct net_device *dev = (struct net_device *)arg; + +- spin_lock(&dev->xmit_lock); ++ netif_tx_lock(dev); + if (dev->qdisc != &noop_qdisc) { + if (netif_device_present(dev) && + netif_running(dev) && +@@ -201,7 +213,7 @@ static void dev_watchdog(unsigned long a + dev_hold(dev); + } + } +- spin_unlock(&dev->xmit_lock); ++ netif_tx_unlock(dev); + + dev_put(dev); + } +@@ -225,17 +237,17 @@ void __netdev_watchdog_up(struct net_dev + + static void dev_watchdog_up(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + __netdev_watchdog_up(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + static void dev_watchdog_down(struct net_device *dev) + { +- spin_lock_bh(&dev->xmit_lock); ++ netif_tx_lock_bh(dev); + if (del_timer(&dev->watchdog_timer)) + __dev_put(dev); +- spin_unlock_bh(&dev->xmit_lock); ++ netif_tx_unlock_bh(dev); + } + + void netif_carrier_on(struct net_device *dev) +@@ -577,10 +589,17 @@ void dev_deactivate(struct net_device *d + + dev_watchdog_down(dev); + +- while (test_bit(__LINK_STATE_SCHED, &dev->state)) ++ /* Wait for outstanding dev_queue_xmit calls. */ ++ synchronize_rcu(); ++ ++ /* Wait for outstanding qdisc_run calls. 
*/ ++ while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state)) + yield(); + +- spin_unlock_wait(&dev->xmit_lock); ++ if (dev->gso_skb) { ++ kfree_skb(dev->gso_skb); ++ dev->gso_skb = NULL; ++ } + } + + void dev_init_scheduler(struct net_device *dev) +@@ -622,6 +641,5 @@ EXPORT_SYMBOL(qdisc_create_dflt); + EXPORT_SYMBOL(qdisc_alloc); + EXPORT_SYMBOL(qdisc_destroy); + EXPORT_SYMBOL(qdisc_reset); +-EXPORT_SYMBOL(qdisc_restart); + EXPORT_SYMBOL(qdisc_lock_tree); + EXPORT_SYMBOL(qdisc_unlock_tree); +diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c +index 79b8ef3..4c16ad5 100644 +--- a/net/sched/sch_teql.c ++++ b/net/sched/sch_teql.c +@@ -302,20 +302,17 @@ restart: + + switch (teql_resolve(skb, skb_res, slave)) { + case 0: +- if (spin_trylock(&slave->xmit_lock)) { +- slave->xmit_lock_owner = smp_processor_id(); ++ if (netif_tx_trylock(slave)) { + if (!netif_queue_stopped(slave) && + slave->hard_start_xmit(skb, slave) == 0) { +- slave->xmit_lock_owner = -1; +- spin_unlock(&slave->xmit_lock); ++ netif_tx_unlock(slave); + master->slaves = NEXT_SLAVE(q); + netif_wake_queue(dev); + master->stats.tx_packets++; + master->stats.tx_bytes += len; + return 0; + } +- slave->xmit_lock_owner = -1; +- spin_unlock(&slave->xmit_lock); ++ netif_tx_unlock(slave); + } + if (netif_queue_stopped(dev)) + busy = 1; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
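
To make the transmit-path changes above easier to follow: when a GSO packet reaches a device that lacks the matching hardware offload, skb_gso_segment() splits it and the resulting segments are transmitted one by one. The fragment below is only an illustrative sketch of that pattern, modelled on the dev_gso_segment()/xfrm4_output_finish() code in the patch above; the helper name and the drop-on-failure error handling are invented for the example and are not part of the patch.

    #include <linux/err.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    /* Illustrative only: segment one GSO skb and push each segment to the
     * device, mirroring the list walk used by dev_gso_segment() and
     * xfrm4_output_finish() in the patch above. */
    static int gso_xmit_sketch(struct sk_buff *skb, struct net_device *dev)
    {
        struct sk_buff *segs, *nskb;
        int err;

        segs = skb_gso_segment(skb, dev->features);
        if (IS_ERR(segs))
            return PTR_ERR(segs);
        if (!segs)                      /* GSO only verified the headers */
            return dev->hard_start_xmit(skb, dev);

        kfree_skb(skb);                 /* the segments now carry the data */

        do {
            nskb = segs->next;
            segs->next = NULL;

            err = dev->hard_start_xmit(segs, dev);
            if (err) {
                /* Simplistic: drop the failed segment and the rest;
                 * the real dev_hard_start_xmit() requeues instead. */
                kfree_skb(segs);
                while ((segs = nskb) != NULL) {
                    nskb = segs->next;
                    kfree_skb(segs);
                }
                return err;
            }

            segs = nskb;
        } while (segs);

        return 0;
    }
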
Herbert Xu
2006-Jun-28 03:59 UTC
[Xen-devel] [2/5] [NET]: Give make_tx_response the req pointer instead of id
Hi: [NET]: Give make_tx_response the req pointer instead of id This patch changes the make_tx_response id argument to a request pointer instead. This allows us to test the request flag in future for TSO. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 6913e0756b81 -r 504315a3ec5e linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:44:18 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:51:08 2006 +1000 @@ -43,7 +43,7 @@ static void netif_idx_release(u16 pendin static void netif_idx_release(u16 pending_idx); static void netif_page_release(struct page *page); static void make_tx_response(netif_t *netif, - u16 id, + netif_tx_request_t *txp, s8 st); static int make_rx_response(netif_t *netif, u16 id, @@ -481,7 +481,7 @@ inline static void net_tx_action_dealloc netif = pending_tx_info[pending_idx].netif; - make_tx_response(netif, pending_tx_info[pending_idx].req.id, + make_tx_response(netif, &pending_tx_info[pending_idx].req, NETIF_RSP_OKAY); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; @@ -496,7 +496,7 @@ static void netbk_tx_err(netif_t *netif, do { netif_tx_request_t *txp = RING_GET_REQUEST(&netif->tx, cons); - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); + make_tx_response(netif, txp, NETIF_RSP_ERROR); } while (++cons < end); netif->tx.req_cons = cons; netif_schedule_work(netif); @@ -581,7 +581,7 @@ static int netbk_tx_check_mop(struct sk_ err = mop->status; if (unlikely(err)) { txp = &pending_tx_info[pending_idx].req; - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); + make_tx_response(netif, txp, NETIF_RSP_ERROR); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; netif_put(netif); } else { @@ -614,7 +614,7 @@ static int netbk_tx_check_mop(struct sk_ /* Error on this fragment: respond to client with an error. */ txp = &pending_tx_info[pending_idx].req; - make_tx_response(netif, txp->id, NETIF_RSP_ERROR); + make_tx_response(netif, txp, NETIF_RSP_ERROR); pending_ring[MASK_PEND_IDX(pending_prod++)] = pending_idx; netif_put(netif); @@ -898,7 +898,7 @@ irqreturn_t netif_be_int(int irq, void * } static void make_tx_response(netif_t *netif, - u16 id, + netif_tx_request_t *txp, s8 st) { RING_IDX i = netif->tx.rsp_prod_pvt; @@ -906,7 +906,7 @@ static void make_tx_response(netif_t *ne int notify; resp = RING_GET_RESPONSE(&netif->tx, i); - resp->id = id; + resp->id = txp->id; resp->status = st; netif->tx.rsp_prod_pvt = ++i; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: [NET] loopback: Added support for TSO Just like SG, TSO support here is innate. So all we need to do is mark it as such. This patch also adds the ethtool control functions for SG. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 504315a3ec5e -r 7ec216a8bc14 linux-2.6-xen-sparse/drivers/xen/netback/loopback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/loopback.c Wed Jun 28 13:51:08 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/loopback.c Wed Jun 28 13:51:21 2006 +1000 @@ -125,6 +125,10 @@ static struct ethtool_ops network_ethtoo { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, + .get_sg = ethtool_op_get_sg, + .set_sg = ethtool_op_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = ethtool_op_set_tso, }; /* @@ -152,6 +156,7 @@ static void loopback_construct(struct ne dev->features = (NETIF_F_HIGHDMA | NETIF_F_LLTX | + NETIF_F_TSO | NETIF_F_SG | NETIF_F_IP_CSUM); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: [NET] back: Add TSO support This patch adds TCP Segmentation Offload (TSO) support to the backend. It also advertises this fact through xenbus so that the frontend can detect this and send through TSO requests only if it is supported. This is done using an extra request slot which is indicated by a flag in the first slot. In future checksum offload can be done in the same way. The extra request slot must not be generated if the backend does not support the appropriate feature bits. For now this is simply feature-tso. If the frontend detects the presence of the appropriate feature bits, it may generate TX requests which have the appropriate request flags set that indicates the presence of an extra request slot with the extra information. On the backend the extra request slot is read if and only if the request flags are set in the TX request. This protocol allows more feature bits to be added in future without breaking compatibility. At least the hardware checksum bit is planned. Even though only TSO is supported for now the code actually supports GSO so it can be applied to any other protocol. The only missing bit is the detection of host support for a specific GSO protocol. Once that is added we can advertise all supported protocols to the guest. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 7ec216a8bc14 -r 9853b45712e8 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:51:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:55:13 2006 +1000 @@ -490,14 +490,16 @@ inline static void net_tx_action_dealloc } } -static void netbk_tx_err(netif_t *netif, RING_IDX end) +static void netbk_tx_err(netif_t *netif, netif_tx_request_t *txp, RING_IDX end) { RING_IDX cons = netif->tx.req_cons; do { - netif_tx_request_t *txp = RING_GET_REQUEST(&netif->tx, cons); make_tx_response(netif, txp, NETIF_RSP_ERROR); - } while (++cons < end); + if (++cons >= end) + break; + txp = RING_GET_REQUEST(&netif->tx, cons); + } while (1); netif->tx.req_cons = cons; netif_schedule_work(netif); netif_put(netif); @@ -508,7 +510,7 @@ static int netbk_count_requests(netif_t { netif_tx_request_t *first = txp; RING_IDX cons = netif->tx.req_cons; - int frags = 1; + int frags = 0; while (txp->flags & NETTXF_more_data) { if (frags >= work_to_do) { @@ -543,7 +545,7 @@ static gnttab_map_grant_ref_t *netbk_get skb_frag_t *frags = shinfo->frags; netif_tx_request_t *txp; unsigned long pending_idx = *((u16 *)skb->data); - RING_IDX cons = netif->tx.req_cons + 1; + RING_IDX cons = netif->tx.req_cons; int i, start; /* Skip first skb fragment if it is on same page as header fragment. 
*/ @@ -668,6 +670,7 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; + struct netif_tx_extra txtra; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -726,22 +729,37 @@ static void net_tx_action(unsigned long } netif->remaining_credit -= txreq.size; + work_to_do--; + netif->tx.req_cons = ++i; + + if (txreq.flags & NETTXF_extra_info) { + if (work_to_do-- <= 0) { + DPRINTK("Missing extra info\n"); + netbk_tx_err(netif, &txreq, i); + continue; + } + + memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), + sizeof(txtra)); + netif->tx.req_cons = ++i; + } + ret = netbk_count_requests(netif, &txreq, work_to_do); if (unlikely(ret < 0)) { - netbk_tx_err(netif, i - ret); + netbk_tx_err(netif, &txreq, i - ret); continue; } i += ret; if (unlikely(ret > MAX_SKB_FRAGS + 1)) { DPRINTK("Too many frags\n"); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } if (unlikely(txreq.size < ETH_HLEN)) { DPRINTK("Bad packet size: %d\n", txreq.size); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } @@ -750,25 +768,31 @@ static void net_tx_action(unsigned long DPRINTK("txreq.offset: %x, size: %u, end: %lu\n", txreq.offset, txreq.size, (txreq.offset &~PAGE_MASK) + txreq.size); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); continue; } pending_idx = pending_ring[MASK_PEND_IDX(pending_cons)]; data_len = (txreq.size > PKT_PROT_LEN && - ret < MAX_SKB_FRAGS + 1) ? + ret < MAX_SKB_FRAGS) ? PKT_PROT_LEN : txreq.size; skb = alloc_skb(data_len+16, GFP_ATOMIC); if (unlikely(skb == NULL)) { DPRINTK("Can''t allocate a skb in start_xmit.\n"); - netbk_tx_err(netif, i); + netbk_tx_err(netif, &txreq, i); break; } /* Packets passed to netif_rx() must have some headroom. */ skb_reserve(skb, 16); + + if (txreq.flags & NETTXF_gso) { + skb_shinfo(skb)->gso_size = txtra.gso_size; + skb_shinfo(skb)->gso_segs = txtra.gso_segs; + skb_shinfo(skb)->gso_type = txtra.gso_type; + } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), GNTMAP_host_map | GNTMAP_readonly, @@ -782,7 +806,7 @@ static void net_tx_action(unsigned long __skb_put(skb, data_len); - skb_shinfo(skb)->nr_frags = ret - 1; + skb_shinfo(skb)->nr_frags = ret; if (data_len < txreq.size) { skb_shinfo(skb)->nr_frags++; skb_shinfo(skb)->frags[0].page @@ -908,6 +932,9 @@ static void make_tx_response(netif_t *ne resp = RING_GET_RESPONSE(&netif->tx, i); resp->id = txp->id; resp->status = st; + + if (txp->flags & NETTXF_extra_info) + RING_GET_RESPONSE(&netif->tx, ++i)->status = NETIF_RSP_NULL; netif->tx.rsp_prod_pvt = ++i; RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->tx, notify); diff -r 7ec216a8bc14 -r 9853b45712e8 linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Wed Jun 28 13:51:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Wed Jun 28 13:55:13 2006 +1000 @@ -101,6 +101,12 @@ static int netback_probe(struct xenbus_d goto abort_transaction; } + err = xenbus_printf(xbt, dev->nodename, "feature-tso", "%d", 1); + if (err) { + message = "writing feature-tso"; + goto abort_transaction; + } + err = xenbus_transaction_end(xbt, 0); } while (err == -EAGAIN); diff -r 7ec216a8bc14 -r 9853b45712e8 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Wed Jun 28 13:51:21 2006 +1000 +++ b/xen/include/public/io/netif.h Wed Jun 28 13:55:13 2006 +1000 @@ -31,6 +31,13 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) +/* Packet has GSO fields. 
*/ +#define _NETTXF_gso (3) +#define NETTXF_gso (1U<<_NETTXF_gso) + +/* Packet to be folloed by extra descritptor. */ +#define NETTXF_extra_info (NETTXF_gso) + struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ uint16_t offset; /* Offset within buffer page */ @@ -39,6 +46,13 @@ struct netif_tx_request { uint16_t size; /* Packet size in bytes. */ }; typedef struct netif_tx_request netif_tx_request_t; + +/* This structure needs to fit within netif_tx_request for compatibility. */ +struct netif_tx_extra { + uint16_t gso_size; /* GSO MSS. */ + uint16_t gso_segs; /* GSO segment count. */ + uint16_t gso_type; /* GSO type. */ +}; struct netif_tx_response { uint16_t id; @@ -78,6 +92,7 @@ DEFINE_RING_TYPES(netif_rx, struct netif #define NETIF_RSP_DROPPED -2 #define NETIF_RSP_ERROR -1 #define NETIF_RSP_OKAY 0 +#define NETIF_RSP_NULL 1 #endif _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
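
To restate the wire protocol added by the patch above: a TX request whose flags include NETTXF_extra_info is followed by one additional ring slot that holds a struct netif_tx_extra rather than a normal request, and the backend must consume that slot before the fragment requests. The fragment below is only a sketch of the consumer side under the same headers as netback.c; handle_gso() is an invented helper, and the real logic lives in net_tx_action() above.

    /* Context: drivers/xen/netback, same includes as netback.c. */

    /* Illustrative only: consume one TX request that may be followed by
     * a netif_tx_extra slot, per the protocol described above. */
    static void consume_tx_request_sketch(netif_t *netif, RING_IDX *idx)
    {
        netif_tx_request_t txreq;
        struct netif_tx_extra txtra;

        memcpy(&txreq, RING_GET_REQUEST(&netif->tx, *idx), sizeof(txreq));
        (*idx)++;

        if (txreq.flags & NETTXF_extra_info) {
            /* The next slot is not a normal request: it carries the
             * extra metadata and must be read before the fragments. */
            memcpy(&txtra, RING_GET_REQUEST(&netif->tx, *idx),
                   sizeof(txtra));
            (*idx)++;

            if (txreq.flags & NETTXF_gso)
                handle_gso(&txtra);     /* invented helper */
        }

        /* ... map txreq.gref, build the skb, etc., as in netback.c ... */
    }
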
Herbert Xu
2006-Jun-28 04:00 UTC
[Xen-devel] [5/5] [NET] front: Transmit TSO packets if supported
Hi: [NET] front: Transmit TSO packets if supported This patch adds TSO transmission support to the frontend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 9853b45712e8 -r 9f4e79081e4a linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Wed Jun 28 13:55:13 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Wed Jun 28 13:55:32 2006 +1000 @@ -463,7 +463,7 @@ static int network_open(struct net_devic static inline int netfront_tx_slot_available(struct netfront_info *np) { - return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 1; + return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 2; } static inline void network_maybe_wake_tx(struct net_device *dev) @@ -491,7 +491,13 @@ static void network_tx_buf_gc(struct net rmb(); /* Ensure we see responses up to ''rp''. */ for (cons = np->tx.rsp_cons; cons != prod; cons++) { - id = RING_GET_RESPONSE(&np->tx, cons)->id; + struct netif_tx_response *txrsp; + + txrsp = RING_GET_RESPONSE(&np->tx, cons); + if (txrsp->status == NETIF_RSP_NULL) + continue; + + id = txrsp->id; skb = np->tx_skbs[id]; if (unlikely(gnttab_query_foreign_access( np->grant_tx_ref[id]) != 0)) { @@ -739,7 +745,8 @@ static int network_start_xmit(struct sk_ spin_lock_irq(&np->tx_lock); if (unlikely(!netif_carrier_ok(dev) || - (frags > 1 && !xennet_can_sg(dev)))) { + (frags > 1 && !xennet_can_sg(dev)) || + netif_needs_gso(dev, skb))) { spin_unlock_irq(&np->tx_lock); goto drop; } @@ -762,9 +769,21 @@ static int network_start_xmit(struct sk_ tx->size = len; tx->flags = 0; - if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ + if (skb->ip_summed == CHECKSUM_HW) { + /* local packet? */ tx->flags |= NETTXF_csum_blank | NETTXF_data_validated; - if (skb->proto_data_valid) /* remote but checksummed? */ + + if (skb_shinfo(skb)->gso_size) { + struct netif_tx_extra *txtra + (struct netif_tx_extra *) + RING_GET_REQUEST(&np->tx, ++i); + + tx->flags |= NETTXF_gso; + txtra->gso_size = skb_shinfo(skb)->gso_size; + txtra->gso_segs = skb_shinfo(skb)->gso_segs; + txtra->gso_type = skb_shinfo(skb)->gso_type; + } + } else if (skb->proto_data_valid) /* remote but checksummed? */ tx->flags |= NETTXF_data_validated; np->tx.req_prod_pvt = i + 1; @@ -1065,9 +1084,26 @@ static int xennet_set_sg(struct net_devi return ethtool_op_set_sg(dev, data); } +static int xennet_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + struct netfront_info *np = netdev_priv(dev); + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-tso", + "%d", &val) < 0) + val = 0; + if (!val) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static void xennet_set_features(struct net_device *dev) { - xennet_set_sg(dev, 1); + if (!xennet_set_sg(dev, 1)) + xennet_set_tso(dev, 1); } static void network_connect(struct net_device *dev) @@ -1148,6 +1184,8 @@ static struct ethtool_ops network_ethtoo .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = xennet_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = xennet_set_tso, }; #ifdef CONFIG_SYSFS _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 27 Jun 2006, at 13:02, Herbert Xu wrote:

> The following patches add TCP Segmentation Offload (TSO) support in the
> domU => dom0 direction. If everyone's happy with this approach then it's
> trivial to do the same thing for the opposite direction.

I've checked in all but the netfront patch. The glitches so far are:

1. The GSO patch broke netback, because netback/interface.c accesses dev->xmit_lock. None of your patches fixed this, and you can't build the driver unless it's fixed -- did you somehow miss that file from one of the patches you sent?

2. The new 'wire format' with netif_tx_extra: I placed the GSO fields inside a struct inside a union, so we can extend the union with other extra-info types in future. I hope that's okay and in line with what you intended.

3. Wire format again: we need some extra documentation and info in netif.h for the new GSO fields. Currently they conveniently correspond directly to fields in a Linux skbuff: you read them out in netfront and write them straight back in netback. That's fine for Linux for now, but not so good for other OSes, nor potentially if the Linux GSO internals change later. In particular the gso_type field is concerning. We should provide defines for the legitimate values of that field in netif.h, with a comment explaining what each one means. Extra comments are also required for the other two fields, to convince us that they aren't Linux-specific in some way. Some brief info about how GSO works in general, including usage of those fields, would help.

This is why I haven't added the netfront patch yet -- I don't want domUs using the new interface until we're satisfied we're not going to have to change the interface and break compatibility.

Thanks,
Keir
On Wed, Jun 28, 2006 at 01:30:10PM +0100, Keir Fraser wrote:
>
> 1. The GSO patch broke netback, because netback/interface.c accesses
> dev->xmit_lock. None of your patches fixed this, and you can't build the
> driver unless it's fixed -- did you somehow miss that file from one of
> the patches you sent?

Sorry, I missed that hunk while generating the patches.

> 2. The new 'wire format' with netif_tx_extra: I placed the GSO fields
> inside a struct inside a union, so we can extend the union with other
> extra-info types in future. I hope that's okay and in line with what
> you intended.

Actually, the idea is to add new fields after the existing fields, so I don't think a union is a good fit here. The reason is that the new fields will likely be used in conjunction with GSO. In particular, I'm thinking of the checksum offset/header offset.

> 3. Wire format again: we need some extra documentation and info in
> netif.h for the new GSO fields. Currently they conveniently correspond
> directly to fields in a Linux skbuff: you read them out in netfront
> and write them straight back in netback. That's fine for Linux for now,
> but not so good for other OSes, nor potentially if the Linux GSO
> internals change later.

Well, I think things like TSO (and even more so GSO) are highly OS-specific, so porting them to other paravirt OSes is always going to be hard.

The way I see it, these are simply add-on features that you can enable to get that extra bit out of your Xen performance. So they are not required for your system to function, and other OSes that do not have TSO/GSO can simply elect not to use them. (I'm curious: which other paravirt OSes do you have in mind that would use something like TSO/GSO? Do they currently support TSO or something similar?)

That's how it was designed in general: if the frontend doesn't know about GSO, it simply never sends any GSO packets. If the backend doesn't know about GSO, it never advertises it to the frontend, so again no GSO packets are sent.

As to how likely the implementation details are to change in Linux, I'd say it probably won't happen anytime soon, but I can't offer any guarantees because it is an internal interface. However, the Xen wire format is designed so that it should be easy to adapt to any incompatible change by either:

* providing translation layers in netfront/netback if the interface change does not require new information to be exchanged; or

* adding new feature bits/request flags to indicate that new information is required. Older guests/hosts would simply not do TSO with incompatible new hosts/guests.

> In particular the gso_type field is concerning. We should provide
> defines for the legitimate values of that field in netif.h, with a
> comment explaining what each one means. Extra comments are also
> required for the other two fields, to convince us that they aren't
> Linux-specific in some way. Some brief info about how GSO works in
> general, including usage of those fields, would help.

No problem. I've attached a patch that adds more comments for these fields. However, I'm really hesitant to add the actual gso_type values here, because I don't think it helps compatibility in any way. If Linux ever does make an incompatible change (which, believe me, I will do my best to prevent), having those values defined here is not really going to help people notice the change or provide compatibility.

But if you really want these values, I can add them.
Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 9f4e79081e4a xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Wed Jun 28 13:55:32 2006 +1000 +++ b/xen/include/public/io/netif.h Wed Jun 28 23:54:47 2006 +1000 @@ -49,9 +49,23 @@ typedef struct netif_tx_request netif_tx /* This structure needs to fit within netif_tx_request for compatibility. */ struct netif_tx_extra { - uint16_t gso_size; /* GSO MSS. */ - uint16_t gso_segs; /* GSO segment count. */ - uint16_t gso_type; /* GSO type. */ + /* + * Maximum payload size of each segment. For example, for TCP this is + * just the path MSS. + */ + uint16_t gso_size; + + /* + * Number of GSO segments. This is the number of segments that have to + * be generated for this packet given the MSS. + */ + uint16_t gso_segs; + + /* + * GSO type. This determines the protocol of the packet and any extra + * features required to segment the packet properly. + */ + uint16_t gso_type; }; struct netif_tx_response { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 28 Jun 2006, at 15:02, Herbert Xu wrote:

>> 2. The new 'wire format' with netif_tx_extra: I placed the GSO fields
>> inside a struct inside a union, so we can extend the union with other
>> extra-info types in future. I hope that's okay and in line with what
>> you intended.
>
> Actually, the idea is to add new fields after the existing fields, so I
> don't think a union is a good fit here. The reason is that the new
> fields will likely be used in conjunction with GSO. In particular, I'm
> thinking of the checksum offset/header offset.

Adding new fields on the end of the existing structure is limiting. Maybe we could chain extra info structures by declaring NETTXF_extra_info in the leading request, then have a sequence of netif_tx_extras, each of which is a discriminated union (so a NETEXTRA_* type field, a flag indicating if this is the last extra-info for this packet, plus a union)?

>> 3. Wire format again: we need some extra documentation and info in
>> netif.h for the new GSO fields. Currently they conveniently correspond
>> directly to fields in a Linux skbuff: you read them out in netfront
>> and write them straight back in netback. That's fine for Linux for now,
>> but not so good for other OSes, nor potentially if the Linux GSO
>> internals change later.
>
> Well, I think things like TSO (and even more so GSO) are highly
> OS-specific, so porting them to other paravirt OSes is always going to
> be hard.

Many NICs support TSO so there should be support in network stacks other than Linux. What about *BSD, Solaris, and Windows?

> The way I see it, these are simply add-on features that you can enable
> to get that extra bit out of your Xen performance. So they are not
> required for your system to function, and other OSes that do not have
> TSO/GSO can simply elect not to use them.

They may not have GSO but they might well have TSO and we should make it possible to interface to netback using it.

> * providing translation layers in netfront/netback if the interface
>   change does not require new information to be exchanged; or
>
> * adding new feature bits/request flags to indicate that new information
>   is required. Older guests/hosts would simply not do TSO with
>   incompatible new hosts/guests.

TSO isn't that complicated. I think we should be able to come up with an inter-domain format that we don't have to break for older guests down the line.

> If Linux ever does make an incompatible change (which, believe me, I
> will do my best to prevent), having those values defined here is not
> really going to help people notice the change or provide compatibility.
>
> But if you really want these values, I can add them.

Yes, we want them, and explicit code in netfront/netback to convert between Linux gso types and 'wire' gso types. Even if they are initially the same!

Another question: why are gso_size and gso_segs both required? Surely those, plus the overall request size, are redundant. e.g., shouldn't gso_segs = tot_size / gso_size (rounded up)?

-- Keir
On Wed, Jun 28, 2006 at 03:24:37PM +0100, Keir Fraser wrote:
>
> Adding new fields on the end of the existing structure is limiting.
> Maybe we could chain extra info structures by declaring
> NETTXF_extra_info in the leading request, then have a sequence of
> netif_tx_extras, each of which is a discriminated union (so a
> NETEXTRA_* type field, a flag indicating if this is the last extra-info
> for this packet, plus a union)?

Good idea. I've changed the interface to do exactly that.

> Many NICs support TSO so there should be support in network stacks
> other than Linux. What about *BSD, Solaris, and Windows?

They should be able to use the GSO interface and simply always set gso_type to XEN_GSO_TCPV4.

> Yes, we want them, and explicit code in netfront/netback to convert
> between Linux gso types and 'wire' gso types. Even if they are
> initially the same!

Done.

> Another question: why are gso_size and gso_segs both required? Surely
> those, plus the overall request size, are redundant. e.g., shouldn't
> gso_segs = tot_size / gso_size (rounded up)?

For TSO, gso_segs can be easily determined from the packet and gso_size. However, for GSO, we don't know the packet header length so the same is not true.

Cheers,
Herbert
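To make the redundancy question concrete, here is a small worked example (the numbers are purely illustrative and not taken from the patches):

	/*
	 * A TSO packet with 40 bytes of IP+TCP headers, 4344 bytes of TCP
	 * payload and gso_size (MSS) = 1448.  Netback only sees the total
	 * request size of 40 + 4344 = 4384 bytes.
	 */
	unsigned int hdr_len  = 40;
	unsigned int payload  = 4344;
	unsigned int mss      = 1448;                     /* gso_size          */
	unsigned int tot_size = hdr_len + payload;        /* 4384              */

	unsigned int segs  = (payload  + mss - 1) / mss;  /* 3: correct count  */
	unsigned int guess = (tot_size + mss - 1) / mss;  /* 4: off by one     */

Rounding up the raw request size over-counts by a segment, which is exactly why gso_segs cannot be recomputed in netback without also knowing the header length.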
Herbert Xu
2006-Jun-29 00:21 UTC
[Xen-devel] [1/2] [NET] back: Make extra info slot more generic
Hi: [NET] back: Make extra info slot more generic Based on suggestions by Keir Fraser, this patch makes the extra slot more generic by allowing it to be chained and giving each slot a type field. This makes it easier to add new extra slots such as checksum offset. I''ve also added GSO type constants specific for Xen. For now the conversion function between them and Linux is a noop. When and if they do diverge we can modify them accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 9853b45712e8 -r b5ca6be8ad55 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:55:13 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Thu Jun 29 10:16:22 2006 +1000 @@ -663,6 +663,35 @@ static void netbk_fill_frags(struct sk_b } } +int netbk_get_txtras(netif_t *netif, struct netif_tx_extra *txtras, + int work_to_do) +{ + struct netif_tx_extra *txtra; + int i = netif->tx.req_cons; + + do { + + if (unlikely(work_to_do-- <= 0)) { + DPRINTK("Missing extra info\n"); + return -EBADR; + } + + txtra = (struct netif_tx_extra *)RING_GET_REQUEST(&netif->tx, + i); + if (unlikely(!txtra->type || + txtra->type >= XEN_NETIF_TXTRA_TYPE_MAX)) { + netif->tx.req_cons = ++i; + DPRINTK("Invalid extra type: %d\n", txtra->type); + return -EINVAL; + } + + memcpy(txtras + txtra->type - 1, txtra, sizeof(*txtra)); + netif->tx.req_cons = ++i; + } while (txtra->flags & XEN_NETIF_TXTRA_MORE); + + return work_to_do; +} + /* Called after netfront has transmitted */ static void net_tx_action(unsigned long unused) { @@ -670,7 +699,7 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; - struct netif_tx_extra txtra; + struct netif_tx_extra txtras[XEN_NETIF_TXTRA_TYPE_MAX - 1]; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -732,16 +761,15 @@ static void net_tx_action(unsigned long work_to_do--; netif->tx.req_cons = ++i; + memset(txtras, 0, sizeof(txtras)); if (txreq.flags & NETTXF_extra_info) { - if (work_to_do-- <= 0) { - DPRINTK("Missing extra info\n"); - netbk_tx_err(netif, &txreq, i); + work_to_do = netbk_get_txtras(netif, txtras, + work_to_do); + if (unlikely(work_to_do < 0)) { + netbk_tx_err(netif, &txreq, 0); continue; } - - memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), - sizeof(txtra)); - netif->tx.req_cons = ++i; + i = netif->tx.req_cons; } ret = netbk_count_requests(netif, &txreq, work_to_do); @@ -788,10 +816,15 @@ static void net_tx_action(unsigned long /* Packets passed to netif_rx() must have some headroom. 
*/ skb_reserve(skb, 16); - if (txreq.flags & NETTXF_gso) { - skb_shinfo(skb)->gso_size = txtra.gso_size; - skb_shinfo(skb)->gso_segs = txtra.gso_segs; - skb_shinfo(skb)->gso_type = txtra.gso_type; + if (txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1].type) { + struct netif_tx_extra *gso = txtras - 1 + + XEN_NETIF_TXTRA_TYPE_GSO; + + skb_shinfo(skb)->gso_size = gso->gso.size; + skb_shinfo(skb)->gso_segs = gso->gso.segs; + skb_shinfo(skb)->gso_type + xen_gso_type_xen2linux(gso->gso.type); + } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), diff -r 9853b45712e8 -r b5ca6be8ad55 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Wed Jun 28 13:55:13 2006 +1000 +++ b/xen/include/public/io/netif.h Thu Jun 29 10:16:22 2006 +1000 @@ -31,12 +31,28 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) -/* Packet has GSO fields. */ -#define _NETTXF_gso (3) -#define NETTXF_gso (1U<<_NETTXF_gso) +/* Packet to be folloed by extra descritptor. */ +#define _NETTXF_extra_info (3) +#define NETTXF_extra_info (1U<<_NETTXF_extra_info) -/* Packet to be folloed by extra descritptor. */ -#define NETTXF_extra_info (NETTXF_gso) +enum { + XEN_NETIF_TXTRA_TYPE_GSO = 1, + XEN_NETIF_TXTRA_TYPE_MAX, +}; + +enum { + XEN_NETIF_TXTRA_MORE = 1 << 0, +}; + +enum { + /* TCP over IPv4. */ + XEN_GSO_TCPV4 = 1 << 0, + /* UDP over IPv4. */ + XEN_GSO_UDPV4 = 1 << 1, + + /* Packet header must be verified. */ + XEN_GSO_DODGY = 1 << 2, +}; struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ @@ -47,11 +63,35 @@ struct netif_tx_request { }; typedef struct netif_tx_request netif_tx_request_t; -/* This structure needs to fit within netif_tx_request for compatibility. */ +/* + * This structure needs to fit within both netif_tx_request and + * netif_rx_response for compatibility. + */ struct netif_tx_extra { - uint16_t gso_size; /* GSO MSS. */ - uint16_t gso_segs; /* GSO segment count. */ - uint16_t gso_type; /* GSO type. */ + /* Type of extra info. */ + uint8_t type; + /* Flags for this info. */ + uint8_t flags; + + union { + /* + * Maximum payload size of each segment. For example, for TCP this is + * just the path MSS. + */ + uint16_t size; + + /* + * Number of GSO segments. This is the number of segments that have to + * be generated for this packet given the MSS. + */ + uint16_t segs; + + /* + * GSO type. This determines the protocol of the packet and any extra + * features required to segment the packet properly. + */ + uint16_t type; + } gso; }; struct netif_tx_response { @@ -94,6 +134,17 @@ DEFINE_RING_TYPES(netif_rx, struct netif #define NETIF_RSP_OKAY 0 #define NETIF_RSP_NULL 1 +/* For now the GSO types are identical. */ +static inline int xen_gso_type_linux2xen(int type) +{ + return type; +} + +static inline int xen_gso_type_xen2linux(int type) +{ + return type; +} + #endif /* _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
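For readers tracking the ring manipulation in net_tx_action() above, the slot layout this patch produces can be sketched as follows (a descriptive sketch only, not code from the patch):

	/*
	 * Ring layout for a GSO packet with two extra data fragments under
	 * the chained extra-info scheme:
	 *
	 *   slot i   : netif_tx_request  flags = NETTXF_extra_info |
	 *                                        NETTXF_more_data | ...
	 *   slot i+1 : netif_tx_extra    type  = XEN_NETIF_TXTRA_TYPE_GSO,
	 *                                flags = 0 (no XEN_NETIF_TXTRA_MORE,
	 *                                        so this is the last extra)
	 *   slot i+2 : netif_tx_request  second fragment (NETTXF_more_data)
	 *   slot i+3 : netif_tx_request  last fragment
	 *
	 * netbk_get_txtras() consumes the extra slots before the fragment
	 * requests are counted, and each extra slot is answered with a
	 * NETIF_RSP_NULL response, which is why netfront's
	 * network_tx_buf_gc() skips that status when reclaiming slots.
	 */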
Herbert Xu
2006-Jun-29 00:21 UTC
[Xen-devel] [2/2] [NET] front: Transmit TSO packets if supported
Hi: [NET] front: Transmit TSO packets if supported This patch adds TSO transmission support to the frontend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r b5ca6be8ad55 -r 247c57e5b85a linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Thu Jun 29 10:16:22 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Thu Jun 29 10:16:49 2006 +1000 @@ -463,7 +463,7 @@ static int network_open(struct net_devic static inline int netfront_tx_slot_available(struct netfront_info *np) { - return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 1; + return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 2; } static inline void network_maybe_wake_tx(struct net_device *dev) @@ -491,7 +491,13 @@ static void network_tx_buf_gc(struct net rmb(); /* Ensure we see responses up to ''rp''. */ for (cons = np->tx.rsp_cons; cons != prod; cons++) { - id = RING_GET_RESPONSE(&np->tx, cons)->id; + struct netif_tx_response *txrsp; + + txrsp = RING_GET_RESPONSE(&np->tx, cons); + if (txrsp->status == NETIF_RSP_NULL) + continue; + + id = txrsp->id; skb = np->tx_skbs[id]; if (unlikely(gnttab_query_foreign_access( np->grant_tx_ref[id]) != 0)) { @@ -719,6 +725,7 @@ static int network_start_xmit(struct sk_ unsigned short id; struct netfront_info *np = netdev_priv(dev); struct netif_tx_request *tx; + struct netif_tx_extra *txtra; char *data = skb->data; RING_IDX i; grant_ref_t ref; @@ -739,7 +746,8 @@ static int network_start_xmit(struct sk_ spin_lock_irq(&np->tx_lock); if (unlikely(!netif_carrier_ok(dev) || - (frags > 1 && !xennet_can_sg(dev)))) { + (frags > 1 && !xennet_can_sg(dev)) || + netif_needs_gso(dev, skb))) { spin_unlock_irq(&np->tx_lock); goto drop; } @@ -762,10 +770,31 @@ static int network_start_xmit(struct sk_ tx->size = len; tx->flags = 0; + txtra = NULL; + if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ tx->flags |= NETTXF_csum_blank | NETTXF_data_validated; if (skb->proto_data_valid) /* remote but checksummed? 
*/ tx->flags |= NETTXF_data_validated; + + if (skb_shinfo(skb)->gso_size) { + struct netif_tx_extra *gso + (struct netif_tx_extra *)RING_GET_REQUEST(&np->tx, ++i); + + if (txtra) + txtra->flags |= XEN_NETIF_TXTRA_MORE; + else + tx->flags |= NETTXF_extra_info; + + gso->gso.size = skb_shinfo(skb)->gso_size; + gso->gso.segs = skb_shinfo(skb)->gso_segs; + gso->gso.type + xen_gso_type_linux2xen(skb_shinfo(skb)->gso_type); + + gso->type = XEN_NETIF_TXTRA_TYPE_GSO; + gso->flags = 0; + txtra = gso; + } np->tx.req_prod_pvt = i + 1; @@ -1065,9 +1094,26 @@ static int xennet_set_sg(struct net_devi return ethtool_op_set_sg(dev, data); } +static int xennet_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + struct netfront_info *np = netdev_priv(dev); + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-tso", + "%d", &val) < 0) + val = 0; + if (!val) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static void xennet_set_features(struct net_device *dev) { - xennet_set_sg(dev, 1); + if (!xennet_set_sg(dev, 1)) + xennet_set_tso(dev, 1); } static void network_connect(struct net_device *dev) @@ -1148,6 +1194,8 @@ static struct ethtool_ops network_ethtoo .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = xennet_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = xennet_set_tso, }; #ifdef CONFIG_SYSFS _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
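The txtra bookkeeping in network_start_xmit() above encodes a simple producer-side chaining rule, restated here for clarity (description only, not additional code from the patch):

	/*
	 * Chaining rule used by the frontend when queueing extra descriptors:
	 *
	 *   - the first extra is announced by setting NETTXF_extra_info on
	 *     the primary netif_tx_request;
	 *   - each further extra is announced by setting XEN_NETIF_TXTRA_MORE
	 *     on the previous extra;
	 *   - the last extra leaves XEN_NETIF_TXTRA_MORE clear, which is what
	 *     terminates the do/while loop in netbk_get_txtras() on the
	 *     backend side.
	 *
	 * With only the GSO extra defined so far, txtra is always NULL when
	 * the gso_size block runs, so in practice only NETTXF_extra_info is
	 * set; the other branch is there for future extras such as a
	 * checksum offset descriptor.
	 */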
Herbert Xu
2006-Jun-29 04:44 UTC
[Xen-devel] Re: [1/2] [NET] back: Make extra info slot more generic
On Thu, Jun 29, 2006 at 10:21:23AM +1000, herbert wrote:> > [NET] back: Make extra info slot more genericHere it is again rebased on xen-unstable: [NET] back: Make extra info slot more generic Based on suggestions by Keir Fraser, this patch makes the extra slot more generic by allowing it to be chained and giving each slot a type field. This makes it easier to add new extra slots such as checksum offset. I''ve also added GSO type constants specific for Xen. For now the conversion function between them and Linux is a noop. When and if they do diverge we can modify them accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r ae245d35457b -r 41464d29a901 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:59:29 2006 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Thu Jun 29 14:43:14 2006 +1000 @@ -663,6 +663,35 @@ static void netbk_fill_frags(struct sk_b } } +int netbk_get_txtras(netif_t *netif, struct netif_tx_extra *txtras, + int work_to_do) +{ + struct netif_tx_extra *txtra; + int i = netif->tx.req_cons; + + do { + + if (unlikely(work_to_do-- <= 0)) { + DPRINTK("Missing extra info\n"); + return -EBADR; + } + + txtra = (struct netif_tx_extra *)RING_GET_REQUEST(&netif->tx, + i); + if (unlikely(!txtra->type || + txtra->type >= XEN_NETIF_TXTRA_TYPE_MAX)) { + netif->tx.req_cons = ++i; + DPRINTK("Invalid extra type: %d\n", txtra->type); + return -EINVAL; + } + + memcpy(txtras + txtra->type - 1, txtra, sizeof(*txtra)); + netif->tx.req_cons = ++i; + } while (txtra->flags & XEN_NETIF_TXTRA_MORE); + + return work_to_do; +} + /* Called after netfront has transmitted */ static void net_tx_action(unsigned long unused) { @@ -670,7 +699,7 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; - struct netif_tx_extra txtra; + struct netif_tx_extra txtras[XEN_NETIF_TXTRA_TYPE_MAX - 1]; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -732,16 +761,15 @@ static void net_tx_action(unsigned long work_to_do--; netif->tx.req_cons = ++i; + memset(txtras, 0, sizeof(txtras)); if (txreq.flags & NETTXF_extra_info) { - if (work_to_do-- <= 0) { - DPRINTK("Missing extra info\n"); - netbk_tx_err(netif, &txreq, i); + work_to_do = netbk_get_txtras(netif, txtras, + work_to_do); + if (unlikely(work_to_do < 0)) { + netbk_tx_err(netif, &txreq, 0); continue; } - - memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), - sizeof(txtra)); - netif->tx.req_cons = ++i; + i = netif->tx.req_cons; } ret = netbk_count_requests(netif, &txreq, work_to_do); @@ -788,10 +816,14 @@ static void net_tx_action(unsigned long /* Packets passed to netif_rx() must have some headroom. 
*/ skb_reserve(skb, 16); - if (txreq.flags & NETTXF_gso) { - skb_shinfo(skb)->gso_size = txtra.u.gso.size; - skb_shinfo(skb)->gso_segs = txtra.u.gso.segs; - skb_shinfo(skb)->gso_type = txtra.u.gso.type; + if (txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1].type) { + struct netif_tx_extra *gso = txtras - 1 + + XEN_NETIF_TXTRA_TYPE_GSO; + + skb_shinfo(skb)->gso_size = gso->gso.size; + skb_shinfo(skb)->gso_segs = gso->gso.segs; + skb_shinfo(skb)->gso_type + xen_gso_type_xen2linux(gso->gso.type); } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), diff -r ae245d35457b -r 41464d29a901 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Wed Jun 28 13:59:29 2006 +0100 +++ b/xen/include/public/io/netif.h Thu Jun 29 14:43:14 2006 +1000 @@ -41,12 +41,28 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) -/* Packet has GSO fields in the following descriptor (netif_tx_extra.u.gso). */ -#define _NETTXF_gso (3) -#define NETTXF_gso (1U<<_NETTXF_gso) +/* Packet to be folloed by extra descritptor. */ +#define _NETTXF_extra_info (3) +#define NETTXF_extra_info (1U<<_NETTXF_extra_info) -/* This descriptor is followed by an extra-info descriptor (netif_tx_extra). */ -#define NETTXF_extra_info (NETTXF_gso) +enum { + XEN_NETIF_TXTRA_TYPE_GSO = 1, + XEN_NETIF_TXTRA_TYPE_MAX, +}; + +enum { + XEN_NETIF_TXTRA_MORE = 1 << 0, +}; + +enum { + /* TCP over IPv4. */ + XEN_GSO_TCPV4 = 1 << 0, + /* UDP over IPv4. */ + XEN_GSO_UDPV4 = 1 << 1, + + /* Packet header must be verified. */ + XEN_GSO_DODGY = 1 << 2, +}; struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ @@ -57,16 +73,35 @@ struct netif_tx_request { }; typedef struct netif_tx_request netif_tx_request_t; -/* This structure needs to fit within netif_tx_request for compatibility. */ +/* + * This structure needs to fit within both netif_tx_request and + * netif_rx_response for compatibility. + */ struct netif_tx_extra { + /* Type of extra info. */ + uint8_t type; + /* Flags for this info. */ + uint8_t flags; + union { - /* NETTXF_gso: Generic Segmentation Offload. */ - struct netif_tx_gso { - uint16_t size; /* GSO MSS. */ - uint16_t segs; /* GSO segment count. */ - uint16_t type; /* GSO type. */ - } gso; - } u; + /* + * Maximum payload size of each segment. For example, for TCP this is + * just the path MSS. + */ + uint16_t size; + + /* + * Number of GSO segments. This is the number of segments that have to + * be generated for this packet given the MSS. + */ + uint16_t segs; + + /* + * GSO type. This determines the protocol of the packet and any extra + * features required to segment the packet properly. + */ + uint16_t type; + } gso; }; struct netif_tx_response { @@ -110,6 +145,17 @@ DEFINE_RING_TYPES(netif_rx, struct netif /* No response: used for auxiliary requests (e.g., netif_tx_extra). */ #define NETIF_RSP_NULL 1 +/* For now the GSO types are identical. */ +static inline int xen_gso_type_linux2xen(int type) +{ + return type; +} + +static inline int xen_gso_type_xen2linux(int type) +{ + return type; +} + #endif /* _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 29 Jun 2006, at 01:20, Herbert Xu wrote:

>> Another question: why are gso_size and gso_segs both required? Surely
>> those, plus the overall request size, are redundant. e.g., shouldn't
>> gso_segs = tot_size / gso_size (rounded up)?
>
> For TSO, gso_segs can be easily determined from the packet and gso_size.
> However, for GSO, we don't know the packet header length so the same is
> not true.

Each segment will need a header though, I'd imagine, so whoever does the segmentation needs to know the packet header length? Maybe I'm just confused. :-)

Could you briefly explain what the inter-domain data format would be (e.g., is there a header, etc.), and gso_{size,segs}, for some arbitrary IP-encapsulated protocol? And how that information would be used to perform segmentation in the backend domain? Is the segmentation algorithm any different at all when the protocol is specifically TCPv4? I'd like to add some documentation of all this to netif.h when I have it clear in my head.

The patches you sent look fine. I'd really like to understand this stuff before turning it on though.

-- Keir
On Thu, Jun 29, 2006 at 10:02:20AM +0100, Keir Fraser wrote:
>
>> For TSO, gso_segs can be easily determined from the packet and gso_size.
>> However, for GSO, we don't know the packet header length so the same is
>> not true.
>
> Each segment will need a header though, I'd imagine, so whoever does
> the segmentation needs to know the packet header length? Maybe I'm just
> confused. :-)

The code or chip that actually does the segmentation will obviously know the header length. However, netback or netfront does not.

Since Linux requires gso_segs to be set (it's used to figure out how many packets there are going to be before segmentation takes place), we really need to set it at the source where the packet is produced.

For another OS that only supports TSO, it would simply have to set gso_segs in the frontend driver.

> Could you briefly explain what the inter-domain data format would be
> (e.g., is there a header, etc.), and gso_{size,segs}, for some arbitrary
> IP-encapsulated protocol? And how that information would be used to
> perform segmentation in the backend domain?

GSO packets are simply given as one continuous chunk of data, with a gso_type that determines the protocol (e.g., TCPv4 or UDPv4) and a set of features that the packet requires (e.g., header verification or ECN for TCPv4). The parameter gso_size is required to perform the actual segmentation. The parameter gso_segs is used by the bits that sit in front of the actual segmentation to figure out how many segments will be produced.

> Is the segmentation algorithm any different at all when the protocol is
> specifically TCPv4? I'd like to add some documentation of all this to
> netif.h when I have it clear in my head.

How segmentation is performed is protocol-specific. In general, all headers are duplicated and then modified for each segment; the modification itself is protocol-specific. For TCP it involves clearing header flags depending on whether it's the first, a middle or the last packet, and changing the sequence number and checksum. You can have a look at tcp_tso_segment() in net/ipv4/tcp.c.

The only catch is that most of the hardware out there can't deal with ECN, so by default we turn that off (that'll actually change soon now that GSO is here).

Cheers,
Herbert
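As a rough outline of the per-segment work described above (a simplified sketch; the authoritative code is tcp_tso_segment() and skb_segment()):

	/*
	 * Simplified outline of TCP segmentation (sketch only):
	 *
	 *   for each mss-sized chunk of payload produced by skb_segment():
	 *     - copy the original IP and TCP headers in front of the chunk;
	 *     - set tcp->seq to the sequence number of that chunk
	 *       (first_seq + n * mss);
	 *     - on every segment except the last, clear FIN and PSH;
	 *     - fix up the IP total length (and IP ID) for the new size;
	 *     - recompute, or mark for recomputation, the TCP and IP
	 *       checksums.
	 *
	 * ECN adds a further twist (the CWR flag should not be replicated
	 * onto every segment), which is why hardware that cannot handle it
	 * has ECN turned off by default, as noted above.
	 */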
On 29 Jun 2006, at 10:40, Herbert Xu wrote:

> The code or chip that actually does the segmentation will obviously
> know the header length. However, netback or netfront does not.
>
> Since Linux requires gso_segs to be set (it's used to figure out how
> many packets there are going to be before segmentation takes place),
> we really need to set it at the source where the packet is produced.
>
> For another OS that only supports TSO, it would simply have to set
> gso_segs in the frontend driver.

What if gso_segs, or any other gso parameter (e.g., type), is set incorrectly? The backend cannot trust frontend clients, so I'm rather worried that we could make the backend network stack crash!

Also, we should probably only define a XEN_GSO_TCPV4 flag for now, since we support only plain TSO right now. No skbuffs should get passed to netfront that have other GSO flags set, right? Since we only advertise NETIF_F_TSO.

-- Keir
On Thu, Jun 29, 2006 at 11:32:43AM +0100, Keir Fraser wrote:
>
> What if gso_segs, or any other gso parameter (e.g., type), is set
> incorrectly? The backend cannot trust frontend clients, so I'm rather
> worried that we could make the backend network stack crash!

gso_type is a key so the OS (Linux) must deal with all possible values. Any packet with a protocol or feature bit that it does not recognise will be dropped.

However, you have a very good point regarding gso_segs. I'll change it so that it is always recalculated for SKB_GSO_DODGY packets.

> Also, we should probably only define a XEN_GSO_TCPV4 flag for now, since
> we support only plain TSO right now. No skbuffs should get passed to
> netfront that have other GSO flags set, right? Since we only advertise
> NETIF_F_TSO.

While it shouldn't have any adverse effects for the reason above, I will add a mask so that only bits in the mask are allowed through to gso_type.

Cheers,
Herbert
On 29 Jun 2006, at 13:46, Herbert Xu wrote:

>> What if gso_segs, or any other gso parameter (e.g., type), is set
>> incorrectly? The backend cannot trust frontend clients, so I'm rather
>> worried that we could make the backend network stack crash!
>
> gso_type is a key so the OS (Linux) must deal with all possible values.
> Any packet with a protocol or feature bit that it does not recognise
> will be dropped.
>
> However, you have a very good point regarding gso_segs. I'll change it
> so that it is always recalculated for SKB_GSO_DODGY packets.

If you can recalculate it, why is the field required at all (both in the skbuff and in the netif_tx_extra)? You previously said it was needed because it can't be recalculated (until you get to the segmentation code, which may be very late in packet processing).

We could calculate gso_segs for TCPv4 in netback, which is all we care about right now.

-- Keir
On Thu, Jun 29, 2006 at 02:20:21PM +0100, Keir Fraser wrote:
>
> If you can recalculate it, why is the field required at all (both in
> the skbuff and in the netif_tx_extra)? You previously said it was
> needed because it can't be recalculated (until you get to the
> segmentation code, which may be very late in packet processing).

It can't be recalculated in general (i.e., in netback). However, within each protocol module it can be recalculated.

> We could calculate gso_segs for TCPv4 in netback, which is all we care
> about right now.

I'd rather not add protocol-specific knowledge to netback.

Cheers,
Herbert
On 29 Jun 2006, at 14:26, Herbert Xu wrote:

>> If you can recalculate it, why is the field required at all (both in
>> the skbuff and in the netif_tx_extra)? You previously said it was
>> needed because it can't be recalculated (until you get to the
>> segmentation code, which may be very late in packet processing).
>
> It can't be recalculated in general (i.e., in netback). However, within
> each protocol module it can be recalculated.

In that case we don't need gso_segs in the netif_tx_extra? That, plus limiting gso_type, would make me much happier.

-- Keir
On Thu, Jun 29, 2006 at 02:41:50PM +0100, Keir Fraser wrote:
>
> In that case we don't need gso_segs in the netif_tx_extra? That, plus
> limiting gso_type, would make me much happier.

That's the plan!

Cheers,
Herbert
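To summarise where the thread has landed before the reworked patches below (this only restates what those patches implement):

	/*
	 * Wire-format outcome of the discussion above (restated; see the
	 * patches that follow for the authoritative definitions):
	 *
	 *   - every extra descriptor begins with a type byte
	 *     (XEN_NETIF_TXTRA_TYPE_GSO, ...) and a flags byte
	 *     (XEN_NETIF_TXTRA_MORE if another extra follows);
	 *   - the GSO extra carries gso.size (the MSS, which must be
	 *     non-zero) and gso.type (XEN_GSO_* bits, validated in netback
	 *     against the mask of advertised types);
	 *   - gso_segs is dropped from the wire format entirely: netback
	 *     tags the packet SKB_GSO_DODGY and the protocol's segmentation
	 *     code recomputes the segment count from gso.size.
	 */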
Hi: Here we go, these are all based on xen-unstable so please disregard all previous unmerged patches from me. [NET] back: Fix maximum fragment check The maximum fragment check from the frontend is off by one. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r ae245d35457b -r 89ffce2cc120 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:59:29 2006 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 12:13:20 2006 +1000 @@ -751,7 +751,7 @@ static void net_tx_action(unsigned long } i += ret; - if (unlikely(ret > MAX_SKB_FRAGS + 1)) { + if (unlikely(ret > MAX_SKB_FRAGS)) { DPRINTK("Too many frags\n"); netbk_tx_err(netif, &txreq, i); continue; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: [NET]: Add net-tso.patch This patch has been submitted upstream for review. It resets gso_segs for TSO. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 89ffce2cc120 -r 752477acd893 patches/linux-2.6.16.13/net-tso.patch --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/patches/linux-2.6.16.13/net-tso.patch Fri Jun 30 12:13:27 2006 +1000 @@ -0,0 +1,28 @@ +diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c +index 0336422..0bb0ac9 100644 +--- a/net/ipv4/tcp.c ++++ b/net/ipv4/tcp.c +@@ -2166,13 +2166,19 @@ struct sk_buff *tcp_tso_segment(struct s + if (!pskb_may_pull(skb, thlen)) + goto out; + +- segs = NULL; +- if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) +- goto out; +- + oldlen = (u16)~skb->len; + __skb_pull(skb, thlen); + ++ if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) { ++ /* Packet is from an untrusted source, reset gso_segs. */ ++ int mss = skb_shinfo(skb)->gso_size; ++ ++ skb_shinfo(skb)->gso_segs = (skb->len + mss - 1) / mss; ++ ++ segs = NULL; ++ goto out; ++ } ++ + segs = skb_segment(skb, features); + if (IS_ERR(segs)) + goto out; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
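The reset is a plain ceiling division over the payload that remains once __skb_pull() has removed the headers; with illustrative numbers:

	/*
	 * Illustrative numbers only: after __skb_pull(skb, thlen) the skb
	 * length is pure TCP payload.  With 4000 bytes of payload and an
	 * (untrusted) gso_size of 1448:
	 *
	 *     gso_segs = (4000 + 1448 - 1) / 1448 = 3
	 *
	 * so whatever segment count the untrusted source claimed is simply
	 * overwritten with the value implied by gso_size.
	 */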
Herbert Xu
2006-Jun-30 02:42 UTC
[Xen-devel] [3/4] [NET] back: Make extra info slot more generic
Hi: [NET] back: Make extra info slot more generic Based on suggestions by Keir Fraser, this patch makes the extra slot more generic by allowing it to be chained and giving each slot a type field. This makes it easier to add new extra slots such as checksum offset. I''ve also added GSO type constants specific for Xen. For now the conversion function between them and Linux is a noop. When and if they do diverge we can modify them accordingly. This patch also gets rid of gso_segs which is now always recomputed for SKB_GSO_DODGY packets. Last but not least netback now actually sets SKB_GSO_DODGY. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 752477acd893 -r bd0c0b635dc1 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 12:13:27 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 12:36:57 2006 +1000 @@ -663,6 +663,35 @@ static void netbk_fill_frags(struct sk_b } } +int netbk_get_txtras(netif_t *netif, struct netif_tx_extra *txtras, + int work_to_do) +{ + struct netif_tx_extra *txtra; + int i = netif->tx.req_cons; + + do { + + if (unlikely(work_to_do-- <= 0)) { + DPRINTK("Missing extra info\n"); + return -EBADR; + } + + txtra = (struct netif_tx_extra *)RING_GET_REQUEST(&netif->tx, + i); + if (unlikely(!txtra->type || + txtra->type >= XEN_NETIF_TXTRA_TYPE_MAX)) { + netif->tx.req_cons = ++i; + DPRINTK("Invalid extra type: %d\n", txtra->type); + return -EINVAL; + } + + memcpy(txtras + txtra->type - 1, txtra, sizeof(*txtra)); + netif->tx.req_cons = ++i; + } while (txtra->flags & XEN_NETIF_TXTRA_MORE); + + return work_to_do; +} + /* Called after netfront has transmitted */ static void net_tx_action(unsigned long unused) { @@ -670,7 +699,8 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; - struct netif_tx_extra txtra; + struct netif_tx_extra txtras[XEN_NETIF_TXTRA_TYPE_MAX - 1]; + struct netif_tx_extra *gso; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -732,16 +762,15 @@ static void net_tx_action(unsigned long work_to_do--; netif->tx.req_cons = ++i; + memset(txtras, 0, sizeof(txtras)); if (txreq.flags & NETTXF_extra_info) { - if (work_to_do-- <= 0) { - DPRINTK("Missing extra info\n"); - netbk_tx_err(netif, &txreq, i); + work_to_do = netbk_get_txtras(netif, txtras, + work_to_do); + if (unlikely(work_to_do < 0)) { + netbk_tx_err(netif, &txreq, 0); continue; } - - memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), - sizeof(txtra)); - netif->tx.req_cons = ++i; + i = netif->tx.req_cons; } ret = netbk_count_requests(netif, &txreq, work_to_do); @@ -772,6 +801,32 @@ static void net_tx_action(unsigned long continue; } + gso = NULL; + if (txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1].type) { + /* In future this will be retrieved from Linux. 
*/ + unsigned int allowed_types = SKB_GSO_TCPV4 | + SKB_GSO_DODGY; + unsigned type; + + gso = &txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1]; + if (!gso->gso.size) { + DPRINTK("GSO size must not be zero\n"); + netbk_tx_err(netif, &txreq, i); + continue; + } + + type = xen_gso_type_xen2linux(gso->gso.type); + if (!type || (type & allowed_types) != type) { + DPRINTK("Bogus GSO type: 0x%x\n", + gso->gso.type); + netbk_tx_err(netif, &txreq, i); + continue; + } + + /* Whoever gets this needs to verify the header. */ + gso->gso.type = type | SKB_GSO_DODGY; + } + pending_idx = pending_ring[MASK_PEND_IDX(pending_cons)]; data_len = (txreq.size > PKT_PROT_LEN && @@ -788,10 +843,9 @@ static void net_tx_action(unsigned long /* Packets passed to netif_rx() must have some headroom. */ skb_reserve(skb, 16); - if (txreq.flags & NETTXF_gso) { - skb_shinfo(skb)->gso_size = txtra.u.gso.size; - skb_shinfo(skb)->gso_segs = txtra.u.gso.segs; - skb_shinfo(skb)->gso_type = txtra.u.gso.type; + if (gso) { + skb_shinfo(skb)->gso_size = gso->gso.size; + skb_shinfo(skb)->gso_type = gso->gso.type; } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), diff -r 752477acd893 -r bd0c0b635dc1 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jun 30 12:13:27 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jun 30 12:36:57 2006 +1000 @@ -41,12 +41,28 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) -/* Packet has GSO fields in the following descriptor (netif_tx_extra.u.gso). */ -#define _NETTXF_gso (3) -#define NETTXF_gso (1U<<_NETTXF_gso) +/* Packet to be folloed by extra descritptor. */ +#define _NETTXF_extra_info (3) +#define NETTXF_extra_info (1U<<_NETTXF_extra_info) -/* This descriptor is followed by an extra-info descriptor (netif_tx_extra). */ -#define NETTXF_extra_info (NETTXF_gso) +enum { + XEN_NETIF_TXTRA_TYPE_GSO = 1, + XEN_NETIF_TXTRA_TYPE_MAX, +}; + +enum { + XEN_NETIF_TXTRA_MORE = 1 << 0, +}; + +enum { + /* TCP over IPv4. */ + XEN_GSO_TCPV4 = 1 << 0, + /* UDP over IPv4. */ + XEN_GSO_UDPV4 = 1 << 1, + + /* Packet header must be verified. */ + XEN_GSO_DODGY = 1 << 2, +}; struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ @@ -57,16 +73,29 @@ struct netif_tx_request { }; typedef struct netif_tx_request netif_tx_request_t; -/* This structure needs to fit within netif_tx_request for compatibility. */ +/* + * This structure needs to fit within both netif_tx_request and + * netif_rx_response for compatibility. + */ struct netif_tx_extra { + /* Type of extra info. */ + uint8_t type; + /* Flags for this info. */ + uint8_t flags; + union { - /* NETTXF_gso: Generic Segmentation Offload. */ - struct netif_tx_gso { - uint16_t size; /* GSO MSS. */ - uint16_t segs; /* GSO segment count. */ - uint16_t type; /* GSO type. */ - } gso; - } u; + /* + * Maximum payload size of each segment. For example, for TCP this is + * just the path MSS. + */ + uint16_t size; + + /* + * GSO type. This determines the protocol of the packet and any extra + * features required to segment the packet properly. + */ + uint16_t type; + } gso; }; struct netif_tx_response { @@ -110,6 +139,17 @@ DEFINE_RING_TYPES(netif_rx, struct netif /* No response: used for auxiliary requests (e.g., netif_tx_extra). */ #define NETIF_RSP_NULL 1 +/* For now the GSO types are identical. 
*/ +static inline int xen_gso_type_linux2xen(int type) +{ + return type; +} + +static inline int xen_gso_type_xen2linux(int type) +{ + return type; +} + #endif /* _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jun-30 02:43 UTC
[Xen-devel] [4/4] [NET] front: Transmit TSO packets if supported
Hi: [NET] front: Transmit TSO packets if supported This patch adds TSO transmission support to the frontend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r bd0c0b635dc1 -r bef11f078479 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jun 30 12:36:57 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jun 30 12:38:18 2006 +1000 @@ -463,7 +463,7 @@ static int network_open(struct net_devic static inline int netfront_tx_slot_available(struct netfront_info *np) { - return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 1; + return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 2; } static inline void network_maybe_wake_tx(struct net_device *dev) @@ -491,7 +491,13 @@ static void network_tx_buf_gc(struct net rmb(); /* Ensure we see responses up to ''rp''. */ for (cons = np->tx.rsp_cons; cons != prod; cons++) { - id = RING_GET_RESPONSE(&np->tx, cons)->id; + struct netif_tx_response *txrsp; + + txrsp = RING_GET_RESPONSE(&np->tx, cons); + if (txrsp->status == NETIF_RSP_NULL) + continue; + + id = txrsp->id; skb = np->tx_skbs[id]; if (unlikely(gnttab_query_foreign_access( np->grant_tx_ref[id]) != 0)) { @@ -719,6 +725,7 @@ static int network_start_xmit(struct sk_ unsigned short id; struct netfront_info *np = netdev_priv(dev); struct netif_tx_request *tx; + struct netif_tx_extra *txtra; char *data = skb->data; RING_IDX i; grant_ref_t ref; @@ -739,7 +746,8 @@ static int network_start_xmit(struct sk_ spin_lock_irq(&np->tx_lock); if (unlikely(!netif_carrier_ok(dev) || - (frags > 1 && !xennet_can_sg(dev)))) { + (frags > 1 && !xennet_can_sg(dev)) || + netif_needs_gso(dev, skb))) { spin_unlock_irq(&np->tx_lock); goto drop; } @@ -762,10 +770,30 @@ static int network_start_xmit(struct sk_ tx->size = len; tx->flags = 0; + txtra = NULL; + if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ tx->flags |= NETTXF_csum_blank | NETTXF_data_validated; if (skb->proto_data_valid) /* remote but checksummed? 
*/ tx->flags |= NETTXF_data_validated; + + if (skb_shinfo(skb)->gso_size) { + struct netif_tx_extra *gso + (struct netif_tx_extra *)RING_GET_REQUEST(&np->tx, ++i); + + if (txtra) + txtra->flags |= XEN_NETIF_TXTRA_MORE; + else + tx->flags |= NETTXF_extra_info; + + gso->gso.size = skb_shinfo(skb)->gso_size; + gso->gso.type + xen_gso_type_linux2xen(skb_shinfo(skb)->gso_type); + + gso->type = XEN_NETIF_TXTRA_TYPE_GSO; + gso->flags = 0; + txtra = gso; + } np->tx.req_prod_pvt = i + 1; @@ -1065,9 +1093,26 @@ static int xennet_set_sg(struct net_devi return ethtool_op_set_sg(dev, data); } +static int xennet_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + struct netfront_info *np = netdev_priv(dev); + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-tso", + "%d", &val) < 0) + val = 0; + if (!val) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static void xennet_set_features(struct net_device *dev) { - xennet_set_sg(dev, 1); + if (!xennet_set_sg(dev, 1)) + xennet_set_tso(dev, 1); } static void network_connect(struct net_device *dev) @@ -1148,6 +1193,8 @@ static struct ethtool_ops network_ethtoo .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = xennet_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = xennet_set_tso, }; #ifdef CONFIG_SYSFS _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: On Fri, Jun 30, 2006 at 12:40:47PM +1000, herbert wrote:> > [NET] back: Fix maximum fragment checkThe net-tso patch has been merged upstream. I''ve also changed the feature-tso interface to be a bit mask of the XEN gso_types bits. It''s now called feature-gso. This means we won''t have to add one feature for each protocol. So here is a repost of the entire series. [NET] back: Fix maximum fragment check The maximum fragment check from the frontend is off by one. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r ae245d35457b -r 617e4d3351f3 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Wed Jun 28 13:59:29 2006 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 22:12:59 2006 +1000 @@ -751,7 +751,7 @@ static void net_tx_action(unsigned long } i += ret; - if (unlikely(ret > MAX_SKB_FRAGS + 1)) { + if (unlikely(ret > MAX_SKB_FRAGS)) { DPRINTK("Too many frags\n"); netbk_tx_err(netif, &txreq, i); continue; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi: [NET]: Update net-gso.patch New changeset merged upstream: [TCP]: Reset gso_segs if packet is dodgy I wasn''t paranoid enough in verifying GSO information. A bogus gso_segs could upset drivers as much as a bogus header would. Let''s reset it in the per-protocol gso_segment functions. I didn''t verify gso_size because that can be verified by the source of the dodgy packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 617e4d3351f3 -r f6806ad757d5 patches/linux-2.6.16.13/net-gso.patch --- a/patches/linux-2.6.16.13/net-gso.patch Fri Jun 30 22:12:59 2006 +1000 +++ b/patches/linux-2.6.16.13/net-gso.patch Fri Jun 30 22:16:02 2006 +1000 @@ -2225,7 +2225,7 @@ index d64e2ec..7494823 100644 err = ipcomp_compress(x, skb); iph = skb->nh.iph; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c -index 00aa80e..84130c9 100644 +index 00aa80e..30c81a8 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -257,6 +257,7 @@ #include <linux/smp_lock.h> @@ -2281,7 +2281,7 @@ index 00aa80e..84130c9 100644 from += copy; copied += copy; -@@ -2026,6 +2021,71 @@ int tcp_getsockopt(struct sock *sk, int +@@ -2026,6 +2021,77 @@ int tcp_getsockopt(struct sock *sk, int } @@ -2306,12 +2306,18 @@ index 00aa80e..84130c9 100644 + if (!pskb_may_pull(skb, thlen)) + goto out; + -+ segs = NULL; -+ if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) -+ goto out; -+ + oldlen = (u16)~skb->len; + __skb_pull(skb, thlen); ++ ++ if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) { ++ /* Packet is from an untrusted source, reset gso_segs. */ ++ int mss = skb_shinfo(skb)->gso_size; ++ ++ skb_shinfo(skb)->gso_segs = (skb->len + mss - 1) / mss; ++ ++ segs = NULL; ++ goto out; ++ } + + segs = skb_segment(skb, features); + if (IS_ERR(segs)) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jun-30 12:48 UTC
[Xen-devel] [3/4] [NET] back: Make extra info slot more generic
Hi: [NET] back: Make extra info slot more generic Based on suggestions by Keir Fraser, this patch makes the extra slot more generic by allowing it to be chained and giving each slot a type field. This makes it easier to add new extra slots such as checksum offset. I''ve also added GSO type constants specific for Xen. For now the conversion function between them and Linux is a noop. When and if they do diverge we can modify them accordingly. The types are now checked in the backend to ensure that they only contain bits that we advertised. We will also advertise the bits as feature-gso instead of feature-tso. This patch also gets rid of gso_segs which is now always recomputed for SKB_GSO_DODGY packets. Last but not least netback now actually sets SKB_GSO_DODGY. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r f6806ad757d5 -r fb922826baef linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jun 30 22:16:02 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jun 30 22:28:09 2006 +1000 @@ -88,8 +88,13 @@ typedef struct netif_st { /* Miscellaneous private stuff. */ enum { DISCONNECTED, DISCONNECTING, CONNECTED } status; int active; + struct list_head list; /* scheduling list */ + atomic_t refcnt; + /* Bit mask of allowed GSO types. */ + int gso_types; + struct net_device *dev; struct net_device_stats stats; diff -r f6806ad757d5 -r fb922826baef linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 22:16:02 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jun 30 22:28:09 2006 +1000 @@ -663,6 +663,35 @@ static void netbk_fill_frags(struct sk_b } } +int netbk_get_txtras(netif_t *netif, struct netif_tx_extra *txtras, + int work_to_do) +{ + struct netif_tx_extra *txtra; + int i = netif->tx.req_cons; + + do { + + if (unlikely(work_to_do-- <= 0)) { + DPRINTK("Missing extra info\n"); + return -EBADR; + } + + txtra = (struct netif_tx_extra *)RING_GET_REQUEST(&netif->tx, + i); + if (unlikely(!txtra->type || + txtra->type >= XEN_NETIF_TXTRA_TYPE_MAX)) { + netif->tx.req_cons = ++i; + DPRINTK("Invalid extra type: %d\n", txtra->type); + return -EINVAL; + } + + memcpy(txtras + txtra->type - 1, txtra, sizeof(*txtra)); + netif->tx.req_cons = ++i; + } while (txtra->flags & XEN_NETIF_TXTRA_MORE); + + return work_to_do; +} + /* Called after netfront has transmitted */ static void net_tx_action(unsigned long unused) { @@ -670,7 +699,8 @@ static void net_tx_action(unsigned long struct sk_buff *skb; netif_t *netif; netif_tx_request_t txreq; - struct netif_tx_extra txtra; + struct netif_tx_extra txtras[XEN_NETIF_TXTRA_TYPE_MAX - 1]; + struct netif_tx_extra *gso; u16 pending_idx; RING_IDX i; gnttab_map_grant_ref_t *mop; @@ -732,16 +762,15 @@ static void net_tx_action(unsigned long work_to_do--; netif->tx.req_cons = ++i; + memset(txtras, 0, sizeof(txtras)); if (txreq.flags & NETTXF_extra_info) { - if (work_to_do-- <= 0) { - DPRINTK("Missing extra info\n"); - netbk_tx_err(netif, &txreq, i); + work_to_do = netbk_get_txtras(netif, txtras, + work_to_do); + if (unlikely(work_to_do < 0)) { + netbk_tx_err(netif, &txreq, 0); continue; } - - memcpy(&txtra, RING_GET_REQUEST(&netif->tx, i), - sizeof(txtra)); - netif->tx.req_cons = ++i; + i = 
netif->tx.req_cons; } ret = netbk_count_requests(netif, &txreq, work_to_do); @@ -772,6 +801,29 @@ static void net_tx_action(unsigned long continue; } + gso = NULL; + if (txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1].type) { + unsigned type; + + gso = &txtras[XEN_NETIF_TXTRA_TYPE_GSO - 1]; + if (!gso->gso.size) { + DPRINTK("GSO size must not be zero\n"); + netbk_tx_err(netif, &txreq, i); + continue; + } + + type = xen_gso_type_xen2linux(gso->gso.type); + if (!type || (type & netif->gso_types) != type) { + DPRINTK("Bogus GSO type: 0x%x\n", + gso->gso.type); + netbk_tx_err(netif, &txreq, i); + continue; + } + + /* Whoever gets this needs to verify the header. */ + gso->gso.type = type | SKB_GSO_DODGY; + } + pending_idx = pending_ring[MASK_PEND_IDX(pending_cons)]; data_len = (txreq.size > PKT_PROT_LEN && @@ -788,10 +840,9 @@ static void net_tx_action(unsigned long /* Packets passed to netif_rx() must have some headroom. */ skb_reserve(skb, 16); - if (txreq.flags & NETTXF_gso) { - skb_shinfo(skb)->gso_size = txtra.u.gso.size; - skb_shinfo(skb)->gso_segs = txtra.u.gso.segs; - skb_shinfo(skb)->gso_type = txtra.u.gso.type; + if (gso) { + skb_shinfo(skb)->gso_size = gso->gso.size; + skb_shinfo(skb)->gso_type = gso->gso.type; } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), diff -r f6806ad757d5 -r fb922826baef linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jun 30 22:16:02 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jun 30 22:28:09 2006 +1000 @@ -34,6 +34,9 @@ struct backend_info netif_t *netif; struct xenbus_watch backend_watch; enum xenbus_state frontend_state; + + /* Bit mask of allowed GSO types. */ + int gso_types; }; static int connect_rings(struct backend_info *); @@ -83,6 +86,9 @@ static int netback_probe(struct xenbus_d be->dev = dev; dev->dev.driver_data = be; + /* In future this will be retrieved from Linux. */ + be->gso_types = SKB_GSO_TCPV4 | SKB_GSO_DODGY; + err = xenbus_watch_path2(dev, dev->nodename, "handle", &be->backend_watch, backend_changed); if (err) @@ -101,9 +107,10 @@ static int netback_probe(struct xenbus_d goto abort_transaction; } - err = xenbus_printf(xbt, dev->nodename, "feature-tso", "%d", 1); + err = xenbus_printf(xbt, dev->nodename, "feature-gso", "%d", + be->gso_types); if (err) { - message = "writing feature-tso"; + message = "writing feature-gso"; goto abort_transaction; } @@ -205,6 +212,8 @@ static void backend_changed(struct xenbu xenbus_dev_fatal(dev, err, "creating interface"); return; } + + be->netif->gso_types = be->gso_types; kobject_uevent(&dev->dev.kobj, KOBJ_ONLINE); diff -r f6806ad757d5 -r fb922826baef xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jun 30 22:16:02 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jun 30 22:28:09 2006 +1000 @@ -41,12 +41,28 @@ #define _NETTXF_more_data (2) #define NETTXF_more_data (1U<<_NETTXF_more_data) -/* Packet has GSO fields in the following descriptor (netif_tx_extra.u.gso). */ -#define _NETTXF_gso (3) -#define NETTXF_gso (1U<<_NETTXF_gso) +/* Packet to be folloed by extra descritptor. */ +#define _NETTXF_extra_info (3) +#define NETTXF_extra_info (1U<<_NETTXF_extra_info) -/* This descriptor is followed by an extra-info descriptor (netif_tx_extra). */ -#define NETTXF_extra_info (NETTXF_gso) +enum { + XEN_NETIF_TXTRA_TYPE_GSO = 1, + XEN_NETIF_TXTRA_TYPE_MAX, +}; + +enum { + XEN_NETIF_TXTRA_MORE = 1 << 0, +}; + +enum { + /* TCP over IPv4. */ + XEN_GSO_TCPV4 = 1 << 0, + /* UDP over IPv4. 
*/ + XEN_GSO_UDPV4 = 1 << 1, + + /* Packet header must be verified. */ + XEN_GSO_DODGY = 1 << 2, +}; struct netif_tx_request { grant_ref_t gref; /* Reference to buffer page */ @@ -57,16 +73,29 @@ struct netif_tx_request { }; typedef struct netif_tx_request netif_tx_request_t; -/* This structure needs to fit within netif_tx_request for compatibility. */ +/* + * This structure needs to fit within both netif_tx_request and + * netif_rx_response for compatibility. + */ struct netif_tx_extra { + /* Type of extra info. */ + uint8_t type; + /* Flags for this info. */ + uint8_t flags; + union { - /* NETTXF_gso: Generic Segmentation Offload. */ - struct netif_tx_gso { - uint16_t size; /* GSO MSS. */ - uint16_t segs; /* GSO segment count. */ - uint16_t type; /* GSO type. */ - } gso; - } u; + /* + * Maximum payload size of each segment. For example, for TCP this is + * just the path MSS. + */ + uint16_t size; + + /* + * GSO type. This determines the protocol of the packet and any extra + * features required to segment the packet properly. + */ + uint16_t type; + } gso; }; struct netif_tx_response { @@ -110,6 +139,17 @@ DEFINE_RING_TYPES(netif_rx, struct netif /* No response: used for auxiliary requests (e.g., netif_tx_extra). */ #define NETIF_RSP_NULL 1 +/* For now the GSO types are identical. */ +static inline int xen_gso_type_linux2xen(int type) +{ + return type; +} + +static inline int xen_gso_type_xen2linux(int type) +{ + return type; +} + #endif /* _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jun-30 12:49 UTC
[Xen-devel] [4/4] [NET] front: Transmit TSO packets if supported
Hi: [NET] front: Transmit TSO packets if supported This patch adds TSO transmission support to the frontend. This also fixes a bug where SG may not be turned off correctly when migrating to an old host. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r fb922826baef -r ea33ffdb9973 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jun 30 22:28:09 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jun 30 22:44:18 2006 +1000 @@ -463,7 +463,7 @@ static int network_open(struct net_devic static inline int netfront_tx_slot_available(struct netfront_info *np) { - return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 1; + return RING_FREE_REQUESTS(&np->tx) >= MAX_SKB_FRAGS + 2; } static inline void network_maybe_wake_tx(struct net_device *dev) @@ -491,7 +491,13 @@ static void network_tx_buf_gc(struct net rmb(); /* Ensure we see responses up to ''rp''. */ for (cons = np->tx.rsp_cons; cons != prod; cons++) { - id = RING_GET_RESPONSE(&np->tx, cons)->id; + struct netif_tx_response *txrsp; + + txrsp = RING_GET_RESPONSE(&np->tx, cons); + if (txrsp->status == NETIF_RSP_NULL) + continue; + + id = txrsp->id; skb = np->tx_skbs[id]; if (unlikely(gnttab_query_foreign_access( np->grant_tx_ref[id]) != 0)) { @@ -719,6 +725,7 @@ static int network_start_xmit(struct sk_ unsigned short id; struct netfront_info *np = netdev_priv(dev); struct netif_tx_request *tx; + struct netif_tx_extra *txtra; char *data = skb->data; RING_IDX i; grant_ref_t ref; @@ -739,7 +746,8 @@ static int network_start_xmit(struct sk_ spin_lock_irq(&np->tx_lock); if (unlikely(!netif_carrier_ok(dev) || - (frags > 1 && !xennet_can_sg(dev)))) { + (frags > 1 && !xennet_can_sg(dev)) || + netif_needs_gso(dev, skb))) { spin_unlock_irq(&np->tx_lock); goto drop; } @@ -762,10 +770,30 @@ static int network_start_xmit(struct sk_ tx->size = len; tx->flags = 0; + txtra = NULL; + if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ tx->flags |= NETTXF_csum_blank | NETTXF_data_validated; if (skb->proto_data_valid) /* remote but checksummed? */ tx->flags |= NETTXF_data_validated; + + if (skb_shinfo(skb)->gso_size) { + struct netif_tx_extra *gso + (struct netif_tx_extra *)RING_GET_REQUEST(&np->tx, ++i); + + if (txtra) + txtra->flags |= XEN_NETIF_TXTRA_MORE; + else + tx->flags |= NETTXF_extra_info; + + gso->gso.size = skb_shinfo(skb)->gso_size; + gso->gso.type + xen_gso_type_linux2xen(skb_shinfo(skb)->gso_type); + + gso->type = XEN_NETIF_TXTRA_TYPE_GSO; + gso->flags = 0; + txtra = gso; + } np->tx.req_prod_pvt = i + 1; @@ -1065,9 +1093,47 @@ static int xennet_set_sg(struct net_devi return ethtool_op_set_sg(dev, data); } +static int xennet_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + struct netfront_info *np = netdev_priv(dev); + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-gso", + "%d", &val) < 0) + val = 0; + if (!(val & XEN_GSO_TCPV4)) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + +static void xennet_set_gso(struct net_device *dev) +{ + struct netfront_info *np = netdev_priv(dev); + int allowed_types = SKB_GSO_TCPV4 | SKB_GSO_DODGY; + int val; + + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-gso", + "%d", &val) < 0) + return; + + /* Filter out things we don''t understand. 
*/ + val = xen_gso_type_xen2linux(val) & allowed_types; + + /* Turn on the advertised bits. */ + dev->features |= val << NETIF_F_GSO_SHIFT; +} + static void xennet_set_features(struct net_device *dev) { - xennet_set_sg(dev, 1); + /* Turn off all negotiated bits. */ + dev->features &= (1 << NETIF_F_GSO_SHIFT) - 1; + xennet_set_sg(dev, 0); + + if (!xennet_set_sg(dev, 1)) + xennet_set_gso(dev); } static void network_connect(struct net_device *dev) @@ -1148,6 +1214,8 @@ static struct ethtool_ops network_ethtoo .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = xennet_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = xennet_set_tso, }; #ifdef CONFIG_SYSFS _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
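As a usage note rather than part of the patch: once the frontend has negotiated the feature, the offload state can be inspected and toggled from inside the guest with the standard ethtool controls (assuming the guest image ships ethtool), which is handy for checking that the negotiation actually took effect:

    ethtool -k eth0          # show offload settings ("tcp segmentation offload: on")
    ethtool -K eth0 tso off  # disable TSO for a baseline run
    ethtool -K eth0 tso on   # re-enable it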
Keir Fraser
2006-Jun-30 13:07 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On 30 Jun 2006, at 13:47, Herbert Xu wrote:

> The net-tso patch has been merged upstream. I've also changed the
> feature-tso interface to be a bit mask of the XEN gso_types bits.
> It's now called feature-gso. This means we won't have to add one
> feature for each protocol.
>
> So here is a repost of the entire series.

I've merged all this already, with a few changes. I've also disabled
netback from advertising the feature, and also netfront from using it,
until we've all agreed that the inter-domain bits are sane. These
should end up in the public tree in an hour or two.

The changes:
1. Pushed the gso fields into a struct inside the union. Otherwise the
fields overlap.
2. Changed the GSO type definitions. Currently only one type (TCPv4)
and the protocol type isn't really a bitmask since they are mutually
exclusive for a given packet. Also 'dodgy' makes no sense since netback
doesn't trust netfront anyway.
3. Renamed TXTRA->EXTRA and tx_extra -> extra_info. Looks like you
want to share the struct with the rx patch at some point, so making it
tx-specific now makes no sense. If that's not the case we can rename
back again.
4. I'm not sure all the error paths are now correct in netback. For
example, there's a call to netbk_tx_err with an end index of 0. Is that
right?

In the latest changes I'd rather have feature-gso list the supported
protocols as strings (tcpv4,udpv4,etc).

Also, what happens if netfront does the following bad things:
1. gso.type doesn't actually match the protocol type?
2. gso.size is set to a really small value (so that you make lots of
packets)?

Do we need more handling of these cases in netback? Will these be
safely handled in the network stack? Might we need to always work out
gso.type in netback for safety?

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jun-30 13:21 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On Fri, Jun 30, 2006 at 02:07:16PM +0100, Keir Fraser wrote:
>
> I've merged all this already, with a few changes. I've also disabled
> netback from advertising the feature, and also netfront from using it,
> until we've all agreed that the inter-domain bits are sane. These
> should end up in the public tree in an hour or two.

Thanks.

> The changes:
> 1. Pushed the gso fields into a struct inside the union. Otherwise the
> fields overlap.

Doh! Thanks for the correction.

> 2. Changed the GSO type definitions. Currently only one type (TCPv4)
> and the protocol type isn't really a bitmask since they are mutually
> exclusive for a given packet. Also 'dodgy' makes no sense since netback
> doesn't trust netfront anyway.

Agreed wrt 'dodgy' bit. However, this really needs to be a bit mask
because we'll have things like ECN and other protocol-specific bits in
future. In fact the upstream Linux tree has an ECN bit now.

The exclusivity will be checked by Linux (I've already submitted patches
to do this).

> 3. Renamed TXTRA->EXTRA and tx_extra -> extra_info. Looks like you
> want to share the struct with the rx patch at some point, so making it
> tx-specific now makes no sense. If that's not the case we can rename
> back again.

Yes that's the plan.

> 4. I'm not sure all the error paths are now correct in netback. For
> example, there's a call to netbk_tx_err with an end index of 0. Is that
> right?

That was deliberate as 0 is the smallest RING_IDX. However, now that I
look at it again there is an off-by-one bug. I'll fix that tomorrow.

> In the latest changes I'd rather have feature-gso list the supported
> protocols as strings (tcpv4,udpv4,etc).

Well then I might as well go back to the one int per-bit thing with
'feature-tso', 'feature-ufo', etc. It's much easier than parsing
strings.

> Also, what happens if netfront does the following bad things:
> 1. gso.type doesn't actually match the protocol type?

This is checked by Linux due to the 'dodgy' bit. The code isn't in
the net-gso patch yet because for now it only has one protocol.
The upstream code should have it tomorrow as we're about to add TSO6.

> 2. gso.size is set to a really small value (so that you make lots of
> packets)?

That's OK. TCP must deal with an MSS of 1 anyway. The same applies to
the other protocols.

> Do we need more handling of these cases in netback? Will these be
> safely handled in the network stack? Might we need to always work out
> gso.type in netback for safety?

I think we've got all the bits covered now. But if you can think of
anything else please let me know.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
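To illustrate the bit-mask point: protocol bits and modifier bits (such as 'dodgy' and the ECN marker) share the same field, which is why a plain enumeration is not enough. The sketch below follows the SKB_GSO_* layout used in this series, with the upstream ECN bit added purely as an example; the names are simplified for illustration.

    /* Illustration only: one protocol bit plus independent modifier bits. */
    enum {
        GSO_TCPV4   = 1 << 0,   /* protocol */
        GSO_UDPV4   = 1 << 1,   /* protocol */
        GSO_DODGY   = 1 << 2,   /* modifier: header must be verified */
        GSO_TCP_ECN = 1 << 3,   /* modifier: upstream ECN/CWR marker */
    };

    #define GSO_PROTO_MASK  (GSO_TCPV4 | GSO_UDPV4)

    /* Accept only advertised bits, and exactly one protocol bit. */
    static int gso_type_ok(int type, int advertised)
    {
        int proto = type & GSO_PROTO_MASK;

        if ((type & advertised) != type)
            return 0;
        return proto && !(proto & (proto - 1));
    }

This is essentially the (type & netif->gso_types) != type check the backend already performs, extended with the exclusivity test that Herbert says Linux will enforce.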
Keir Fraser
2006-Jun-30 14:36 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On 30 Jun 2006, at 14:21, Herbert Xu wrote:

> However, this really needs to be a bit mask because we'll have things
> like ECN and other protocol-specific bits in future. In fact the
> upstream Linux tree has an ECN bit now.
>
> The exclusivity will be checked by Linux (I've already submitted patches
> to do this).

I'm not at all bothered about having the type format the same as in
Linux. How about we split the type into protocol (being a proper
enumeration) and proto_flags (being a protocol-specific bitmask)? Might
there be any non-proto-specific flags in future?

>> In the latest changes I'd rather have feature-gso list the supported
>> protocols as strings (tcpv4,udpv4,etc).
>
> Well then I might as well go back to the one int per-bit thing with
> 'feature-tso', 'feature-ufo', etc. It's much easier than parsing
> strings.

I'm not too bothered either way, but I personally prefer having the
more properly qualified names listed under feature-gso. Pulling it
apart with strstr() in netfront (for each proto that netfront can deal
with) wouldn't be hard.

>> Also, what happens if netfront does the following bad things:
>> 1. gso.type doesn't actually match the protocol type?
>
> This is checked by Linux due to the 'dodgy' bit. The code isn't in
> the net-gso patch yet because for now it only has one protocol.
> The upstream code should have it tomorrow as we're about to add TSO6.

Do we then need the 'type' at all? What is it actually used for -- I'd
assume the network stack would demux to the correct protocol code as it
would for any ordinary packet, so why does it need help with the
protocol for GSO packets?

Thanks!
Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
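For what it's worth, the string-based scheme floated here would only need a few lines in netfront. A rough sketch, assuming a single feature-gso key whose value is a comma-separated list such as "tcpv4,udpv4" (this is only the proposal above, not the scheme that was eventually adopted):

    /* Sketch only: map a protocol-list string onto the XEN_GSO_* bits
     * defined earlier in this series.  The xenbus read itself and error
     * handling are elided. */
    static int gso_mask_from_string(const char *val)
    {
        int mask = 0;

        if (strstr(val, "tcpv4"))
            mask |= XEN_GSO_TCPV4;
        if (strstr(val, "udpv4"))
            mask |= XEN_GSO_UDPV4;

        return mask;
    }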
Herbert Xu
2006-Jul-01 03:26 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On Fri, Jun 30, 2006 at 03:36:49PM +0100, Keir Fraser wrote:
>
> I'm not too bothered either way, but I personally prefer having the
> more properly qualified names listed under feature-gso. Pulling it
> apart with strstr() in netfront (for each proto that netfront can deal
> with) wouldn't be hard.

I can call it feature-gso-tcpv4, feature-gso-udpv4, etc.

> Do we then need the 'type' at all? What is it actually used for -- I'd
> assume the network stack would demux to the correct protocol code as it
> would for any ordinary packet, so why does it need help with the
> protocol for GSO packets?

Good point. I'll get rid of it.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-01 03:33 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On Sat, Jul 01, 2006 at 01:26:09PM +1000, herbert wrote:
>
> > Do we then need the 'type' at all? What is it actually used for -- I'd
> > assume the network stack would demux to the correct protocol code as it
> > would for any ordinary packet, so why does it need help with the
> > protocol for GSO packets?
>
> Good point. I'll get rid of it.

Actually, we do need it for two reasons:

1. To indicate protocol for drivers that can cope with malformed packets.
   The header verification will be skipped for such drivers.
2. To carry extra flags such as ECN that cannot harm the host if set
   incorrectly.

Given that Linux will cope with malformed headers or a bogus gso_type, I'd
really like to keep the type value uniform between Linux and Xen.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2006-Jul-01 08:17 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On 1 Jul 2006, at 04:26, Herbert Xu wrote:

>> I'm not too bothered either way, but I personally prefer having the
>> more properly qualified names listed under feature-gso. Pulling it
>> apart with strstr() in netfront (for each proto that netfront can deal
>> with) wouldn't be hard.
>
> I can call it feature-gso-tcpv4, feature-gso-udpv4, etc.

Yes, I like that.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2006-Jul-01 08:24 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On 1 Jul 2006, at 04:33, Herbert Xu wrote:

>> Good point. I'll get rid of it.
>
> Actually, we do need it for two reasons:
>
> 1. To indicate protocol for drivers that can cope with malformed packets.
>    The header verification will be skipped for such drivers.
> 2. To carry extra flags such as ECN that cannot harm the host if set
>    incorrectly.

Fair enough, that makes sense.

> Given that Linux will cope with malformed headers or a bogus gso_type, I'd
> really like to keep the type value uniform between Linux and Xen.

I'm uncomfortable with this, even though it makes things a little easier
now. For sanity I want to see netfront/netback explicitly grok flags
rather than dumbly pass them through. I'd prefer uint8_t protocol and
uint8_t flags. Former is a protocol enumeration; latter is unused now but
we can add ECN and so on later.

By the way: will we need netback to advertise support for the ECN flag?
I'm not sure exactly what it will mean, and whether it can just be
ignored by netbacks that don't support it?

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
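One possible reading of the protocol/flags split proposed here, for illustration only; the struct and field names below are hypothetical and not necessarily the layout that ended up in netif.h:

    /* Hypothetical sketch of the proposed split. */
    struct netif_gso_request {
        uint16_t size;      /* MSS for each produced segment */
        uint8_t  protocol;  /* enumeration: TCPv4, UDPv4, ... */
        uint8_t  flags;     /* none defined yet; ECN etc. later */
    };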
Herbert Xu
2006-Jul-01 09:59 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On Sat, Jul 01, 2006 at 09:24:13AM +0100, Keir Fraser wrote:
>
> I'm uncomfortable with this, even though it makes things a little
> easier now. For sanity I want to see netfront/netback explicitly grok
> flags rather than dumbly pass them through. I'd prefer uint8_t protocol
> and uint8_t flags. Former is a protocol enumeration; latter is unused
> now but we can add ECN and so on later. By the way: will we need

OK.

> netback to advertise support for the ECN flag? I'm not sure exactly
> what it will mean, and whether it can just be ignored by netbacks that
> don't support it?

If netback does not advertise the flag then netfront will perform
segmentation before passing TCP packets with CWR set through.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
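The fallback Herbert describes is the usual GSO feature check: the frontend compares the skb's GSO bits against the bits the device (ultimately the backend) advertises, and anything not fully covered is segmented in software first. A simplified sketch, modelled on the net_gso_ok() logic from the GSO backport; names are simplified for illustration:

    /* Sketch: the device may take the packet whole only if it advertises
     * every GSO bit the skb carries (protocol and modifiers alike). */
    static int device_can_gso(unsigned long dev_features, int gso_type)
    {
        unsigned long needed = (unsigned long)gso_type << NETIF_F_GSO_SHIFT;

        return (dev_features & needed) == needed;
    }

So a TCPv4 packet carrying the ECN marker needs both the TCPv4 and ECN GSO features advertised; if the backend advertises only TCPv4, netfront segments the CWR-marked packet itself and forwards ordinary MSS-sized frames.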
Keir Fraser
2006-Jul-01 12:17 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On 1 Jul 2006, at 10:59, Herbert Xu wrote:

>> netback to advertise support for the ECN flag? I'm not sure exactly
>> what it will mean, and whether it can just be ignored by netbacks that
>> don't support it?
>
> If netback does not advertise the flag then netfront will perform
> segmentation before passing TCP packets with CWR set through.

Ah, okay, can that not be handled properly by some TSO-supporting NICs
then? You need to do something smarter than simply duplicate the ECN
bits in every segment header? What would happen if netfront sent
through a packet with ECE/CWR and didn't set the GSO_ECN flag -- is it
safe but stupid, will it break something, should dodgy packets have the
GSO_ECN flag recomputed?

Thanks!
Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-01 12:38 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix maximum fragment check
On Sat, Jul 01, 2006 at 01:17:31PM +0100, Keir Fraser wrote:
>
> Ah, okay, can that not be handled properly by some TSO-supporting NICs
> then? You need to do something smarter than simply duplicate the ECN
> bits in every segment header? What would happen if netfront sent
> through a packet with ECE/CWR and didn't set the GSO_ECN flag -- is it
> safe but stupid, will it break something, should dodgy packets have the
> GSO_ECN flag recomputed?

Getting the ECN bit wrong is OK. The worst result is that the packets
generated will have incorrect CWR markings which will only penalise the
connection belonging to the guest. If the guest wanted to achieve that,
it can always do the same thing by segmenting the packets itself anyway.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-03 04:44 UTC
[Xen-devel] [1/4] [NET] back: Fix off-by-one error in netbk_tx_err
Hi Keir: Here are the GSO changes again which should address your concerns. Let me know if you have any other problems. [NET] back: Fix off-by-one error in netbk_tx_err The generalised extra request info patch introduced a bug with the use of netbk_tx_err since it advanced the req_cons pointer by one. This patch fixes thing by delaying the increment in netbk_tx_err. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 3fe11185adfb -r 3656a2985ae1 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Sat Jul 01 09:37:24 2006 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Mon Jul 03 14:18:54 2006 +1000 @@ -496,9 +496,9 @@ static void netbk_tx_err(netif_t *netif, do { make_tx_response(netif, txp, NETIF_RSP_ERROR); - if (++cons >= end) + if (cons >= end) break; - txp = RING_GET_REQUEST(&netif->tx, cons); + txp = RING_GET_REQUEST(&netif->tx, cons++); } while (1); netif->tx.req_cons = cons; netif_schedule_work(netif); @@ -764,11 +764,11 @@ static void net_tx_action(unsigned long if (txreq.flags & NETTXF_extra_info) { work_to_do = netbk_get_extras(netif, extras, work_to_do); + i = netif->tx.req_cons; if (unlikely(work_to_do < 0)) { - netbk_tx_err(netif, &txreq, 0); + netbk_tx_err(netif, &txreq, i); continue; } - i = netif->tx.req_cons; } ret = netbk_count_requests(netif, &txreq, work_to_do); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-03 04:45 UTC
[Xen-devel] [2/4] [NET] back: Add GSO features field and check gso_size
Hi: [NET] back: Add GSO features field and check gso_size This patch adds the as-yet unused GSO features which will contain protocol-independent bits such as the ECN marker. It also makes the backend check gso_size to ensure that it is non-zero. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 3656a2985ae1 -r 8c37d0d4526e linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Mon Jul 03 14:18:54 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Mon Jul 03 14:31:15 2006 +1000 @@ -691,6 +691,29 @@ int netbk_get_extras(netif_t *netif, str return work_to_do; } +static int netbk_set_skb_gso(struct sk_buff *skb, struct netif_extra_info *gso) +{ + if (!gso->u.gso.size) { + DPRINTK("GSO size must not be zero.\n"); + return -EINVAL; + } + + /* Currently on TCPv4 S.O. is supported. */ + if (gso->u.gso.type != XEN_NETIF_GSO_TCPV4) { + DPRINTK("Bad GSO type %d.\n", gso->u.gso.type); + return -EINVAL; + } + + skb_shinfo(skb)->gso_size = gso->u.gso.size; + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + + /* Header must be checked, and gso_segs computed. */ + skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY; + skb_shinfo(skb)->gso_segs = 0; + + return 0; +} + /* Called after netfront has transmitted */ static void net_tx_action(unsigned long unused) { @@ -819,20 +842,11 @@ static void net_tx_action(unsigned long struct netif_extra_info *gso; gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1]; - /* Currently on TCPv4 S.O. is supported. */ - if (gso->u.gso.type != XEN_NETIF_GSO_TCPV4) { - DPRINTK("Bad GSO type %d.\n", gso->u.gso.type); + if (netbk_set_skb_gso(skb, gso)) { kfree_skb(skb); netbk_tx_err(netif, &txreq, i); - break; + continue; } - - skb_shinfo(skb)->gso_size = gso->u.gso.size; - skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; - - /* Header must be checked, and gso_segs computed. */ - skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY; - skb_shinfo(skb)->gso_segs = 0; } gnttab_set_map_op(mop, MMAP_VADDR(pending_idx), diff -r 3656a2985ae1 -r 8c37d0d4526e xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Mon Jul 03 14:18:54 2006 +1000 +++ b/xen/include/public/io/netif.h Mon Jul 03 14:31:15 2006 +1000 @@ -88,6 +88,12 @@ struct netif_extra_info { * extra features required to segment the packet properly. */ uint16_t type; /* XEN_NETIF_GSO_* */ + + /* + * GSO features . This specifies any extra GSO features required + * to process this packet, such as ECN support for TCPv4. + */ + uint16_t features; /* XEN_NETIF_FEAT_* */ } gso; uint16_t pad[3]; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
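The new features field is deliberately unused in this patch. As a purely hypothetical example of how a frontend might fill it once an ECN feature bit is defined: SKB_GSO_TCP_ECN is the upstream Linux bit referred to earlier in the thread, while XEN_NETIF_GSO_FEAT_ECN and its value are made up here just to indicate the intended use.

    /* Hypothetical only: no XEN_NETIF_GSO_FEAT_ECN constant is defined in
     * this series; it is invented here to show what 'features' is for. */
    #define XEN_NETIF_GSO_FEAT_ECN  (1 << 0)    /* made-up value */

    gso->u.gso.features = 0;
    if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_ECN)    /* upstream bit */
        gso->u.gso.features |= XEN_NETIF_GSO_FEAT_ECN;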
Herbert Xu
2006-Jul-03 04:46 UTC
[Xen-devel] [3/4] [NET]: Rename feature-tso to feature-gso-tcpv4
Hi: [NET]: Rename feature-tso to feature-gso-tcpv4 This patch renames the name feature-tso to feature-gso-tcpv4 for future expansion. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 8c37d0d4526e -r 8c5fd9867b3c linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Mon Jul 03 14:31:15 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Mon Jul 03 14:35:47 2006 +1000 @@ -102,9 +102,10 @@ static int netback_probe(struct xenbus_d } #if 0 /* KAF: After the protocol is finalised. */ - err = xenbus_printf(xbt, dev->nodename, "feature-tso", "%d", 1); + err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv4", + "%d", 1); if (err) { - message = "writing feature-tso"; + message = "writing feature-gso-tcpv4"; goto abort_transaction; } #endif diff -r 8c37d0d4526e -r 8c5fd9867b3c linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Mon Jul 03 14:31:15 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Mon Jul 03 14:35:47 2006 +1000 @@ -1098,8 +1098,8 @@ static int xennet_set_tso(struct net_dev struct netfront_info *np = netdev_priv(dev); int val; - if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-tso", - "%d", &val) < 0) + if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, + "feature-gso-tcpv4", "%d", &val) < 0) val = 0; #if 0 /* KAF: After the protocol is finalised. */ if (!val) _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-03 04:46 UTC
[Xen-devel] [4/4] [NET] front: Zero negotiated bits in xen_set_features
Hi: [NET] front: Zero negotiated bits in xen_set_features When we reconnect to the backend we need to first zero all negotiated bits as the functions xen_set_sg and xen_set_tso do not (and are not supposed to) zero bits when they fail to set them. This patch also permanently enables the NETIF_F_GSO_ROBUST bit as we never parse any GSO fields ourselves (even if we did the backend could not trust us so it''s wasted effort). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 8c5fd9867b3c -r 531033849420 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Mon Jul 03 14:35:47 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Mon Jul 03 14:42:09 2006 +1000 @@ -1112,6 +1112,11 @@ static int xennet_set_tso(struct net_dev static void xennet_set_features(struct net_device *dev) { + /* Turn off all GSO bits except ROBUST. */ + dev->features &= (1 << NETIF_F_GSO_SHIFT) - 1; + dev->features |= NETIF_F_GSO_ROBUST; + xennet_set_sg(dev, 0); + if (!xennet_set_sg(dev, 1)) xennet_set_tso(dev, 1); } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
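The mask arithmetic in this patch relies on the feature-bit layout from the GSO backport, where each SKB_GSO_* bit is mirrored into dev->features at NETIF_F_GSO_SHIFT. A small sketch of what the two statements achieve, assuming that layout:

    /* Sketch, assuming the GSO backport's feature layout, e.g.
     * NETIF_F_GSO_ROBUST == SKB_GSO_DODGY << NETIF_F_GSO_SHIFT. */
    static inline void reset_negotiated_features(struct net_device *dev)
    {
        /* Keep every non-GSO feature, drop all negotiated GSO bits. */
        dev->features &= (1 << NETIF_F_GSO_SHIFT) - 1;

        /* We never parse GSO headers ourselves, so the 'robust' bit can
         * stay on unconditionally. */
        dev->features |= NETIF_F_GSO_ROBUST;
    }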
Keir Fraser
2006-Jul-03 08:12 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix off-by-one error in netbk_tx_err
On 3 Jul 2006, at 05:44, Herbert Xu wrote:

> Here are the GSO changes again which should address your concerns. Let me
> know if you have any other problems.

I wonder if the 'type' field would be better named 'protocol' now?
'Type' is nicely vague though. Also I changed it to uint8_t since it's
an enumeration -- should be plenty big enough and leaves us with 8 bits
spare in case that's useful in future. Does that seem okay?

Apart from those questions I checked all your patches in. Once they
pass our regression tests, and you give the okay, I'll enable the
feature negotiation.

Thanks,
Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert Xu
2006-Jul-03 08:14 UTC
[Xen-devel] Re: [1/4] [NET] back: Fix off-by-one error in netbk_tx_err
On Mon, Jul 03, 2006 at 09:12:32AM +0100, Keir Fraser wrote:
>
> I wonder if the 'type' field would be better named 'protocol' now?
> 'Type' is nicely vague though. Also I changed it to uint8_t since it's
> an enumeration -- should be plenty big enough and leaves us with 8 bits
> spare in case that's useful in future. Does that seem okay?

Sure that's fine by me.

> Apart from those questions I checked all your patches in. Once they
> pass our regression tests, and you give the okay, I'll enable the
> feature negotiation.

Great!

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel