Hi Keir:

This is a respin of the dom0=>domU TSO patch series. Original description:

This series of patches adds SG and TSO support for the dom0=>domU direction. This completes the domU=>domU loop, which means that we have TSO all the way through.

It's a bit late here so I haven't done a full set of benchmarks, but these two figures should be indicative for domU=>domU:

TSO off:    323.91Mb/s
TSO on:    1678.52Mb/s
lo(16436): 5533.93Mb/s

Of course you'll need to turn the two #if's back on for domU=>dom0 to do this.

I've also found the reason I was getting smaller MSSes before: the default write buffer size in Linux is too small for these speeds, so it needs to be raised with something like:

echo 526808 > /proc/sys/net/core/wmem_max
netperf -t TCP_STREAM -H x.x.x.x -- -s 263404

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
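As an aside for anyone reproducing the benchmark: the same send-buffer tuning can be requested from a program instead of via netperf's -s flag. The snippet below is a minimal, illustrative userspace sketch (not part of the patch series); it assumes the usual Linux behaviour that SO_SNDBUF requests are clamped to net.core.wmem_max and roughly doubled by the kernel, which is why wmem_max has to be raised first.

/* Illustrative only: request a large send buffer and see what the
 * kernel actually grants. Error handling is elided for brevity. */
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int req = 263404, got = 0;
	socklen_t len = sizeof(got);

	setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &req, sizeof(req));
	getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &got, &len);

	/* With the default wmem_max this prints a clamped value; after
	 * "echo 526808 > /proc/sys/net/core/wmem_max" it reports ~2*req. */
	printf("requested %d, kernel granted %d\n", req, got);
	return 0;
}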
Hi: [NET] front: Stop using rx->id With the current protocol for transferring packets from dom0 to domU, the rx->id field is useless because it can be derived from the rx request ring ID. In particular, rx->id = (ring_id & NET_RX_RING_SIZE - 1) + 1; This formula works because the rx response to each request always occupies the same slot that the request arrived in. This in turn is a consequence of the fact that each packet only occupies one slot. The other important reason that this works for dom0=>domU but not domU=>dom0 is that the resource associated with the rx->id is freed immediately while in the domU=>dom0 case the resource is held until the skb is liberated by dom0. Using this formula we can essentially remove rx->id from the protocol, freeing up space that could be instead be used by things like TSO. The only constraint is that the backend must obey the rule that each id must only be used in the response that occupies the same slot as the request. The actual field of rx->id is still maintained for compatibility with older backends. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 5848356af8da -r 65ff65fb76ec linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Thu Jul 27 14:06:15 2006 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 17:51:16 2006 +1000 @@ -99,17 +99,17 @@ struct netfront_info { struct timer_list rx_refill_timer; /* - * {tx,rx}_skbs store outstanding skbuffs. The first entry in each - * array is an index into a chain of free entries. + * {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs + * is an index into a chain of free entries. */ struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1]; - struct sk_buff *rx_skbs[NET_RX_RING_SIZE+1]; + struct sk_buff *rx_skbs[NET_RX_RING_SIZE]; #define TX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256) grant_ref_t gref_tx_head; grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1]; grant_ref_t gref_rx_head; - grant_ref_t grant_rx_ref[NET_TX_RING_SIZE + 1]; + grant_ref_t grant_rx_ref[NET_TX_RING_SIZE]; struct xenbus_device *xbdev; int tx_ring_ref; @@ -122,7 +122,7 @@ struct netfront_info { }; /* - * Access macros for acquiring freeing slots in {tx,rx}_skbs[]. + * Access macros for acquiring freeing slots in tx_skbs[]. */ static inline void add_id_to_freelist(struct sk_buff **list, unsigned short id) @@ -136,6 +136,29 @@ static inline unsigned short get_id_from unsigned int id = (unsigned int)(unsigned long)list[0]; list[0] = list[id]; return id; +} + +static inline int xennet_rxidx(RING_IDX idx) +{ + return idx & (NET_RX_RING_SIZE - 1); +} + +static inline struct sk_buff *xennet_get_rx_skb(struct netfront_info *np, + RING_IDX ri) +{ + int i = xennet_rxidx(ri); + struct sk_buff *skb = np->rx_skbs[i]; + np->rx_skbs[i] = NULL; + return skb; +} + +static inline grant_ref_t xennet_get_rx_ref(struct netfront_info *np, + RING_IDX ri) +{ + int i = xennet_rxidx(ri); + grant_ref_t ref = np->grant_rx_ref[i]; + np->grant_rx_ref[i] = GRANT_INVALID_REF; + return ref; } #define DPRINTK(fmt, args...) 
\ @@ -598,8 +621,9 @@ static void network_alloc_rx_buffers(str skb->dev = dev; - id = get_id_from_freelist(np->rx_skbs); - + id = xennet_rxidx(req_prod + i); + + BUG_ON(np->rx_skbs[id]); np->rx_skbs[id] = skb; RING_GET_REQUEST(&np->rx, req_prod + i)->id = id; @@ -840,6 +864,19 @@ static irqreturn_t netif_int(int irq, vo return IRQ_HANDLED; } +static void xennet_move_rx_slot(struct netfront_info *np, struct sk_buff *skb, + grant_ref_t ref) +{ + int new = xennet_rxidx(np->rx.req_prod_pvt); + + BUG_ON(np->rx_skbs[new]); + np->rx_skbs[new] = skb; + np->grant_rx_ref[new] = ref; + RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new; + RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref; + np->rx.req_prod_pvt++; + RING_PUSH_REQUESTS(&np->rx); +} static int netif_poll(struct net_device *dev, int *pbudget) { @@ -874,12 +911,15 @@ static int netif_poll(struct net_device i++, work_done++) { rx = RING_GET_RESPONSE(&np->rx, i); + skb = xennet_get_rx_skb(np, i); + ref = xennet_get_rx_ref(np, i); + /* * This definitely indicates a bug, either in this driver or in * the backend driver. In future this should flag the bad * situation to the system controller to reboot the backed. */ - if ((ref = np->grant_rx_ref[rx->id]) == GRANT_INVALID_REF) { + if (ref == GRANT_INVALID_REF) { WPRINTK("Bad rx response id %d.\n", rx->id); work_done--; continue; @@ -890,21 +930,12 @@ static int netif_poll(struct net_device if (net_ratelimit()) WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n", rx->id, rx->status); - RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id - rx->id; - RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref - ref; - np->rx.req_prod_pvt++; - RING_PUSH_REQUESTS(&np->rx); + xennet_move_rx_slot(np, skb, ref); work_done--; continue; } gnttab_release_grant_reference(&np->gref_rx_head, ref); - np->grant_rx_ref[rx->id] = GRANT_INVALID_REF; - - skb = np->rx_skbs[rx->id]; - add_id_to_freelist(np->rx_skbs, rx->id); /* NB. We handle skb overflow later. */ skb->data = skb->head + rx->offset; @@ -1158,15 +1189,23 @@ static void network_connect(struct net_d } /* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */ - for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) { - if ((unsigned long)np->rx_skbs[i] < PAGE_OFFSET) + for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) { + if (!np->rx_skbs[i]) continue; + gnttab_grant_foreign_transfer_ref( np->grant_rx_ref[i], np->xbdev->otherend_id, __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); RING_GET_REQUEST(&np->rx, requeue_idx)->gref np->grant_rx_ref[i]; - RING_GET_REQUEST(&np->rx, requeue_idx)->id = i; + RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; + + if (requeue_idx < i) { + np->rx_skbs[requeue_idx] = np->rx_skbs[i]; + np->grant_rx_ref[requeue_idx] = np->grant_rx_ref[i]; + np->rx_skbs[i] = NULL; + np->grant_rx_ref[i] = GRANT_INVALID_REF; + } requeue_idx++; } @@ -1392,8 +1431,8 @@ static struct net_device * __devinit cre np->grant_tx_ref[i] = GRANT_INVALID_REF; } - for (i = 0; i <= NET_RX_RING_SIZE; i++) { - np->rx_skbs[i] = (void *)((unsigned long) i+1); + for (i = 0; i < NET_RX_RING_SIZE; i++) { + np->rx_skbs[i] = NULL; np->grant_rx_ref[i] = GRANT_INVALID_REF; } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
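To make the index arithmetic in patch 1/9 concrete, here is a small self-contained C model (not the driver itself) of the invariant it relies on: because every packet occupies exactly one rx slot and the backend answers each request in the slot it arrived in, the id can be recomputed from the response's ring position with the same mask used by the new xennet_rxidx() helper. NET_RX_RING_SIZE and xennet_rxidx() come from the patch; the main() harness is purely illustrative.

/* Standalone model of the ring-index math; not kernel code. */
#include <stdio.h>

#define NET_RX_RING_SIZE 256   /* must be a power of two */

/* Same mask as the patch: slot index for a given ring position. */
static int xennet_rxidx(unsigned int ring_idx)
{
	return ring_idx & (NET_RX_RING_SIZE - 1);
}

int main(void)
{
	unsigned int pos;

	/* A request produced at ring position N is answered in the same
	 * slot, so the frontend can reconstruct the id from N alone and
	 * the rx->id field becomes redundant. */
	for (pos = 254; pos < 259; pos++)
		printf("ring position %u -> skb/grant slot %d\n",
		       pos, xennet_rxidx(pos));
	return 0;
}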
Herbert Xu
2006-Jul-28 08:43 UTC
[Xen-devel] [2/9] [NET] front: Move rx req pushing to one spot
Hi: [NET] front: Move rx req pushing to one spot This patch moves all rx request pushing to network_alloc_rx_buffers. This is needed to reduce churn for TSO. More importantly, this makes it easier to send notifications when adding rx requests which is required for having a queue in dom0. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 65ff65fb76ec -r 7bf12da8a6ed linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 17:51:16 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 17:51:32 2006 +1000 @@ -600,14 +600,17 @@ static void network_alloc_rx_buffers(str /* Could not allocate any skbuffs. Try again later. */ mod_timer(&np->rx_refill_timer, jiffies + (HZ/10)); - return; + break; } __skb_queue_tail(&np->rx_batch, skb); } /* Is the batch large enough to be worthwhile? */ - if (i < (np->rx_target/2)) + if (i < (np->rx_target/2)) { + if (req_prod > np->rx.sring->req_prod) + goto push; return; + } /* Adjust our fill target if we risked running out of buffers. */ if (((req_prod - np->rx.sring->rsp_prod) < (np->rx_target / 4)) && @@ -678,6 +681,7 @@ static void network_alloc_rx_buffers(str /* Above is a suitable barrier to ensure backend will see requests. */ np->rx.req_prod_pvt = req_prod + i; +push: RING_PUSH_REQUESTS(&np->rx); } @@ -875,7 +879,6 @@ static void xennet_move_rx_slot(struct n RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new; RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref; np->rx.req_prod_pvt++; - RING_PUSH_REQUESTS(&np->rx); } static int netif_poll(struct net_device *dev, int *pbudget) @@ -1210,7 +1213,6 @@ static void network_connect(struct net_d } np->rx.req_prod_pvt = requeue_idx; - RING_PUSH_REQUESTS(&np->rx); /* * Step 3: All public and private state should now be sane. Get _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
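The pattern patch 2/9 consolidates can be summarised as "stage requests against a private producer, publish them once". The following standalone C sketch models that flow; the field names loosely follow the Xen ring macros (req_prod_pvt, the shared-ring req_prod), but the struct and functions are invented for illustration and are not the shared-ring implementation.

/* Userspace model of the "stage privately, push once" pattern. */
#include <stdio.h>

struct ring_front {
	unsigned int req_prod_pvt;   /* requests written, not yet visible */
	unsigned int sring_req_prod; /* producer index the backend sees   */
};

static void stage_request(struct ring_front *r, int id)
{
	/* In the driver this fills RING_GET_REQUEST(...); here we only
	 * advance the private producer. */
	printf("staged rx request id=%d at pvt=%u\n", id, r->req_prod_pvt);
	r->req_prod_pvt++;
}

static void push_requests(struct ring_front *r)
{
	/* Single publication point: after this, a backend notification
	 * (feature-rx-notify, patch 4/9) can be hooked in at one place. */
	if (r->req_prod_pvt != r->sring_req_prod) {
		r->sring_req_prod = r->req_prod_pvt;
		printf("pushed: backend now sees req_prod=%u\n",
		       r->sring_req_prod);
	}
}

int main(void)
{
	struct ring_front r = { 0, 0 };
	int id;

	for (id = 0; id < 3; id++)
		stage_request(&r, id);
	push_requests(&r);   /* one push for the whole batch */
	return 0;
}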
Herbert Xu
2006-Jul-28 08:44 UTC
[Xen-devel] [3/9] [NET] back: Replace netif->status with netif_carrier_ok
Hi: [NET] back: Replace netif->status with netif_carrier_ok The connection status to the frontend can be represented using netif_carrier_ok instead of netif->status. As a result, we delay the construction of the dev qdisc until the carrier comes on. This is a prerequisite for adding a tx queue. By the same token, netif->active is now simply the conjunction of netif_running and netif_carrier_ok so it too can be removed. Because netif_carrier_off/netif_carrier_on and rtnl_lock all entail memory barriers, there is no need to have extra memory barriers around them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 7bf12da8a6ed -r c42c1f4bd148 linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 17:51:32 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 18:11:09 2006 +1000 @@ -86,8 +86,6 @@ typedef struct netif_st { struct timer_list credit_timeout; /* Miscellaneous private stuff. */ - enum { DISCONNECTED, DISCONNECTING, CONNECTED } status; - int active; struct list_head list; /* scheduling list */ atomic_t refcnt; struct net_device *dev; diff -r 7bf12da8a6ed -r c42c1f4bd148 linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 17:51:32 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:11:09 2006 +1000 @@ -36,28 +36,20 @@ static void __netif_up(netif_t *netif) { - struct net_device *dev = netif->dev; - netif_tx_lock_bh(dev); - netif->active = 1; - netif_tx_unlock_bh(dev); enable_irq(netif->irq); netif_schedule_work(netif); } static void __netif_down(netif_t *netif) { - struct net_device *dev = netif->dev; disable_irq(netif->irq); - netif_tx_lock_bh(dev); - netif->active = 0; - netif_tx_unlock_bh(dev); netif_deschedule_work(netif); } static int net_open(struct net_device *dev) { netif_t *netif = netdev_priv(dev); - if (netif->status == CONNECTED) + if (netif_carrier_ok(dev)) __netif_up(netif); netif_start_queue(dev); return 0; @@ -67,7 +59,7 @@ static int net_close(struct net_device * { netif_t *netif = netdev_priv(dev); netif_stop_queue(dev); - if (netif->status == CONNECTED) + if (netif_carrier_ok(dev)) __netif_down(netif); return 0; } @@ -93,11 +85,12 @@ netif_t *netif_alloc(domid_t domid, unsi return ERR_PTR(-ENOMEM); } + netif_carrier_off(dev); + netif = netdev_priv(dev); memset(netif, 0, sizeof(*netif)); netif->domid = domid; netif->handle = handle; - netif->status = DISCONNECTED; atomic_set(&netif->refcnt, 1); init_waitqueue_head(&netif->waiting_to_free); netif->dev = dev; @@ -256,11 +249,9 @@ int netif_map(netif_t *netif, unsigned l netif->rx_req_cons_peek = 0; netif_get(netif); - wmb(); /* Other CPUs see new state before interface is started. 
*/ rtnl_lock(); - netif->status = CONNECTED; - wmb(); + netif_carrier_on(netif->dev); if (netif_running(netif->dev)) __netif_up(netif); rtnl_unlock(); @@ -296,20 +287,13 @@ static void netif_free(netif_t *netif) void netif_disconnect(netif_t *netif) { - switch (netif->status) { - case CONNECTED: + if (netif_carrier_ok(netif->dev)) { rtnl_lock(); - netif->status = DISCONNECTING; - wmb(); + netif_carrier_off(netif->dev); if (netif_running(netif->dev)) __netif_down(netif); rtnl_unlock(); netif_put(netif); - /* fall through */ - case DISCONNECTED: - netif_free(netif); - break; - default: - BUG(); - } -} + } + netif_free(netif); +} diff -r 7bf12da8a6ed -r c42c1f4bd148 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 17:51:32 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:11:09 2006 +1000 @@ -143,7 +143,7 @@ int netif_be_start_xmit(struct sk_buff * BUG_ON(skb->dev != dev); /* Drop the packet if the target domain has no receive buffers. */ - if (!netif->active || + if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)) || (netif->rx_req_cons_peek == netif->rx.sring->req_prod) || ((netif->rx_req_cons_peek - netif->rx.rsp_prod_pvt) = NET_RX_RING_SIZE)) @@ -404,7 +404,9 @@ static void add_to_net_schedule_list_tai return; spin_lock_irq(&net_schedule_list_lock); - if (!__on_net_schedule_list(netif) && netif->active) { + if (!__on_net_schedule_list(netif) && + likely(netif_running(netif->dev) && + netif_carrier_ok(netif->dev))) { list_add_tail(&netif->list, &net_schedule_list); netif_get(netif); } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
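The state simplification in patch 3/9 boils down to this: the three-valued netif->status plus the separate active flag collapse into the single carrier bit, with "active" becoming netif_running() && netif_carrier_ok(). The toy model below (invented types, not kernel code) just walks through that equivalence.

/* Toy model of the status/active -> carrier collapse. Illustrative only. */
#include <stdbool.h>
#include <stdio.h>

struct model_netif {
	bool running;  /* netif_running(): interface administratively up */
	bool carrier;  /* netif_carrier_ok(): frontend connected         */
};

static bool model_active(const struct model_netif *n)
{
	return n->running && n->carrier;
}

int main(void)
{
	struct model_netif n = { .running = false, .carrier = false };

	n.running = true;              /* net_open()               */
	printf("active=%d\n", model_active(&n));
	n.carrier = true;              /* netif_map(): carrier on  */
	printf("active=%d\n", model_active(&n));
	n.carrier = false;             /* netif_disconnect()       */
	printf("active=%d\n", model_active(&n));
	return 0;
}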
Hi: [NET] front: Added feature-rx-notify This patch adds support to the frontend for notifying the backend whenever the rx ring is refilled. This is required in order for the backend to get a tx queue. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r c42c1f4bd148 -r b41da4470aa6 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:11:09 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:11:21 2006 +1000 @@ -326,6 +326,12 @@ again: goto abort_transaction; } + err = xenbus_printf(xbt, dev->nodename, "feature-rx-notify", "%d", 1); + if (err) { + message = "writing feature-rx-notify"; + goto abort_transaction; + } + err = xenbus_transaction_end(xbt, 0); if (err) { if (err == -EAGAIN) @@ -683,6 +689,7 @@ static void network_alloc_rx_buffers(str np->rx.req_prod_pvt = req_prod + i; push: RING_PUSH_REQUESTS(&np->rx); + notify_remote_via_irq(np->irq); } static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, @@ -1221,7 +1228,6 @@ static void network_connect(struct net_d * packets. */ netif_carrier_on(dev); - notify_remote_via_irq(np->irq); network_tx_buf_gc(dev); network_alloc_rx_buffers(dev); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
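Patch 4/9 follows the usual xenbus feature-negotiation pattern: the frontend advertises a key, and the backend reads it with a default of 0 so older frontends keep working. The sketch below models that handshake with an invented in-memory table and helper (xs_entry, xs_read_int); the real code uses xenbus_printf() and xenbus_scanf() as shown in the diffs.

/* Model of the feature-rx-notify negotiation; not the xenbus API. */
#include <stdio.h>
#include <string.h>

struct xs_entry { const char *key; int val; };

/* What a new frontend would have written into its xenstore directory. */
static const struct xs_entry frontend_dir[] = {
	{ "feature-rx-notify", 1 },
};

static int xs_read_int(const struct xs_entry *dir, int n,
		       const char *key, int dflt)
{
	int i;
	for (i = 0; i < n; i++)
		if (!strcmp(dir[i].key, key))
			return dir[i].val;
	return dflt;   /* missing key => feature not supported */
}

int main(void)
{
	int notify = xs_read_int(frontend_dir, 1, "feature-rx-notify", 0);

	/* Backend policy from patch 5/9: only queue packets if the
	 * frontend will poke us when it refills the rx ring. */
	printf("frontend notifies on refill: %d -> backend %s\n",
	       notify, notify ? "enables its tx queue"
			      : "keeps tx_queue_len minimal");
	return 0;
}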
Hi: [NET] back: Added tx queue This patch adds a tx queue to the backend if the frontend supports rx refill notification. A queue is needed because SG/TSO greatly reduces the number of packets that can be stored in the rx ring. Given an rx ring with 256 entries, a maximum TSO packet can occupy as many as 18 entries, meaning that the entire ring can only hold 14 packets. This is too small at high bandwidths with large TCP RX windows. Having a tx queue does not present a new security risk as the queue is a fixed size buffer just like the rx ring. So each guest can only hold a fixed amount of memory (proportional to the tx queue length) on the host. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r b41da4470aa6 -r 29673e3aab05 linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 18:11:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 18:18:51 2006 +1000 @@ -76,6 +76,10 @@ typedef struct netif_st { struct vm_struct *tx_comms_area; struct vm_struct *rx_comms_area; + /* Set of features that can be turned on in dev->features. */ + int features; + int can_queue; + /* Allow netif_be_start_xmit() to peek ahead in the rx request ring. */ RING_IDX rx_req_cons_peek; @@ -119,4 +123,10 @@ struct net_device_stats *netif_be_get_st struct net_device_stats *netif_be_get_stats(struct net_device *dev); irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs); +static inline int netbk_can_queue(struct net_device *dev) +{ + netif_t *netif = netdev_priv(dev); + return netif->can_queue; +} + #endif /* __NETIF__BACKEND__COMMON_H__ */ diff -r b41da4470aa6 -r 29673e3aab05 linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:11:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:18:51 2006 +1000 @@ -51,14 +51,12 @@ static int net_open(struct net_device *d netif_t *netif = netdev_priv(dev); if (netif_carrier_ok(dev)) __netif_up(netif); - netif_start_queue(dev); return 0; } static int net_close(struct net_device *dev) { netif_t *netif = netdev_priv(dev); - netif_stop_queue(dev); if (netif_carrier_ok(dev)) __netif_down(netif); return 0; @@ -107,8 +105,11 @@ netif_t *netif_alloc(domid_t domid, unsi SET_ETHTOOL_OPS(dev, &network_ethtool_ops); - /* Disable queuing. */ - dev->tx_queue_len = 0; + /* + * Reduce default TX queuelen so that each guest interface only + * allows it to eat around 6.4MB of host memory. 
+ */ + dev->tx_queue_len = 100; for (i = 0; i < ETH_ALEN; i++) if (be_mac[i] != 0) diff -r b41da4470aa6 -r 29673e3aab05 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:11:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:18:51 2006 +1000 @@ -136,6 +136,14 @@ static inline int is_xen_skb(struct sk_b return (cp == skbuff_cachep); } +static inline int netbk_queue_full(netif_t *netif) +{ + RING_IDX peek = netif->rx_req_cons_peek; + + return netif->rx.sring->req_prod - peek <= 0 || + netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <= 0; +} + int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) { netif_t *netif = netdev_priv(dev); @@ -143,11 +151,13 @@ int netif_be_start_xmit(struct sk_buff * BUG_ON(skb->dev != dev); /* Drop the packet if the target domain has no receive buffers. */ - if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)) || - (netif->rx_req_cons_peek == netif->rx.sring->req_prod) || - ((netif->rx_req_cons_peek - netif->rx.rsp_prod_pvt) =- NET_RX_RING_SIZE)) + if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev))) goto drop; + + if (unlikely(netbk_queue_full(netif))) { + BUG_ON(netbk_can_queue(dev)); + goto drop; + } /* * We do not copy the packet unless: @@ -178,6 +188,9 @@ int netif_be_start_xmit(struct sk_buff * netif->rx_req_cons_peek++; netif_get(netif); + if (netbk_can_queue(dev) && netbk_queue_full(netif)) + netif_stop_queue(dev); + skb_queue_tail(&rx_queue, skb); tasklet_schedule(&net_rx_tasklet); @@ -351,6 +364,10 @@ static void net_rx_action(unsigned long notify_list[notify_nr++] = irq; } + if (netif_queue_stopped(netif->dev) && + !netbk_queue_full(netif)) + netif_wake_queue(netif->dev); + netif_put(netif); dev_kfree_skb(skb); gop++; @@ -974,8 +991,13 @@ irqreturn_t netif_be_int(int irq, void * irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs) { netif_t *netif = dev_id; + add_to_net_schedule_list_tail(netif); maybe_schedule_tx_action(); + + if (netif_queue_stopped(netif->dev) && !netbk_queue_full(netif)) + netif_wake_queue(netif->dev); + return IRQ_HANDLED; } diff -r b41da4470aa6 -r 29673e3aab05 linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:11:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:18:51 2006 +1000 @@ -353,6 +353,7 @@ static int connect_rings(struct backend_ unsigned long tx_ring_ref, rx_ring_ref; unsigned int evtchn; int err; + int val; DPRINTK(""); @@ -366,6 +367,15 @@ static int connect_rings(struct backend_ dev->otherend); return err; } + + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-rx-notify", "%d", + &val) < 0) + val = 0; + if (val) + be->netif->can_queue = 1; + else + /* Must be non-zero for pfifo_fast to work. */ + be->netif->dev->tx_queue_len = 1; /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
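The sizing argument in patch 5/9 is easy to check numerically: with a 256-entry rx ring and a worst-case TSO packet taking up to 18 slots, only 14 such packets fit, hence the need for a tx queue and a queue-full test. The standalone sketch below reproduces that arithmetic and a simplified queue-full check parameterised by the per-packet slot count; the test in this patch only checks for a single free slot, and patches 7 and 9 widen the margin.

/* Back-of-the-envelope model of the rx-ring capacity argument. */
#include <stdio.h>

#define NET_RX_RING_SIZE 256
#define MAX_TSO_SLOTS    18   /* worst case quoted in the patch mail */

/* Simplified queue-full test: stop if fewer slots remain than one
 * packet might need (the real test also tracks rsp_prod_pvt). */
static int queue_full(unsigned int req_prod, unsigned int req_cons_peek,
		      unsigned int slots_per_pkt)
{
	return req_prod - req_cons_peek < slots_per_pkt;
}

int main(void)
{
	printf("max TSO packets resident in the rx ring: %d\n",
	       NET_RX_RING_SIZE / MAX_TSO_SLOTS);   /* 256/18 = 14 */

	/* With 20 free slots an 18-slot packet still fits; with 10 free
	 * slots the device queue is stopped instead of dropping. */
	printf("20 free: full=%d, 10 free: full=%d\n",
	       queue_full(276, 256, MAX_TSO_SLOTS),
	       queue_full(266, 256, MAX_TSO_SLOTS));
	return 0;
}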
Hi: [NET] front: Add SG support This patch adds scatter-and-gather support to the frontend. It also advertises this fact through xenbus so that the backend can detect this and send through SG requests only if it is supported. SG support is required to support skb''s larger than one page. This in turn is needed for either jumbo MTU or TSO. One of these is required to bring local networking performance up to a level that is acceptable. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 29673e3aab05 -r 05d6393e9d18 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:18:51 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:20:45 2006 +1000 @@ -46,11 +46,11 @@ #include <linux/ethtool.h> #include <linux/in.h> #include <linux/if_ether.h> +#include <linux/io.h> #include <net/sock.h> #include <net/pkt_sched.h> #include <net/arp.h> #include <net/route.h> -#include <asm/io.h> #include <asm/uaccess.h> #include <xen/evtchn.h> #include <xen/xenbus.h> @@ -62,17 +62,12 @@ #include <xen/interface/grant_table.h> #include <xen/gnttab.h> +#define RX_COPY_THRESHOLD 256 + #define GRANT_INVALID_REF 0 #define NET_TX_RING_SIZE __RING_SIZE((struct netif_tx_sring *)0, PAGE_SIZE) #define NET_RX_RING_SIZE __RING_SIZE((struct netif_rx_sring *)0, PAGE_SIZE) - -static inline void init_skb_shinfo(struct sk_buff *skb) -{ - atomic_set(&(skb_shinfo(skb)->dataref), 1); - skb_shinfo(skb)->nr_frags = 0; - skb_shinfo(skb)->frag_list = NULL; -} struct netfront_info { struct list_head list; @@ -329,6 +324,12 @@ again: err = xenbus_printf(xbt, dev->nodename, "feature-rx-notify", "%d", 1); if (err) { message = "writing feature-rx-notify"; + goto abort_transaction; + } + + err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", 1); + if (err) { + message = "writing feature-sg"; goto abort_transaction; } @@ -575,10 +576,13 @@ static void network_alloc_rx_buffers(str unsigned short id; struct netfront_info *np = netdev_priv(dev); struct sk_buff *skb; + struct page *page; int i, batch_target; RING_IDX req_prod = np->rx.req_prod_pvt; struct xen_memory_reservation reservation; grant_ref_t ref; + unsigned long pfn; + void *vaddr; if (unlikely(!netif_carrier_ok(dev))) return; @@ -591,15 +595,16 @@ static void network_alloc_rx_buffers(str */ batch_target = np->rx_target - (req_prod - np->rx.rsp_cons); for (i = skb_queue_len(&np->rx_batch); i < batch_target; i++) { - /* - * Subtract dev_alloc_skb headroom (16 bytes) and shared info - * tailroom then round down to SKB_DATA_ALIGN boundary. - */ - skb = __dev_alloc_skb( - ((PAGE_SIZE - sizeof(struct skb_shared_info)) & - (-SKB_DATA_ALIGN(1))) - 16, - GFP_ATOMIC|__GFP_NOWARN); - if (skb == NULL) { + /* Allocate an skb and a page. */ + skb = __dev_alloc_skb(RX_COPY_THRESHOLD, + GFP_ATOMIC | __GFP_NOWARN); + if (unlikely(!skb)) + goto no_skb; + + page = alloc_page(GFP_ATOMIC | __GFP_NOWARN); + if (!page) { + kfree_skb(skb); +no_skb: /* Any skbuffs queued for refill? Force them out. 
*/ if (i != 0) goto refill; @@ -608,6 +613,9 @@ static void network_alloc_rx_buffers(str jiffies + (HZ/10)); break; } + + skb_shinfo(skb)->frags[0].page = page; + skb_shinfo(skb)->nr_frags = 1; __skb_queue_tail(&np->rx_batch, skb); } @@ -639,18 +647,20 @@ static void network_alloc_rx_buffers(str ref = gnttab_claim_grant_reference(&np->gref_rx_head); BUG_ON((signed short)ref < 0); np->grant_rx_ref[id] = ref; + + pfn = page_to_pfn(skb_shinfo(skb)->frags[0].page); + vaddr = page_address(skb_shinfo(skb)->frags[0].page); + gnttab_grant_foreign_transfer_ref(ref, - np->xbdev->otherend_id, - __pa(skb->head)>>PAGE_SHIFT); + np->xbdev->otherend_id, pfn); RING_GET_REQUEST(&np->rx, req_prod + i)->gref = ref; - np->rx_pfn_array[i] = virt_to_mfn(skb->head); + np->rx_pfn_array[i] = pfn_to_mfn(pfn); if (!xen_feature(XENFEAT_auto_translated_physmap)) { /* Remove this page before passing back to Xen. */ - set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT, - INVALID_P2M_ENTRY); + set_phys_to_machine(pfn, INVALID_P2M_ENTRY); MULTI_update_va_mapping(np->rx_mcl+i, - (unsigned long)skb->head, + (unsigned long)vaddr, __pte(0), 0); } } @@ -888,41 +898,29 @@ static void xennet_move_rx_slot(struct n np->rx.req_prod_pvt++; } -static int netif_poll(struct net_device *dev, int *pbudget) -{ - struct netfront_info *np = netdev_priv(dev); - struct sk_buff *skb, *nskb; - struct netif_rx_response *rx; - RING_IDX i, rp; - struct mmu_update *mmu = np->rx_mmu; - struct multicall_entry *mcl = np->rx_mcl; - int work_done, budget, more_to_do = 1; - struct sk_buff_head rxq; - unsigned long flags; - unsigned long mfn; - grant_ref_t ref; - - spin_lock(&np->rx_lock); - - if (unlikely(!netif_carrier_ok(dev))) { - spin_unlock(&np->rx_lock); - return 0; - } - - skb_queue_head_init(&rxq); - - if ((budget = *pbudget) > dev->quota) - budget = dev->quota; - rp = np->rx.sring->rsp_prod; - rmb(); /* Ensure we see queued responses up to ''rp''. */ - - for (i = np->rx.rsp_cons, work_done = 0; - (i != rp) && (work_done < budget); - i++, work_done++) { - rx = RING_GET_RESPONSE(&np->rx, i); - - skb = xennet_get_rx_skb(np, i); - ref = xennet_get_rx_ref(np, i); +static int xennet_get_responses(struct netfront_info *np, + struct netif_rx_response *rx, RING_IDX rp, + struct sk_buff_head *list, int count) +{ + struct mmu_update *mmu = np->rx_mmu + count; + struct multicall_entry *mcl = np->rx_mcl + count; + RING_IDX cons = np->rx.rsp_cons; + struct sk_buff *skb = xennet_get_rx_skb(np, cons); + grant_ref_t ref = xennet_get_rx_ref(np, cons); + int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD); + int frags = 1; + int err = 0; + + for (;;) { + unsigned long mfn; + + if (unlikely(rx->status < 0 || + rx->offset + rx->status > PAGE_SIZE)) { + if (net_ratelimit()) + WPRINTK("rx->offset: %x, size: %u\n", + rx->offset, rx->status); + err = -EINVAL; + } /* * This definitely indicates a bug, either in this driver or in @@ -931,8 +929,8 @@ static int netif_poll(struct net_device */ if (ref == GRANT_INVALID_REF) { WPRINTK("Bad rx response id %d.\n", rx->id); - work_done--; - continue; + err = -EINVAL; + goto next; } /* Memory pressure, insufficient buffer headroom, ... */ @@ -941,16 +939,161 @@ static int netif_poll(struct net_device WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n", rx->id, rx->status); xennet_move_rx_slot(np, skb, ref); + err = -ENOMEM; + goto next; + } + + gnttab_release_grant_reference(&np->gref_rx_head, ref); + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + /* Remap the page. 
*/ + struct page *page = skb_shinfo(skb)->frags[0].page; + unsigned long pfn = page_to_pfn(page); + void *vaddr = page_address(page); + + MULTI_update_va_mapping(mcl, (unsigned long)vaddr, + pfn_pte_ma(mfn, PAGE_KERNEL), + 0); + mcl++; + mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT) + | MMU_MACHPHYS_UPDATE; + mmu->val = pfn; + mmu++; + + set_phys_to_machine(pfn, mfn); + } + + __skb_queue_tail(list, skb); + +next: + if (!(rx->flags & NETRXF_more_data)) + break; + + if (cons + frags == rp) { + if (net_ratelimit()) + WPRINTK("Need more frags\n"); + err = -ENOENT; + break; + } + + rx = RING_GET_RESPONSE(&np->rx, cons + frags); + skb = xennet_get_rx_skb(np, cons + frags); + ref = xennet_get_rx_ref(np, cons + frags); + frags++; + } + + if (unlikely(frags > max)) { + if (net_ratelimit()) + WPRINTK("Too many frags\n"); + err = -E2BIG; + } + + return err; +} + +static RING_IDX xennet_fill_frags(struct netfront_info *np, + struct sk_buff *skb, + struct sk_buff_head *list) +{ + struct skb_shared_info *shinfo = skb_shinfo(skb); + int nr_frags = shinfo->nr_frags; + RING_IDX cons = np->rx.rsp_cons; + skb_frag_t *frag = shinfo->frags + nr_frags; + struct sk_buff *nskb; + + while ((nskb = __skb_dequeue(list))) { + struct netif_rx_response *rx + RING_GET_RESPONSE(&np->rx, ++cons); + + frag->page = skb_shinfo(nskb)->frags[0].page; + frag->page_offset = rx->offset; + frag->size = rx->status; + + skb->data_len += rx->status; + + skb_shinfo(nskb)->nr_frags = 0; + kfree_skb(nskb); + + frag++; + nr_frags++; + } + + shinfo->nr_frags = nr_frags; + return cons; +} + +static int netif_poll(struct net_device *dev, int *pbudget) +{ + struct netfront_info *np = netdev_priv(dev); + struct sk_buff *skb; + struct netif_rx_response *rx; + RING_IDX i, rp; + struct multicall_entry *mcl; + int work_done, budget, more_to_do = 1; + struct sk_buff_head rxq; + struct sk_buff_head errq; + struct sk_buff_head tmpq; + unsigned long flags; + unsigned int len; + int pages_done; + int err; + + spin_lock(&np->rx_lock); + + if (unlikely(!netif_carrier_ok(dev))) { + spin_unlock(&np->rx_lock); + return 0; + } + + skb_queue_head_init(&rxq); + skb_queue_head_init(&errq); + skb_queue_head_init(&tmpq); + + if ((budget = *pbudget) > dev->quota) + budget = dev->quota; + rp = np->rx.sring->rsp_prod; + rmb(); /* Ensure we see queued responses up to ''rp''. */ + + for (i = np->rx.rsp_cons, work_done = 0, pages_done = 0; + (i != rp) && (work_done < budget); + np->rx.rsp_cons = ++i, work_done++) { + rx = RING_GET_RESPONSE(&np->rx, i); + + err = xennet_get_responses(np, rx, rp, &tmpq, pages_done); + pages_done += skb_queue_len(&tmpq); + + if (unlikely(err)) { + i = np->rx.rsp_cons + skb_queue_len(&tmpq) - 1; work_done--; + while ((skb = __skb_dequeue(&tmpq))) + __skb_queue_tail(&errq, skb); + np->stats.rx_errors++; continue; } - gnttab_release_grant_reference(&np->gref_rx_head, ref); - - /* NB. We handle skb overflow later. 
*/ - skb->data = skb->head + rx->offset; - skb->len = rx->status; - skb->tail = skb->data + skb->len; + skb = __skb_dequeue(&tmpq); + + skb->nh.raw = (void *)skb_shinfo(skb)->frags[0].page; + skb->h.raw = skb->nh.raw + rx->offset; + + len = rx->status; + if (len > RX_COPY_THRESHOLD) + len = RX_COPY_THRESHOLD; + skb_put(skb, len); + + if (rx->status > len) { + skb_shinfo(skb)->frags[0].page_offset + rx->offset + len; + skb_shinfo(skb)->frags[0].size = rx->status - len; + skb->data_len = rx->status - len; + } else { + skb_shinfo(skb)->frags[0].page = NULL; + skb_shinfo(skb)->nr_frags = 0; + } + + i = xennet_fill_frags(np, skb, &tmpq); + skb->truesize += skb->data_len; + skb->len += skb->data_len; /* * Old backends do not assert data_validated but we @@ -966,96 +1109,38 @@ static int netif_poll(struct net_device skb->proto_csum_blank = !!(rx->flags & NETRXF_csum_blank); np->stats.rx_packets++; - np->stats.rx_bytes += rx->status; - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - /* Remap the page. */ - MULTI_update_va_mapping(mcl, (unsigned long)skb->head, - pfn_pte_ma(mfn, PAGE_KERNEL), - 0); - mcl++; - mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT) - | MMU_MACHPHYS_UPDATE; - mmu->val = __pa(skb->head) >> PAGE_SHIFT; - mmu++; - - set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT, - mfn); - } + np->stats.rx_bytes += skb->len; __skb_queue_tail(&rxq, skb); } /* Some pages are no longer absent... */ - balloon_update_driver_allowance(-work_done); + balloon_update_driver_allowance(-pages_done); /* Do all the remapping work, and M2P updates, in one big hypercall. */ - if (likely((mcl - np->rx_mcl) != 0)) { + if (likely(pages_done)) { + mcl = np->rx_mcl + pages_done; mcl->op = __HYPERVISOR_mmu_update; mcl->args[0] = (unsigned long)np->rx_mmu; - mcl->args[1] = mmu - np->rx_mmu; + mcl->args[1] = pages_done; mcl->args[2] = 0; mcl->args[3] = DOMID_SELF; - mcl++; - (void)HYPERVISOR_multicall(np->rx_mcl, mcl - np->rx_mcl); - } + (void)HYPERVISOR_multicall(np->rx_mcl, pages_done + 1); + } + + while ((skb = __skb_dequeue(&errq))) + kfree_skb(skb); while ((skb = __skb_dequeue(&rxq)) != NULL) { - if (skb->len > (dev->mtu + ETH_HLEN + 4)) { - if (net_ratelimit()) - printk(KERN_INFO "Received packet too big for " - "MTU (%d > %d)\n", - skb->len - ETH_HLEN - 4, dev->mtu); - skb->len = 0; - skb->tail = skb->data; - init_skb_shinfo(skb); - dev_kfree_skb(skb); - continue; - } - - /* - * Enough room in skbuff for the data we were passed? Also, - * Linux expects at least 16 bytes headroom in each rx buffer. - */ - if (unlikely(skb->tail > skb->end) || - unlikely((skb->data - skb->head) < 16)) { - if (net_ratelimit()) { - if (skb->tail > skb->end) - printk(KERN_INFO "Received packet " - "is %zd bytes beyond tail.\n", - skb->tail - skb->end); - else - printk(KERN_INFO "Received packet " - "is %zd bytes before head.\n", - 16 - (skb->data - skb->head)); - } - - nskb = __dev_alloc_skb(skb->len + 2, - GFP_ATOMIC|__GFP_NOWARN); - if (nskb != NULL) { - skb_reserve(nskb, 2); - skb_put(nskb, skb->len); - memcpy(nskb->data, skb->data, skb->len); - /* Copy any other fields we already set up. */ - nskb->dev = skb->dev; - nskb->ip_summed = skb->ip_summed; - nskb->proto_data_valid = skb->proto_data_valid; - nskb->proto_csum_blank = skb->proto_csum_blank; - } - - /* Reinitialise and then destroy the old skbuff. */ - skb->len = 0; - skb->tail = skb->data; - init_skb_shinfo(skb); - dev_kfree_skb(skb); - - /* Switch old for new, if we copied the buffer. 
*/ - if ((skb = nskb) == NULL) - continue; - } - - /* Set the shinfo area, which is hidden behind the data. */ - init_skb_shinfo(skb); + struct page *page = (struct page *)skb->nh.raw; + void *vaddr = page_address(page); + + memcpy(skb->data, vaddr + (skb->h.raw - skb->nh.raw), + skb_headlen(skb)); + + if (page != skb_shinfo(skb)->frags[0].page) + __free_page(page); + /* Ethernet work: Delayed to here as it peeks the header. */ skb->protocol = eth_type_trans(skb, dev); @@ -1063,8 +1148,6 @@ static int netif_poll(struct net_device netif_receive_skb(skb); dev->last_rx = jiffies; } - - np->rx.rsp_cons = i; /* If we get a callback with very few responses, reduce fill target. */ /* NB. Note exponential increase, linear decrease. */ @@ -1205,7 +1288,7 @@ static void network_connect(struct net_d gnttab_grant_foreign_transfer_ref( np->grant_rx_ref[i], np->xbdev->otherend_id, - __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); + page_to_pfn(skb_shinfo(np->rx_skbs[i])->frags->page)); RING_GET_REQUEST(&np->rx, requeue_idx)->gref np->grant_rx_ref[i]; RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; diff -r 29673e3aab05 -r 05d6393e9d18 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jul 28 18:18:51 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jul 28 18:20:45 2006 +1000 @@ -123,6 +123,10 @@ typedef struct netif_rx_request netif_rx #define _NETRXF_csum_blank (1) #define NETRXF_csum_blank (1U<<_NETRXF_csum_blank) +/* Packet continues in the next request descriptor. */ +#define _NETRXF_more_data (2) +#define NETRXF_more_data (1U<<_NETRXF_more_data) + struct netif_rx_response { uint16_t id; uint16_t offset; /* Offset in page of start of received packet */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
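Patch 6/9 changes the rx protocol so that one packet may span several response slots chained by NETRXF_more_data: the first slot fills the skb's linear area up to RX_COPY_THRESHOLD and the remainder of every page becomes a fragment. The self-contained walk-through below models that reassembly with simplified stand-in structures; only RX_COPY_THRESHOLD, NETRXF_more_data and the general shape come from the patch.

/* Model of multi-slot rx reassembly; structures are stand-ins for the
 * netif_rx_response ring, not the real layout. */
#include <stdio.h>

#define RX_COPY_THRESHOLD 256
#define NETRXF_more_data  (1U << 2)

struct rx_resp {
	int status;          /* bytes delivered in this slot */
	unsigned int flags;
};

int main(void)
{
	/* A 3000-byte packet delivered in three one-page slots. */
	struct rx_resp ring[] = {
		{ 1500, NETRXF_more_data },
		{ 1200, NETRXF_more_data },
		{  300, 0 },
	};
	int headlen, total = 0, frags = 0, i = 0;

	/* Only the first RX_COPY_THRESHOLD bytes go into the linear area. */
	headlen = ring[0].status < RX_COPY_THRESHOLD ?
		  ring[0].status : RX_COPY_THRESHOLD;

	do {
		total += ring[i].status;
		if (i == 0 && ring[i].status > headlen)
			frags++;            /* leftover of the first page */
		else if (i > 0)
			frags++;            /* each extra slot is a frag  */
	} while (ring[i++].flags & NETRXF_more_data);

	printf("skb: headlen=%d nr_frags=%d len=%d\n", headlen, frags, total);
	return 0;
}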
Herbert Xu
2006-Jul-28 08:45 UTC
[Xen-devel] [7/9] [NET] back: Transmit SG packets if supported
Hi: [NET] back: Transmit SG packets if supported This patch adds scatter-and-gather transmission support to the backend. This allows the MTU to be raised right now and the potential for TSO in future. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 05d6393e9d18 -r 4f008474675a linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 18:20:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 18:22:22 2006 +1000 @@ -129,4 +129,10 @@ static inline int netbk_can_queue(struct return netif->can_queue; } +static inline int netbk_can_sg(struct net_device *dev) +{ + netif_t *netif = netdev_priv(dev); + return netif->features & NETIF_F_SG; +} + #endif /* __NETIF__BACKEND__COMMON_H__ */ diff -r 05d6393e9d18 -r 4f008474675a linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:20:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:22:22 2006 +1000 @@ -62,10 +62,34 @@ static int net_close(struct net_device * return 0; } +static int netbk_change_mtu(struct net_device *dev, int mtu) +{ + int max = netbk_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN; + + if (mtu > max) + return -EINVAL; + dev->mtu = mtu; + return 0; +} + +static int netbk_set_sg(struct net_device *dev, u32 data) +{ + if (data) { + netif_t *netif = netdev_priv(dev); + + if (!(netif->features & NETIF_F_SG)) + return -ENOSYS; + } + + return ethtool_op_set_sg(dev, data); +} + static struct ethtool_ops network_ethtool_ops { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, + .get_sg = ethtool_op_get_sg, + .set_sg = netbk_set_sg, .get_link = ethtool_op_get_link, }; @@ -101,6 +125,7 @@ netif_t *netif_alloc(domid_t domid, unsi dev->get_stats = netif_be_get_stats; dev->open = net_open; dev->stop = net_close; + dev->change_mtu = netbk_change_mtu; dev->features = NETIF_F_IP_CSUM; SET_ETHTOOL_OPS(dev, &network_ethtool_ops); diff -r 05d6393e9d18 -r 4f008474675a linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:20:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:22:22 2006 +1000 @@ -40,6 +40,11 @@ /*#define NETBE_DEBUG_INTERRUPT*/ +struct netbk_rx_meta { + skb_frag_t frag; + int id; +}; + static void netif_idx_release(u16 pending_idx); static void netif_page_release(struct page *page); static void make_tx_response(netif_t *netif, @@ -100,21 +105,27 @@ static unsigned long mfn_list[MAX_MFN_AL static unsigned long mfn_list[MAX_MFN_ALLOC]; static unsigned int alloc_index = 0; -static unsigned long alloc_mfn(void) -{ - unsigned long mfn = 0; +static inline unsigned long alloc_mfn(void) +{ + return mfn_list[--alloc_index]; +} + +static int check_mfn(int nr) +{ struct xen_memory_reservation reservation = { - .nr_extents = MAX_MFN_ALLOC, .extent_order = 0, .domid = DOMID_SELF }; - set_xen_guest_handle(reservation.extent_start, mfn_list); - if ( unlikely(alloc_index == 0) ) - alloc_index = HYPERVISOR_memory_op( - XENMEM_increase_reservation, &reservation); - if ( alloc_index != 0 ) - mfn = mfn_list[--alloc_index]; - return mfn; + + if (likely(alloc_index >= nr)) + return 0; + + 
set_xen_guest_handle(reservation.extent_start, mfn_list + alloc_index); + reservation.nr_extents = MAX_MFN_ALLOC - alloc_index; + alloc_index += HYPERVISOR_memory_op(XENMEM_increase_reservation, + &reservation); + + return alloc_index >= nr ? 0 : -ENOMEM; } static inline void maybe_schedule_tx_action(void) @@ -136,12 +147,87 @@ static inline int is_xen_skb(struct sk_b return (cp == skbuff_cachep); } +static struct sk_buff *netbk_copy_skb(struct sk_buff *skb) +{ + struct skb_shared_info *ninfo; + struct sk_buff *nskb; + unsigned long offset; + int ret; + int len; + int headlen; + + nskb = alloc_skb(SKB_MAX_HEAD(0), GFP_ATOMIC); + if (unlikely(!nskb)) + goto err; + + skb_reserve(nskb, 16); + headlen = nskb->end - nskb->data; + if (headlen > skb_headlen(skb)) + headlen = skb_headlen(skb); + ret = skb_copy_bits(skb, 0, __skb_put(nskb, headlen), headlen); + BUG_ON(ret); + + ninfo = skb_shinfo(nskb); + ninfo->gso_size = skb_shinfo(skb)->gso_size; + ninfo->gso_type = skb_shinfo(skb)->gso_type; + + offset = headlen; + len = skb->len - headlen; + + nskb->len = skb->len; + nskb->data_len = len; + nskb->truesize += len; + + while (len) { + struct page *page; + int copy; + int zero; + + if (unlikely(ninfo->nr_frags >= MAX_SKB_FRAGS)) { + dump_stack(); + goto err_free; + } + + copy = len >= PAGE_SIZE ? PAGE_SIZE : len; + zero = len >= PAGE_SIZE ? 0 : __GFP_ZERO; + + page = alloc_page(GFP_ATOMIC | zero); + if (unlikely(!page)) + goto err_free; + + ret = skb_copy_bits(skb, offset, page_address(page), copy); + BUG_ON(ret); + + ninfo->frags[ninfo->nr_frags].page = page; + ninfo->frags[ninfo->nr_frags].page_offset = 0; + ninfo->frags[ninfo->nr_frags].size = copy; + ninfo->nr_frags++; + + offset += copy; + len -= copy; + } + + offset = nskb->data - skb->data; + + nskb->h.raw = skb->h.raw + offset; + nskb->nh.raw = skb->nh.raw + offset; + nskb->mac.raw = skb->mac.raw + offset; + + return nskb; + +err_free: + kfree_skb(nskb); +err: + return NULL; +} + static inline int netbk_queue_full(netif_t *netif) { RING_IDX peek = netif->rx_req_cons_peek; - return netif->rx.sring->req_prod - peek <= 0 || - netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <= 0; + return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS || + netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <+ MAX_SKB_FRAGS; } int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) @@ -163,20 +249,12 @@ int netif_be_start_xmit(struct sk_buff * * We do not copy the packet unless: * 1. The data is shared; or * 2. The data is not allocated from our special cache. - * NB. We also couldn''t cope with fragmented packets, but we won''t get - * any because we not advertise the NETIF_F_SG feature. + * 3. The data is fragmented. */ - if (skb_shared(skb) || skb_cloned(skb) || !is_xen_skb(skb)) { - int hlen = skb->data - skb->head; - int ret; - struct sk_buff *nskb = dev_alloc_skb(hlen + skb->len); + if (skb_cloned(skb) || skb_is_nonlinear(skb) || !is_xen_skb(skb)) { + struct sk_buff *nskb = netbk_copy_skb(skb); if ( unlikely(nskb == NULL) ) goto drop; - skb_reserve(nskb, hlen); - __skb_put(nskb, skb->len); - ret = skb_copy_bits(skb, -hlen, nskb->data - hlen, - skb->len + hlen); - BUG_ON(ret); /* Copy only the header fields we use in this driver. 
*/ nskb->dev = skb->dev; nskb->ip_summed = skb->ip_summed; @@ -185,7 +263,7 @@ int netif_be_start_xmit(struct sk_buff * skb = nskb; } - netif->rx_req_cons_peek++; + netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1; netif_get(netif); if (netbk_can_queue(dev) && netbk_queue_full(netif)) @@ -221,116 +299,80 @@ int xen_network_done(void) } #endif -static void net_rx_action(unsigned long unused) -{ - netif_t *netif = NULL; - s8 status; - u16 size, id, irq, flags; - multicall_entry_t *mcl; - mmu_update_t *mmu; - gnttab_transfer_t *gop; - unsigned long vdata, old_mfn, new_mfn; - struct sk_buff_head rxq; - struct sk_buff *skb; - int notify_nr = 0; - int ret; +static u16 netbk_gop_frag(netif_t *netif, struct page *page, int count, int i) +{ + multicall_entry_t *mcl = rx_mcl + count; + mmu_update_t *mmu = rx_mmu + count; + gnttab_transfer_t *gop = grant_rx_op + count; + netif_rx_request_t *req; + unsigned long old_mfn, new_mfn; + + old_mfn = virt_to_mfn(page_address(page)); + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + new_mfn = alloc_mfn(); + + /* + * Set the new P2M table entry before reassigning + * the old data page. Heed the comment in + * pgtable-2level.h:pte_page(). :-) + */ + set_phys_to_machine(page_to_pfn(page), new_mfn); + + MULTI_update_va_mapping(mcl, (unsigned long)page_address(page), + pfn_pte_ma(new_mfn, PAGE_KERNEL), 0); + + mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) | + MMU_MACHPHYS_UPDATE; + mmu->val = page_to_pfn(page); + } + + req = RING_GET_REQUEST(&netif->rx, netif->rx.req_cons + i); + gop->mfn = old_mfn; + gop->domid = netif->domid; + gop->ref = req->gref; + return req->id; +} + +static void netbk_gop_skb(struct sk_buff *skb, struct netbk_rx_meta *meta, + int count) +{ + netif_t *netif = netdev_priv(skb->dev); + int nr_frags = skb_shinfo(skb)->nr_frags; + int i; + + for (i = 0; i < nr_frags; i++) { + meta[++count].frag = skb_shinfo(skb)->frags[i]; + meta[count].id = netbk_gop_frag(netif, meta[count].frag.page, + count, i + 1); + } + /* - * Putting hundreds of bytes on the stack is considered rude. - * Static works because a tasklet can only be on one CPU at any time. + * This must occur at the end to ensure that we don''t trash + * skb_shinfo until we''re done. */ - static u16 notify_list[NET_RX_RING_SIZE]; - - skb_queue_head_init(&rxq); - - mcl = rx_mcl; - mmu = rx_mmu; - gop = grant_rx_op; - - while ((skb = skb_dequeue(&rx_queue)) != NULL) { - netif = netdev_priv(skb->dev); - vdata = (unsigned long)skb->data; - old_mfn = virt_to_mfn(vdata); - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - /* Memory squeeze? Back off for an arbitrary while. */ - if ((new_mfn = alloc_mfn()) == 0) { - if ( net_ratelimit() ) - WPRINTK("Memory squeeze in netback " - "driver.\n"); - mod_timer(&net_timer, jiffies + HZ); - skb_queue_head(&rx_queue, skb); - break; - } - /* - * Set the new P2M table entry before reassigning - * the old data page. Heed the comment in - * pgtable-2level.h:pte_page(). :-) - */ - set_phys_to_machine( - __pa(skb->data) >> PAGE_SHIFT, - new_mfn); - - MULTI_update_va_mapping(mcl, vdata, - pfn_pte_ma(new_mfn, - PAGE_KERNEL), 0); - mcl++; - - mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) | - MMU_MACHPHYS_UPDATE; - mmu->val = __pa(vdata) >> PAGE_SHIFT; - mmu++; - } - - gop->mfn = old_mfn; - gop->domid = netif->domid; - gop->ref = RING_GET_REQUEST( - &netif->rx, netif->rx.req_cons)->gref; - netif->rx.req_cons++; - gop++; - - __skb_queue_tail(&rxq, skb); - - /* Filled the batch queue? 
*/ - if ((gop - grant_rx_op) == ARRAY_SIZE(grant_rx_op)) - break; - } - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - if (mcl == rx_mcl) - return; - - mcl[-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL; - - if (mmu - rx_mmu) { - mcl->op = __HYPERVISOR_mmu_update; - mcl->args[0] = (unsigned long)rx_mmu; - mcl->args[1] = mmu - rx_mmu; - mcl->args[2] = 0; - mcl->args[3] = DOMID_SELF; - mcl++; - } - - ret = HYPERVISOR_multicall(rx_mcl, mcl - rx_mcl); - BUG_ON(ret != 0); - } - - ret = HYPERVISOR_grant_table_op(GNTTABOP_transfer, grant_rx_op, - gop - grant_rx_op); - BUG_ON(ret != 0); - - mcl = rx_mcl; - gop = grant_rx_op; - while ((skb = __skb_dequeue(&rxq)) != NULL) { - netif = netdev_priv(skb->dev); - size = skb->tail - skb->data; - - atomic_set(&(skb_shinfo(skb)->dataref), 1); - skb_shinfo(skb)->nr_frags = 0; - skb_shinfo(skb)->frag_list = NULL; - - netif->stats.tx_bytes += size; - netif->stats.tx_packets++; - + meta[count - nr_frags].id = netbk_gop_frag(netif, + virt_to_page(skb->data), + count - nr_frags, 0); + netif->rx.req_cons += nr_frags + 1; +} + +static inline void netbk_free_pages(int nr_frags, struct netbk_rx_meta *meta) +{ + int i; + + for (i = 0; i < nr_frags; i++) + put_page(meta[i].frag.page); +} + +static int netbk_check_gop(int nr_frags, domid_t domid, int count) +{ + multicall_entry_t *mcl = rx_mcl + count; + gnttab_transfer_t *gop = grant_rx_op + count; + int status = NETIF_RSP_OKAY; + int i; + + for (i = 0; i <= nr_frags; i++) { if (!xen_feature(XENFEAT_auto_translated_physmap)) { /* The update_va_mapping() must not fail. */ BUG_ON(mcl->result != 0); @@ -338,10 +380,9 @@ static void net_rx_action(unsigned long } /* Check the reassignment error code. */ - status = NETIF_RSP_OKAY; if (gop->status != 0) { DPRINTK("Bad status %d from grant transfer to DOM%u\n", - gop->status, netif->domid); + gop->status, domid); /* * Page no longer belongs to us unless GNTST_bad_page, * but that should be a fatal error anyway. @@ -349,17 +390,128 @@ static void net_rx_action(unsigned long BUG_ON(gop->status == GNTST_bad_page); status = NETIF_RSP_ERROR; } - irq = netif->irq; - id = RING_GET_REQUEST(&netif->rx, netif->rx.rsp_prod_pvt)->id; - flags = 0; + gop++; + } + + return status; +} + +static void netbk_add_frag_responses(netif_t *netif, int status, + struct netbk_rx_meta *meta, int nr_frags) +{ + int i; + + for (i = 0; i < nr_frags; i++) { + int id = meta[i].id; + int flags = (i == nr_frags - 1) ? 0 : NETRXF_more_data; + + make_rx_response(netif, id, status, meta[i].frag.page_offset, + meta[i].frag.size, flags); + } +} + +static void net_rx_action(unsigned long unused) +{ + netif_t *netif = NULL; + s8 status; + u16 id, irq, flags; + multicall_entry_t *mcl; + struct sk_buff_head rxq; + struct sk_buff *skb; + int notify_nr = 0; + int ret; + int nr_frags; + int count; + + /* + * Putting hundreds of bytes on the stack is considered rude. + * Static works because a tasklet can only be on one CPU at any time. + */ + static u16 notify_list[NET_RX_RING_SIZE]; + static struct netbk_rx_meta meta[NET_RX_RING_SIZE]; + + skb_queue_head_init(&rxq); + + count = 0; + + while ((skb = skb_dequeue(&rx_queue)) != NULL) { + nr_frags = skb_shinfo(skb)->nr_frags; + *(int *)skb->cb = nr_frags; + + if (!xen_feature(XENFEAT_auto_translated_physmap) && + check_mfn(nr_frags + 1)) { + /* Memory squeeze? Back off for an arbitrary while. 
*/ + if ( net_ratelimit() ) + WPRINTK("Memory squeeze in netback " + "driver.\n"); + mod_timer(&net_timer, jiffies + HZ); + skb_queue_head(&rx_queue, skb); + break; + } + + netbk_gop_skb(skb, meta, count); + + count += nr_frags + 1; + + __skb_queue_tail(&rxq, skb); + + /* Filled the batch queue? */ + if (count + MAX_SKB_FRAGS >= NET_RX_RING_SIZE) + break; + } + + if (!count) + return; + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + mcl = rx_mcl + count; + + mcl[-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL; + + mcl->op = __HYPERVISOR_mmu_update; + mcl->args[0] = (unsigned long)rx_mmu; + mcl->args[1] = count; + mcl->args[2] = 0; + mcl->args[3] = DOMID_SELF; + + ret = HYPERVISOR_multicall(rx_mcl, count + 1); + BUG_ON(ret != 0); + } + + ret = HYPERVISOR_grant_table_op(GNTTABOP_transfer, grant_rx_op, count); + BUG_ON(ret != 0); + + count = 0; + while ((skb = __skb_dequeue(&rxq)) != NULL) { + nr_frags = *(int *)skb->cb; + + atomic_set(&(skb_shinfo(skb)->dataref), 1); + skb_shinfo(skb)->nr_frags = 0; + skb_shinfo(skb)->frag_list = NULL; + + netif = netdev_priv(skb->dev); + netif->stats.tx_bytes += skb->len; + netif->stats.tx_packets++; + + netbk_free_pages(nr_frags, meta + count + 1); + status = netbk_check_gop(nr_frags, netif->domid, count); + + id = meta[count].id; + flags = nr_frags ? NETRXF_more_data : 0; + if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ flags |= NETRXF_csum_blank | NETRXF_data_validated; else if (skb->proto_data_valid) /* remote but checksummed? */ flags |= NETRXF_data_validated; - if (make_rx_response(netif, id, status, - (unsigned long)skb->data & ~PAGE_MASK, - size, flags) && - (rx_notify[irq] == 0)) { + + make_rx_response(netif, id, status, offset_in_page(skb->data), + skb_headlen(skb), flags); + netbk_add_frag_responses(netif, status, meta + count + 1, + nr_frags); + + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->rx, ret); + irq = netif->irq; + if (ret && !rx_notify[irq]) { rx_notify[irq] = 1; notify_list[notify_nr++] = irq; } @@ -370,7 +522,7 @@ static void net_rx_action(unsigned long netif_put(netif); dev_kfree_skb(skb); - gop++; + count += nr_frags + 1; } while (notify_nr != 0) { @@ -1040,7 +1192,6 @@ static int make_rx_response(netif_t *net { RING_IDX i = netif->rx.rsp_prod_pvt; netif_rx_response_t *resp; - int notify; resp = RING_GET_RESPONSE(&netif->rx, i); resp->offset = offset; @@ -1051,9 +1202,8 @@ static int make_rx_response(netif_t *net resp->status = (s16)st; netif->rx.rsp_prod_pvt = ++i; - RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->rx, notify); - - return notify; + + return 0; } #ifdef NETBE_DEBUG_INTERRUPT diff -r 05d6393e9d18 -r 4f008474675a linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:20:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:22:22 2006 +1000 @@ -377,6 +377,13 @@ static int connect_rings(struct backend_ /* Must be non-zero for pfifo_fast to work. */ be->netif->dev->tx_queue_len = 1; + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-sg", "%d", &val) < 0) + val = 0; + if (val) { + be->netif->features |= NETIF_F_SG; + be->netif->dev->features |= NETIF_F_SG; + } + /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); if (err) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
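Two details of patch 7/9 are worth illustrating: the MTU ceiling chosen by netbk_change_mtu() (65535 - ETH_HLEN with SG, ETH_DATA_LEN without), and the page-at-a-time carving that netbk_copy_skb() applies to data beyond the copied header. The snippet below is an illustrative userspace model of both; the constants mirror the Linux headers, but the loop is not the driver code.

/* Model of the SG backend's MTU ceiling and page-sized chunking. */
#include <stdio.h>

#define ETH_HLEN     14
#define ETH_DATA_LEN 1500
#define PAGE_SIZE    4096

static int max_mtu(int can_sg)
{
	return can_sg ? 65535 - ETH_HLEN : ETH_DATA_LEN;
}

int main(void)
{
	int len = 60000, headlen = 192, remaining, frags = 0;

	printf("MTU ceiling: SG=%d, no SG=%d\n", max_mtu(1), max_mtu(0));

	/* Everything past the (copied) linear head is carved into
	 * page-sized fragments, as netbk_copy_skb() does. */
	for (remaining = len - headlen; remaining > 0;
	     remaining -= PAGE_SIZE > remaining ? remaining : PAGE_SIZE)
		frags++;

	printf("%d-byte skb, %d-byte head -> %d page fragments\n",
	       len, headlen, frags);
	return 0;
}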
Hi: [NET] front: Add TSO support This patch adds TCP Segmentation Offload (TSO) support to the frontend. It also advertises this fact through xenbus so that the frontend can detect this and send through TSO requests only if it is supported. This is done using an extra request slot which is indicated by a flag in the first slot. In future checksum offload can be done in the same way. Even though only TSO is supported for now the code actually supports GSO so it can be applied to any other protocol. The only missing bit is the detection of host support for a specific GSO protocol. Once that is added we can advertise all supported protocols to the guest. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 4f008474675a -r 3df7a115ddde linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:22:22 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:22:38 2006 +1000 @@ -116,6 +116,11 @@ struct netfront_info { struct mmu_update rx_mmu[NET_RX_RING_SIZE]; }; +struct netfront_rx_info { + struct netif_rx_response rx; + struct netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX - 1]; +}; + /* * Access macros for acquiring freeing slots in tx_skbs[]. */ @@ -330,6 +335,12 @@ again: err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", 1); if (err) { message = "writing feature-sg"; + goto abort_transaction; + } + + err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv4", "%d", 1); + if (err) { + message = "writing feature-gso-tcpv4"; goto abort_transaction; } @@ -898,18 +909,65 @@ static void xennet_move_rx_slot(struct n np->rx.req_prod_pvt++; } +int xennet_get_extras(struct netfront_info *np, + struct netif_extra_info *extras, RING_IDX rp) + +{ + struct netif_extra_info *extra; + RING_IDX cons = np->rx.rsp_cons; + int err = 0; + + do { + struct sk_buff *skb; + grant_ref_t ref; + + if (unlikely(cons + 1 == rp)) { + if (net_ratelimit()) + WPRINTK("Missing extra info\n"); + err = -EBADR; + break; + } + + extra = (struct netif_extra_info *) + RING_GET_RESPONSE(&np->rx, ++cons); + + if (unlikely(!extra->type || + extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) { + if (net_ratelimit()) + WPRINTK("Invalid extra type: %d\n", + extra->type); + err = -EINVAL; + } else + memcpy(&extras[extra->type - 1], extra, sizeof(*extra)); + + skb = xennet_get_rx_skb(np, cons); + ref = xennet_get_rx_ref(np, cons); + xennet_move_rx_slot(np, skb, ref); + } while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE); + + np->rx.rsp_cons = cons; + return err; +} + static int xennet_get_responses(struct netfront_info *np, - struct netif_rx_response *rx, RING_IDX rp, + struct netfront_rx_info *rinfo, RING_IDX rp, struct sk_buff_head *list, int count) { struct mmu_update *mmu = np->rx_mmu + count; struct multicall_entry *mcl = np->rx_mcl + count; + struct netif_rx_response *rx = &rinfo->rx; + struct netif_extra_info *extras = rinfo->extras; RING_IDX cons = np->rx.rsp_cons; struct sk_buff *skb = xennet_get_rx_skb(np, cons); grant_ref_t ref = xennet_get_rx_ref(np, cons); int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD); int frags = 1; int err = 0; + + if (rx->flags & NETRXF_extra_info) { + err = xennet_get_extras(np, extras, rp); + cons = np->rx.rsp_cons; + } for (;;) { unsigned long mfn; @@ -1022,11 
+1080,38 @@ static RING_IDX xennet_fill_frags(struct return cons; } +static int xennet_set_skb_gso(struct sk_buff *skb, struct netif_extra_info *gso) +{ + if (!gso->u.gso.size) { + if (net_ratelimit()) + WPRINTK("GSO size must not be zero.\n"); + return -EINVAL; + } + + /* Currently only TCPv4 S.O. is supported. */ + if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) { + if (net_ratelimit()) + WPRINTK("Bad GSO type %d.\n", gso->u.gso.type); + return -EINVAL; + } + + skb_shinfo(skb)->gso_size = gso->u.gso.size; + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + + /* Header must be checked, and gso_segs computed. */ + skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY; + skb_shinfo(skb)->gso_segs = 0; + + return 0; +} + static int netif_poll(struct net_device *dev, int *pbudget) { struct netfront_info *np = netdev_priv(dev); struct sk_buff *skb; - struct netif_rx_response *rx; + struct netfront_rx_info rinfo; + struct netif_rx_response *rx = &rinfo.rx; + struct netif_extra_info *extras = rinfo.extras; RING_IDX i, rp; struct multicall_entry *mcl; int work_done, budget, more_to_do = 1; @@ -1057,12 +1142,14 @@ static int netif_poll(struct net_device for (i = np->rx.rsp_cons, work_done = 0, pages_done = 0; (i != rp) && (work_done < budget); np->rx.rsp_cons = ++i, work_done++) { - rx = RING_GET_RESPONSE(&np->rx, i); - - err = xennet_get_responses(np, rx, rp, &tmpq, pages_done); + memcpy(rx, RING_GET_RESPONSE(&np->rx, i), sizeof(*rx)); + memset(extras, 0, sizeof(extras)); + + err = xennet_get_responses(np, &rinfo, rp, &tmpq, pages_done); pages_done += skb_queue_len(&tmpq); if (unlikely(err)) { +err: i = np->rx.rsp_cons + skb_queue_len(&tmpq) - 1; work_done--; while ((skb = __skb_dequeue(&tmpq))) @@ -1072,6 +1159,16 @@ static int netif_poll(struct net_device } skb = __skb_dequeue(&tmpq); + + if (extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].type) { + struct netif_extra_info *gso; + gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1]; + + if (unlikely(xennet_set_skb_gso(skb, gso))) { + __skb_queue_head(&tmpq, skb); + goto err; + } + } skb->nh.raw = (void *)skb_shinfo(skb)->frags[0].page; skb->h.raw = skb->nh.raw + rx->offset; diff -r 4f008474675a -r 3df7a115ddde xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jul 28 18:22:22 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jul 28 18:22:38 2006 +1000 @@ -127,6 +127,10 @@ typedef struct netif_rx_request netif_rx #define _NETRXF_more_data (2) #define NETRXF_more_data (1U<<_NETRXF_more_data) +/* Packet to be followed by extra descriptor(s). */ +#define _NETRXF_extra_info (3) +#define NETRXF_extra_info (1U<<_NETRXF_extra_info) + struct netif_rx_response { uint16_t id; uint16_t offset; /* Offset in page of start of received packet */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
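The extra-info convention this patch introduces is: the first response slot for a packet sets NETRXF_extra_info, and the following slot(s) are reinterpreted as netif_extra_info, chained with XEN_NETIF_EXTRA_FLAG_MORE and filed into extras[type - 1]. Here is a simplified, self-contained sketch of that consumption step; the struct and constants are trimmed stand-ins for the real definitions in xen/include/public/io/netif.h, not the actual layouts.

/*
 * Sketch of extra-slot consumption on the frontend side: each extra
 * descriptor is copied into extras[type - 1]; the chain ends when the
 * MORE flag is clear.
 */
#include <stdio.h>
#include <string.h>

#define NETRXF_extra_info         (1U << 3)
#define XEN_NETIF_EXTRA_TYPE_GSO  1
#define XEN_NETIF_EXTRA_TYPE_MAX  2
#define XEN_NETIF_EXTRA_FLAG_MORE (1U << 0)

struct extra_sketch {
	unsigned char type;      /* XEN_NETIF_EXTRA_TYPE_* */
	unsigned char flags;     /* XEN_NETIF_EXTRA_FLAG_MORE if more follow */
	unsigned short gso_size;
};

/* One per-packet array, indexed by type - 1, as in struct netfront_rx_info. */
static struct extra_sketch extras[XEN_NETIF_EXTRA_TYPE_MAX - 1];

static void consume_extras(const struct extra_sketch *slots, int nslots)
{
	int i;

	memset(extras, 0, sizeof(extras));
	for (i = 0; i < nslots; i++) {
		const struct extra_sketch *e = &slots[i];

		if (e->type && e->type < XEN_NETIF_EXTRA_TYPE_MAX)
			extras[e->type - 1] = *e;
		if (!(e->flags & XEN_NETIF_EXTRA_FLAG_MORE))
			break;           /* last extra for this packet */
	}
}

int main(void)
{
	struct extra_sketch ring[1] = {
		{ XEN_NETIF_EXTRA_TYPE_GSO, 0 /* last extra */, 1448 },
	};

	consume_extras(ring, 1);
	printf("gso_size = %d\n", extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].gso_size);
	return 0;
}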
Herbert Xu
2006-Jul-28 08:46 UTC
[Xen-devel] [9/9] [NET] back: Transmit TSO packets if supported
Hi: [NET] back: Transmit TSO packets if supported This patch adds TSO transmission support to the backend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 3df7a115ddde -r 04cbf628ed5a linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:22:38 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:24:10 2006 +1000 @@ -84,12 +84,26 @@ static int netbk_set_sg(struct net_devic return ethtool_op_set_sg(dev, data); } +static int netbk_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + netif_t *netif = netdev_priv(dev); + + if (!(netif->features & NETIF_F_TSO)) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static struct ethtool_ops network_ethtool_ops { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = netbk_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = netbk_set_tso, .get_link = ethtool_op_get_link, }; diff -r 3df7a115ddde -r 04cbf628ed5a linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:22:38 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:24:10 2006 +1000 @@ -50,12 +50,12 @@ static void make_tx_response(netif_t *ne static void make_tx_response(netif_t *netif, netif_tx_request_t *txp, s8 st); -static int make_rx_response(netif_t *netif, - u16 id, - s8 st, - u16 offset, - u16 size, - u16 flags); +static netif_rx_response_t *make_rx_response(netif_t *netif, + u16 id, + s8 st, + u16 offset, + u16 size, + u16 flags); static void net_tx_action(unsigned long unused); static DECLARE_TASKLET(net_tx_tasklet, net_tx_action, 0); @@ -225,9 +225,9 @@ static inline int netbk_queue_full(netif { RING_IDX peek = netif->rx_req_cons_peek; - return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS || + return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS + 1 || netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <- MAX_SKB_FRAGS; + MAX_SKB_FRAGS + 1; } int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) @@ -263,7 +263,8 @@ int netif_be_start_xmit(struct sk_buff * skb = nskb; } - netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1; + netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1 + + !!skb_shinfo(skb)->gso_size; netif_get(netif); if (netbk_can_queue(dev) && netbk_queue_full(netif)) @@ -340,11 +341,16 @@ static void netbk_gop_skb(struct sk_buff netif_t *netif = netdev_priv(skb->dev); int nr_frags = skb_shinfo(skb)->nr_frags; int i; + int extra; + + meta[count].frag.page_offset = skb_shinfo(skb)->gso_type; + meta[count].frag.size = skb_shinfo(skb)->gso_size; + extra = !!meta[count].frag.size + 1; for (i = 0; i < nr_frags; i++) { meta[++count].frag = skb_shinfo(skb)->frags[i]; meta[count].id = netbk_gop_frag(netif, meta[count].frag.page, - count, i + 1); + count, i + extra); } /* @@ -354,7 +360,7 @@ static void netbk_gop_skb(struct sk_buff meta[count - nr_frags].id = netbk_gop_frag(netif, virt_to_page(skb->data), count - nr_frags, 0); - netif->rx.req_cons += nr_frags + 1; + netif->rx.req_cons += nr_frags + extra; } static inline void netbk_free_pages(int nr_frags, struct netbk_rx_meta *meta) @@ -415,6 +421,8 @@ static void 
net_rx_action(unsigned long netif_t *netif = NULL; s8 status; u16 id, irq, flags; + netif_rx_response_t *resp; + struct netif_extra_info *extra; multicall_entry_t *mcl; struct sk_buff_head rxq; struct sk_buff *skb; @@ -504,8 +512,33 @@ static void net_rx_action(unsigned long else if (skb->proto_data_valid) /* remote but checksummed? */ flags |= NETRXF_data_validated; - make_rx_response(netif, id, status, offset_in_page(skb->data), - skb_headlen(skb), flags); + resp = make_rx_response(netif, id, status, + offset_in_page(skb->data), + skb_headlen(skb), flags); + + extra = NULL; + + if (meta[count].frag.size) { + struct netif_extra_info *gso + (struct netif_extra_info *) + RING_GET_RESPONSE(&netif->rx, + netif->rx.rsp_prod_pvt++); + + if (extra) + extra->flags |= XEN_NETIF_EXTRA_FLAG_MORE; + else + resp->flags |= NETRXF_extra_info; + + gso->u.gso.size = meta[count].frag.size; + gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4; + gso->u.gso.pad = 0; + gso->u.gso.features = 0; + + gso->type = XEN_NETIF_EXTRA_TYPE_GSO; + gso->flags = 0; + extra = gso; + } + netbk_add_frag_responses(netif, status, meta + count + 1, nr_frags); @@ -1183,12 +1216,12 @@ static void make_tx_response(netif_t *ne #endif } -static int make_rx_response(netif_t *netif, - u16 id, - s8 st, - u16 offset, - u16 size, - u16 flags) +static netif_rx_response_t *make_rx_response(netif_t *netif, + u16 id, + s8 st, + u16 offset, + u16 size, + u16 flags) { RING_IDX i = netif->rx.rsp_prod_pvt; netif_rx_response_t *resp; @@ -1203,7 +1236,7 @@ static int make_rx_response(netif_t *net netif->rx.rsp_prod_pvt = ++i; - return 0; + return resp; } #ifdef NETBE_DEBUG_INTERRUPT diff -r 3df7a115ddde -r 04cbf628ed5a linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:22:38 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:24:10 2006 +1000 @@ -384,6 +384,14 @@ static int connect_rings(struct backend_ be->netif->dev->features |= NETIF_F_SG; } + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4", "%d", + &val) < 0) + val = 0; + if (val) { + be->netif->features |= NETIF_F_TSO; + be->netif->dev->features |= NETIF_F_TSO; + } + /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); if (err) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
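The subtle part of the backend change is the slot accounting: a TSO skb now consumes one extra request slot for the GSO descriptor, which is why rx_req_cons_peek grows by !!gso_size and the queue-full test allows for MAX_SKB_FRAGS + 1 rather than MAX_SKB_FRAGS. A tiny standalone sketch of that arithmetic follows; MAX_SKB_FRAGS_SKETCH = 18 is only an example value.

/*
 * Slots needed per packet: one for the head, one per fragment, and one
 * more for the GSO extra descriptor when gso_size is non-zero.
 */
#include <stdio.h>

#define MAX_SKB_FRAGS_SKETCH 18

static int slots_needed(int nr_frags, int gso_size)
{
	return 1 + nr_frags + !!gso_size;
}

/* Worst case the queue-full test must allow for: a maximally fragmented
 * TSO packet, i.e. MAX_SKB_FRAGS + 1 slots beyond the head -- hence the
 * change from "<= MAX_SKB_FRAGS" to "<= MAX_SKB_FRAGS + 1". */
static int worst_case(void)
{
	return slots_needed(MAX_SKB_FRAGS_SKETCH, 1448);
}

int main(void)
{
	printf("plain 3-frag skb: %d slots\n", slots_needed(3, 0));
	printf("3-frag TSO skb:   %d slots\n", slots_needed(3, 1448));
	printf("worst case:       %d slots\n", worst_case());
	return 0;
}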
Herbert Xu
2006-Aug-04 14:44 UTC
Re: [Xen-devel] [0/9] [NET]: Add TSO support for dom0=>domU
On Fri, Aug 04, 2006 at 12:06:51PM +0100, Emmanuel Ackaouy wrote:
>
> vif = [ '', 'bridge=xenbr0' ]

Interesting. I believe this is due to a missing netif_rx_schedule
when netfront's interface is brought up. Please let me know if this
patch fixes it or not.

[NET] front: Check for received packets in network_open

Because the backend brings up the interface long before the frontend
has booted up, it is possible that by the time we get here we already
have packets queued up for processing. If we don't process them here,
we may delay them more than necessary. Worse yet, it is possible to
miss the notification interrupt from the backend in such a way that we
never get another one until we bring the interface down and up.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

diff -r 3efef49b7b49 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c	Fri Aug 04 17:03:43 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c	Sat Aug 05 00:21:48 2006 +1000
@@ -497,6 +497,9 @@ static int network_open(struct net_devic
 	network_alloc_rx_buffers(dev);
 	np->rx.sring->rsp_event = np->rx.rsp_cons + 1;
+	if (RING_HAS_UNCONSUMED_RESPONSES(&np->rx))
+		netif_rx_schedule(dev);
+
 	netif_start_queue(dev);
 	return 0;

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
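The fix works because netfront can simply ask the ring whether the backend has already produced responses before the first poll was ever scheduled. The userspace sketch below models that check; RING_HAS_UNCONSUMED_RESPONSES is represented here merely as "producer ahead of consumer", which is a simplification of the real macro, and all names are illustrative.

/*
 * Sketch of the race: the backend may have produced rx responses before
 * open ran. If open only re-arms the event and the notification already
 * fired, those responses sit unprocessed unless we poll right away.
 */
#include <stdio.h>

struct ring_sketch {
	unsigned int rsp_prod;  /* written by the backend */
	unsigned int rsp_cons;  /* advanced by the frontend poll */
};

static int has_unconsumed_responses(const struct ring_sketch *r)
{
	return r->rsp_prod != r->rsp_cons;
}

static void network_open_sketch(struct ring_sketch *r)
{
	/* ... allocate rx buffers and set rsp_event, as in the patch ... */
	if (has_unconsumed_responses(r))
		printf("scheduling rx poll for %u backlogged responses\n",
		       r->rsp_prod - r->rsp_cons);
	else
		printf("no backlog; wait for the next event\n");
}

int main(void)
{
	struct ring_sketch r = { .rsp_prod = 5, .rsp_cons = 0 };

	network_open_sketch(&r);  /* backend raced ahead: poll immediately */
	return 0;
}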
Emmanuel Ackaouy
2006-Aug-04 15:10 UTC
Re: [Xen-devel] [0/9] [NET]: Add TSO support for dom0=>domU
On Sat, Aug 05, 2006 at 12:44:09AM +1000, Herbert Xu wrote:
> On Fri, Aug 04, 2006 at 12:06:51PM +0100, Emmanuel Ackaouy wrote:
> >
> > vif = [ '', 'bridge=xenbr0' ]
>
> Interesting. I believe this is due to a missing netif_rx_schedule
> when netfront's interface is brought up. Please let me know if this
> patch fixes it or not.
>
> [NET] front: Check for received packets in network_open

That seems to have fixed the problem. I pushed the patch to the
staging tree.

Thanks,
Emmanuel.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert,

Does the RX_COPY_THRESHOLD you added to netfront need to be as large as 256?
It seems rather large: for example, the copy length in netback for packets
being transmitted is just 64.

Also, your patch to netfront allocates a full page for each receive slot.
The skbuff 'header' area is really not much used, except for the first
RX_COPY_THRESHOLD bytes of each packet. Rather than allocating lots of
skbuffs up front and then freeing most of them when you receive jumbo
packets, why not just allocate the skbuff part in netif_poll, for each
complete packet that you detect you have received, rather than doing it in
network_alloc_rx_buffers? (See the sketch after this mail.)

Alternatively we could go back to the old method of rx buffer allocation
which allocates page-sized skbuff header areas, then unmaps the page
containing skb->head. This would avoid the need for RX_COPY_THRESHOLD, but
would constrain the location of the first packet fragment (since the skb
header area needs 16 bytes headroom and tailroom for skb_shared_info) or
we'd need a slow path to relocate the header fragment. What do you think?

Thanks,
Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
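For concreteness, here is a rough userspace sketch of the deferred-allocation idea suggested above: nothing is committed per ring slot up front, and a small head structure is allocated only once a complete packet has been identified, with the copy threshold deciding how much lands in the linear area. The type and helper names are illustrative, not the driver's API, and this is only one reading of the proposal.

/*
 * Sketch of "allocate the skb part only in netif_poll": the head buffer
 * is allocated per completed packet; the remainder stays in page frags.
 */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define COPY_THRESHOLD 256

struct pkt_sketch {
	unsigned char head[COPY_THRESHOLD]; /* linear ("header") area */
	size_t head_len;
	size_t frag_len;                    /* bytes left in page fragments */
};

static struct pkt_sketch *deliver_packet(const unsigned char *data, size_t len)
{
	struct pkt_sketch *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;              /* nothing was committed up front */

	p->head_len = len < COPY_THRESHOLD ? len : COPY_THRESHOLD;
	memcpy(p->head, data, p->head_len);
	p->frag_len = len - p->head_len;  /* stays in the granted pages */
	return p;
}

int main(void)
{
	unsigned char jumbo[9000] = { 0 };
	struct pkt_sketch *p = deliver_packet(jumbo, sizeof(jumbo));

	if (p) {
		printf("head %zu bytes, frags %zu bytes\n",
		       p->head_len, p->frag_len);
		free(p);
	}
	return 0;
}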
Hi Keir:

On Fri, Aug 11, 2006 at 05:23:20PM +0100, Keir Fraser wrote:
>
> Does the RX_COPY_THRESHOLD you added to netfront need to be as large as 256?
> It seems rather large: for example, the copy length in netback for packets
> being transmitted is just 64.

That was fairly arbitrary, but 64 bytes is probably too small since it'll
keep IPv6 TCP headers out of the head area (2 + 14 + 40 + 28 > 64). So we
want at least 128 bytes for both front and back.

> Also, your patch to netfront allocates a full page for each receive slot.
> The skbuff 'header' area is really not much used, except for the first
> RX_COPY_THRESHOLD bytes of each packet. Rather than allocating lots of
> skbuffs up front and then freeing most of them when you receive jumbo
> packets, why not just allocate the skbuff part in netif_poll, for each
> complete packet that you detect you have received, rather than doing it in
> network_alloc_rx_buffers?

The main reason for preallocation is to avoid unnecessary work. When the
system is under memory pressure, if we postpone the allocation then all
the work done to move the packet from dom0 to this point would be wasted
should the allocation fail. On the other hand, if the failure occurs
during pre-allocation, then dom0 would not even get to place the skb into
the ring since there would be no room there.

I do concede that we're wasting effort in repeatedly initialising skb's
that we throw away in the case of jumbo packets. We can remove that
waste by maintaining our own list of unused skb's that we can simply
put back on the ring in network_alloc_rx_buffers without going through
alloc_skb again.

> Alternatively we could go back to the old method of rx buffer allocation
> which allocates page-sized skbuff header areas, then unmaps the page
> containing skb->head. This would avoid the need for RX_COPY_THRESHOLD, but
> would constrain the location of the first packet fragment (since the skb
> header area needs 16 bytes headroom and tailroom for skb_shared_info) or
> we'd need a slow path to relocate the header fragment. What do you think?

I'd like to avoid that because this relies on the internal structure of
sk_buff, which may not stay the same as Linux evolves.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
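Spelling out the arithmetic behind the 128-byte figure: my reading of the four terms is an assumption on my part (a 2-byte alignment pad, a 14-byte Ethernet header, a 40-byte IPv6 header, and a TCP header with options), since the mail only gives the numbers.

/*
 * The sum Herbert quotes, spelled out. The total exceeds 64, so a
 * 64-byte copy threshold would leave part of an IPv6 TCP header outside
 * the linear area, while 128 comfortably covers it.
 */
#include <stdio.h>

int main(void)
{
	int align_pad = 2;    /* e.g. NET_IP_ALIGN-style padding (assumed) */
	int eth_hdr   = 14;
	int ipv6_hdr  = 40;
	int tcp_hdr   = 28;   /* TCP header including some options (assumed) */
	int total     = align_pad + eth_hdr + ipv6_hdr + tcp_hdr;

	printf("worst-case headers: %d bytes (64 too small, 128 enough)\n",
	       total);       /* prints 84 */
	return 0;
}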
On 12/8/06 1:29 am, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:

> I do concede that we're wasting effort in repeatedly initialising skb's
> that we throw away in the case of jumbo packets. We can remove that
> waste by maintaining our own list of unused skb's that we can simply
> put back on the ring in network_alloc_rx_buffers without going through
> alloc_skb again.

If we're successfully allocating *whole pages* in
network_alloc_rx_buffers(), I'd be surprised if we couldn't allocate a
256-byte skbuff in netif_poll(). On the other hand, just putting the
skbuffs on the rx_batch queue is an easier change to make.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel