Hi Keir:

This series of patches adds SG and TSO support for the dom0=>domU direction. This completes the domU=>domU loop, which means that we have TSO all the way through.

It's a bit late here so I haven't done a full set of benchmarks. However, the following figures should be indicative for domU=>domU:

TSO off:    323.91Mb/s
TSO on:    1678.52Mb/s
lo(16436): 5533.93Mb/s

Of course you'll need to turn the two #ifs back on for domU=>dom0 to do this.

I've also found out the reason I was getting smaller MSSes before: the default write buffer size in Linux is too small for these speeds. So you need to raise it with something like:

echo 526808 > /proc/sys/net/core/wmem_max
netperf -t TCP_STREAM -H x.x.x.x -- -s 263404

Cheers,
Herbert
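For context, the netperf "-s 263404" option above simply asks for a larger socket send buffer, and the sysctl raises the ceiling the kernel will honour. A minimal userspace sketch of the same request (illustrative only; the buffer value is just the one quoted above):

/*
 * Illustrative only: request a larger send buffer, as netperf's
 * "-s 263404" option does, and read back what the kernel granted.
 * The kernel caps the request at net.core.wmem_max, which is why
 * the sysctl above must be raised first.
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int sndbuf = 263404;            /* value used in the netperf run above */
        socklen_t len = sizeof(sndbuf);

        if (fd < 0)
                return 1;

        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);

        /* Linux reports roughly double the requested value (bookkeeping overhead). */
        printf("effective send buffer: %d bytes\n", sndbuf);

        close(fd);
        return 0;
}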
Hi:

[NET] front: Zero gso features/pad

Set the unused gso features/pad fields to zero for forward compatibility.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

--
diff -r 8e55c5c11475 -r dd443604e775 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Wed Jul 05 18:48:41 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Fri Jul 07 23:35:42 2006 +1000
@@ -788,6 +788,8 @@ static int network_start_xmit(struct sk_
 
                 gso->u.gso.size = skb_shinfo(skb)->gso_size;
                 gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
+                gso->u.gso.pad = 0;
+                gso->u.gso.features = 0;
 
                 gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
                 gso->flags = 0;
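The reason for zeroing fields the frontend does not yet use is that a future backend may assign meaning to them. A tiny standalone illustration of that convention (the struct here is a simplified stand-in, not the real netif_extra_info layout):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified extra-info slot: fields the current sender
 * does not use are still written as zero, so a future receiver that
 * assigns meaning to them sees "no options requested", not stack garbage.
 */
struct demo_gso_extra {
        uint16_t size;          /* MSS */
        uint8_t  type;          /* GSO protocol type */
        uint8_t  pad;           /* currently unused */
        uint16_t features;      /* currently unused feature bits */
};

static void fill_gso_slot(struct demo_gso_extra *gso, uint16_t mss, uint8_t type)
{
        memset(gso, 0, sizeof(*gso));   /* zero pad/features in one go */
        gso->size = mss;
        gso->type = type;
}

int main(void)
{
        struct demo_gso_extra slot;

        fill_gso_slot(&slot, 1448, 1);
        printf("size=%u type=%u pad=%u features=%u\n",
               slot.size, slot.type, slot.pad, slot.features);
        return 0;
}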
Herbert Xu
2006-Jul-07 14:17 UTC
[Xen-devel] [2/11] [NET] back: Make use of the simplicity of tasklets in net_rx_action
Hi:

[NET] back: Make use of the simplicity of tasklets in net_rx_action

Tasklets have the property that each one runs on only one CPU at any time, so a tasklet never races against itself. Therefore any resource used by only a single tasklet does not need to be guarded by locks.

Since net_rx_action is the only user of alloc_mfn, we can remove the mfn_lock that guards it.

The notify_list array is huge by Linux standards, so placing it on the stack is unsafe. Since net_rx_action is not re-entrant, we can simply make it static.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

--
diff -r dd443604e775 -r e90f22eee9a1 linux-2.6-xen-sparse/drivers/xen/netback/netback.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c       Fri Jul 07 23:35:42 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c       Fri Jul 07 23:35:46 2006 +1000
@@ -99,24 +99,21 @@ static spinlock_t net_schedule_list_lock
 #define MAX_MFN_ALLOC 64
 static unsigned long mfn_list[MAX_MFN_ALLOC];
 static unsigned int alloc_index = 0;
-static DEFINE_SPINLOCK(mfn_lock);
 
 static unsigned long alloc_mfn(void)
 {
-        unsigned long mfn = 0, flags;
+        unsigned long mfn = 0;
         struct xen_memory_reservation reservation = {
                 .nr_extents = MAX_MFN_ALLOC,
                 .extent_order = 0,
                 .domid = DOMID_SELF
         };
         set_xen_guest_handle(reservation.extent_start, mfn_list);
-        spin_lock_irqsave(&mfn_lock, flags);
         if ( unlikely(alloc_index == 0) )
                 alloc_index = HYPERVISOR_memory_op(
                         XENMEM_increase_reservation, &reservation);
         if ( alloc_index != 0 )
                 mfn = mfn_list[--alloc_index];
-        spin_unlock_irqrestore(&mfn_lock, flags);
         return mfn;
 }
 
@@ -222,9 +219,13 @@ static void net_rx_action(unsigned long
         unsigned long vdata, old_mfn, new_mfn;
         struct sk_buff_head rxq;
         struct sk_buff *skb;
-        u16 notify_list[NET_RX_RING_SIZE];
         int notify_nr = 0;
         int ret;
+        /*
+         * Putting hundreds of bytes on the stack is considered rude.
+         * Static works because a tasklet can only be on one CPU at any time.
+         */
+        static u16 notify_list[NET_RX_RING_SIZE];
 
         skb_queue_head_init(&rxq);
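To make the locking argument concrete, here is a minimal kernel-style sketch (function and variable names invented for illustration, 2.6-era tasklet API) of a buffer that is safe to keep static and lock-free because it is touched only from one tasklet:

#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/interrupt.h>

#define DEMO_RING_SIZE 256

/*
 * Touched only from demo_rx_action(), which runs as a tasklet.  The
 * tasklet core guarantees the function is never running on two CPUs at
 * once, so no spinlock is needed, and making the (large) array static
 * keeps it off the small kernel stack.
 */
static u16 demo_notify_list[DEMO_RING_SIZE];

static void demo_rx_action(unsigned long unused)
{
        int i, n = 0;

        for (i = 0; i < DEMO_RING_SIZE; i++)
                demo_notify_list[n++] = i;      /* safe: no concurrent runner */

        printk(KERN_DEBUG "queued %d notifications\n", n);
}

static DECLARE_TASKLET(demo_rx_tasklet, demo_rx_action, 0);

/* Callers (for example an interrupt handler) just schedule the tasklet: */
static inline void demo_kick(void)
{
        tasklet_schedule(&demo_rx_tasklet);
}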
Hi: [NET] front: Stop using rx->id With the current protocol for transferring packets from dom0 to domU, the rx->id field is useless because it can be derived from the rx request ring ID. In particular, rx->id = (ring_id & NET_RX_RING_SIZE - 1) + 1; This formula works because the rx response to each request always occupies the same slot that the request arrived in. This in turn is a consequence of the fact that each packet only occupies one slot. The other important reason that this works for dom0=>domU but not domU=>dom0 is that the resource associated with the rx->id is freed immediately while in the domU=>dom0 case the resource is held until the skb is liberated by dom0. Using this formula we can essentially remove rx->id from the protocol, freeing up space that could be instead be used by things like TSO. The only constraint is that the backend must obey the rule that each id must only be used in the response that occupies the same slot as the request. The actual field of rx->id is still maintained for compatibility with older backends. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r e90f22eee9a1 -r 2ec3e15164f7 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:35:46 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:35:52 2006 +1000 @@ -99,17 +99,17 @@ struct netfront_info { struct timer_list rx_refill_timer; /* - * {tx,rx}_skbs store outstanding skbuffs. The first entry in each - * array is an index into a chain of free entries. + * {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs + * is an index into a chain of free entries. */ struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1]; - struct sk_buff *rx_skbs[NET_RX_RING_SIZE+1]; + struct sk_buff *rx_skbs[NET_RX_RING_SIZE]; #define TX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256) grant_ref_t gref_tx_head; grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1]; grant_ref_t gref_rx_head; - grant_ref_t grant_rx_ref[NET_TX_RING_SIZE + 1]; + grant_ref_t grant_rx_ref[NET_TX_RING_SIZE]; struct xenbus_device *xbdev; int tx_ring_ref; @@ -122,7 +122,7 @@ struct netfront_info { }; /* - * Access macros for acquiring freeing slots in {tx,rx}_skbs[]. + * Access macros for acquiring freeing slots in tx_skbs[]. */ static inline void add_id_to_freelist(struct sk_buff **list, unsigned short id) @@ -136,6 +136,29 @@ static inline unsigned short get_id_from unsigned int id = (unsigned int)(unsigned long)list[0]; list[0] = list[id]; return id; +} + +static inline int xennet_rxidx(RING_IDX idx) +{ + return idx & (NET_RX_RING_SIZE - 1); +} + +static inline struct sk_buff *xennet_get_rx_skb(struct netfront_info *np, + RING_IDX ri) +{ + int i = xennet_rxidx(ri); + struct sk_buff *skb = np->rx_skbs[i]; + np->rx_skbs[i] = NULL; + return skb; +} + +static inline grant_ref_t xennet_get_rx_ref(struct netfront_info *np, + RING_IDX ri) +{ + int i = xennet_rxidx(ri); + grant_ref_t ref = np->grant_rx_ref[i]; + np->grant_rx_ref[i] = GRANT_INVALID_REF; + return ref; } #define DPRINTK(fmt, args...) 
\ @@ -598,8 +621,9 @@ static void network_alloc_rx_buffers(str skb->dev = dev; - id = get_id_from_freelist(np->rx_skbs); - + id = xennet_rxidx(req_prod + i); + + BUG_ON(np->rx_skbs[id]); np->rx_skbs[id] = skb; RING_GET_REQUEST(&np->rx, req_prod + i)->id = id; @@ -840,6 +864,19 @@ static irqreturn_t netif_int(int irq, vo return IRQ_HANDLED; } +static void xennet_move_rx_slot(struct netfront_info *np, struct sk_buff *skb, + grant_ref_t ref) +{ + int new = xennet_rxidx(np->rx.req_prod_pvt); + + BUG_ON(np->rx_skbs[new]); + np->rx_skbs[new] = skb; + np->grant_rx_ref[new] = ref; + RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new; + RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref; + np->rx.req_prod_pvt++; + RING_PUSH_REQUESTS(&np->rx); +} static int netif_poll(struct net_device *dev, int *pbudget) { @@ -874,12 +911,15 @@ static int netif_poll(struct net_device i++, work_done++) { rx = RING_GET_RESPONSE(&np->rx, i); + skb = xennet_get_rx_skb(np, i); + ref = xennet_get_rx_ref(np, i); + /* * This definitely indicates a bug, either in this driver or in * the backend driver. In future this should flag the bad * situation to the system controller to reboot the backed. */ - if ((ref = np->grant_rx_ref[rx->id]) == GRANT_INVALID_REF) { + if (ref == GRANT_INVALID_REF) { WPRINTK("Bad rx response id %d.\n", rx->id); work_done--; continue; @@ -890,21 +930,12 @@ static int netif_poll(struct net_device if (net_ratelimit()) WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n", rx->id, rx->status); - RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id - rx->id; - RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref - ref; - np->rx.req_prod_pvt++; - RING_PUSH_REQUESTS(&np->rx); + xennet_move_rx_slot(np, skb, ref); work_done--; continue; } gnttab_release_grant_reference(&np->gref_rx_head, ref); - np->grant_rx_ref[rx->id] = GRANT_INVALID_REF; - - skb = np->rx_skbs[rx->id]; - add_id_to_freelist(np->rx_skbs, rx->id); /* NB. We handle skb overflow later. */ skb->data = skb->head + rx->offset; @@ -1128,6 +1159,7 @@ static void network_connect(struct net_d struct netfront_info *np = netdev_priv(dev); int i, requeue_idx; struct sk_buff *skb; + grant_ref_t ref; xennet_set_features(dev); @@ -1158,15 +1190,25 @@ static void network_connect(struct net_d } /* Step 2: Rebuild the RX buffer freelist and the RX ring itself. 
*/ - for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) { - if ((unsigned long)np->rx_skbs[i] < PAGE_OFFSET) - continue; + for (i = 0; i < NET_RX_RING_SIZE; i++) { + if (!np->rx_skbs[i]) + break; gnttab_grant_foreign_transfer_ref( np->grant_rx_ref[i], np->xbdev->otherend_id, __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); - RING_GET_REQUEST(&np->rx, requeue_idx)->gref - np->grant_rx_ref[i]; - RING_GET_REQUEST(&np->rx, requeue_idx)->id = i; + RING_GET_REQUEST(&np->rx, i)->gref = np->grant_rx_ref[i]; + RING_GET_REQUEST(&np->rx, i)->id = i; + } + for (requeue_idx = i++; i < NET_RX_RING_SIZE; i++) { + if (!np->rx_skbs[i]) + continue; + skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i); + ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i); + gnttab_grant_foreign_transfer_ref( + ref, np->xbdev->otherend_id, + __pa(skb->data) >> PAGE_SHIFT); + RING_GET_REQUEST(&np->rx, requeue_idx)->gref = ref; + RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; requeue_idx++; } @@ -1391,11 +1433,6 @@ static struct net_device * __devinit cre np->grant_tx_ref[i] = GRANT_INVALID_REF; } - for (i = 0; i <= NET_RX_RING_SIZE; i++) { - np->rx_skbs[i] = (void *)((unsigned long) i+1); - np->grant_rx_ref[i] = GRANT_INVALID_REF; - } - /* A grant for every tx ring slot */ if (gnttab_alloc_grant_references(TX_MAX_TARGET, &np->gref_tx_head) < 0) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
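To illustrate the derivation in the patch above, a small standalone sketch of the index calculation: because each packet takes exactly one slot and the response is written into the slot its request occupied, the ring counter alone identifies the buffer. The ring size of 256 is an assumption for a 4 KiB shared ring.

#include <stdio.h>

#define NET_RX_RING_SIZE 256    /* assumed power-of-two ring size */

/* Mirror of xennet_rxidx() in the patch: mask the ring counter. */
static int rxidx(unsigned int ring_idx)
{
        return ring_idx & (NET_RX_RING_SIZE - 1);
}

int main(void)
{
        unsigned int req_prod;

        /*
         * The slot index derived from the ring counter is enough to find
         * the matching skb, so the id field on the wire carries no extra
         * information and its space can be reclaimed for things like TSO.
         */
        for (req_prod = 254; req_prod < 259; req_prod++)
                printf("ring counter %u -> buffer slot %d\n",
                       req_prod, rxidx(req_prod));

        return 0;
}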
Herbert Xu
2006-Jul-07 14:18 UTC
[Xen-devel] [4/11] [NET] front: Move rx req pushing to one spot
Hi:

[NET] front: Move rx req pushing to one spot

This patch moves all rx request pushing to network_alloc_rx_buffers. This is needed to reduce churn for TSO. More importantly, it makes it easier to send notifications when adding rx requests, which is required for having a queue in dom0.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

--
diff -r 2ec3e15164f7 -r d398fb29e84a linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Fri Jul 07 23:35:52 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Fri Jul 07 23:36:12 2006 +1000
@@ -600,14 +600,17 @@ static void network_alloc_rx_buffers(str
                         /* Could not allocate any skbuffs. Try again later. */
                         mod_timer(&np->rx_refill_timer,
                                   jiffies + (HZ/10));
-                        return;
+                        break;
                 }
                 __skb_queue_tail(&np->rx_batch, skb);
         }
 
         /* Is the batch large enough to be worthwhile? */
-        if (i < (np->rx_target/2))
+        if (i < (np->rx_target/2)) {
+                if (req_prod > np->rx.sring->req_prod)
+                        goto push;
                 return;
+        }
 
         /* Adjust our fill target if we risked running out of buffers. */
         if (((req_prod - np->rx.sring->rsp_prod) < (np->rx_target / 4)) &&
@@ -678,6 +681,7 @@ static void network_alloc_rx_buffers(str
 
         /* Above is a suitable barrier to ensure backend will see requests. */
         np->rx.req_prod_pvt = req_prod + i;
+push:
         RING_PUSH_REQUESTS(&np->rx);
 }
 
@@ -875,7 +879,6 @@ static void xennet_move_rx_slot(struct n
         RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new;
         RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref;
         np->rx.req_prod_pvt++;
-        RING_PUSH_REQUESTS(&np->rx);
 }
 
 static int netif_poll(struct net_device *dev, int *pbudget)
@@ -1213,7 +1216,6 @@ static void network_connect(struct net_d
         }
 
         np->rx.req_prod_pvt = requeue_idx;
-        RING_PUSH_REQUESTS(&np->rx);
 
         /*
          * Step 3: All public and private state should now be sane.  Get
Herbert Xu
2006-Jul-07 14:19 UTC
[Xen-devel] [5/11] [NET] back: Replace netif->active with netif_carrier_ok
Hi:

[NET] back: Replace netif->active with netif_carrier_ok

The connection status to the frontend can be represented using netif_carrier_ok instead of netif->active. As a result, we delay the construction of the dev qdisc until the carrier comes on.

This is a prerequisite for adding a tx queue.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

--
diff -r d398fb29e84a -r 9fa8689fc0e9 linux-2.6-xen-sparse/drivers/xen/netback/common.h
--- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:36:12 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:36:16 2006 +1000
@@ -87,7 +87,7 @@ typedef struct netif_st {
 
         /* Miscellaneous private stuff. */
         enum { DISCONNECTED, DISCONNECTING, CONNECTED } status;
-        int active;
+
         struct list_head list;  /* scheduling list */
         atomic_t refcnt;
         struct net_device *dev;
diff -r d398fb29e84a -r 9fa8689fc0e9 linux-2.6-xen-sparse/drivers/xen/netback/interface.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c      Fri Jul 07 23:36:12 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c      Fri Jul 07 23:36:16 2006 +1000
@@ -37,9 +37,7 @@ static void __netif_up(netif_t *netif)
 static void __netif_up(netif_t *netif)
 {
         struct net_device *dev = netif->dev;
-        netif_tx_lock_bh(dev);
-        netif->active = 1;
-        netif_tx_unlock_bh(dev);
+        netif_carrier_on(dev);
         enable_irq(netif->irq);
         netif_schedule_work(netif);
 }
@@ -49,7 +47,7 @@ static void __netif_down(netif_t *netif)
         struct net_device *dev = netif->dev;
         disable_irq(netif->irq);
         netif_tx_lock_bh(dev);
-        netif->active = 0;
+        netif_carrier_off(dev);
         netif_tx_unlock_bh(dev);
         netif_deschedule_work(netif);
 }
@@ -91,6 +89,8 @@ netif_t *netif_alloc(domid_t domid, unsi
                 DPRINTK("Could not create netif: out of memory\n");
                 return ERR_PTR(-ENOMEM);
         }
+
+        netif_carrier_off(dev);
 
         netif = netdev_priv(dev);
         memset(netif, 0, sizeof(*netif));
diff -r d398fb29e84a -r 9fa8689fc0e9 linux-2.6-xen-sparse/drivers/xen/netback/netback.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c        Fri Jul 07 23:36:12 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c        Fri Jul 07 23:36:16 2006 +1000
@@ -143,7 +143,7 @@ int netif_be_start_xmit(struct sk_buff *
         BUG_ON(skb->dev != dev);
 
         /* Drop the packet if the target domain has no receive buffers. */
-        if (!netif->active ||
+        if (unlikely(!netif_carrier_ok(dev)) ||
             (netif->rx_req_cons_peek == netif->rx.sring->req_prod) ||
             ((netif->rx_req_cons_peek - netif->rx.rsp_prod_pvt) ==
              NET_RX_RING_SIZE))
@@ -404,7 +404,8 @@ static void add_to_net_schedule_list_tai
                 return;
 
         spin_lock_irq(&net_schedule_list_lock);
-        if (!__on_net_schedule_list(netif) && netif->active) {
+        if (!__on_net_schedule_list(netif) &&
+            likely(netif_carrier_ok(netif->dev))) {
                 list_add_tail(&netif->list, &net_schedule_list);
                 netif_get(netif);
         }
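A hedged sketch of the resulting pattern (driver and function names here are invented; only the netif_carrier_* calls, which appear in the patch, are real APIs): carrier off at allocation, carrier on when the frontend connects, and transmission gated on the carrier state rather than a private flag.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

static int demo_probe(struct net_device *dev)
{
        /* Not connected to a frontend yet: no carrier, no qdisc activity. */
        netif_carrier_off(dev);
        return 0;
}

static void demo_frontend_connected(struct net_device *dev)
{
        netif_carrier_on(dev);  /* queuing may start from here on */
}

static int demo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        if (unlikely(!netif_carrier_ok(dev))) {
                dev_kfree_skb(skb);     /* drop: peer not connected */
                return 0;               /* 0 == NETDEV_TX_OK in this kernel era */
        }
        /* ... hand the packet to the connected peer ... */
        return 0;
}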
Hi:

[NET] front: Added feature-rx-notify

This patch adds support to the frontend for notifying the backend whenever the rx ring is refilled. This is required in order for the backend to get a tx queue.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

--
diff -r 9fa8689fc0e9 -r 2ac3f7b0e0fc linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Fri Jul 07 23:36:16 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Fri Jul 07 23:36:20 2006 +1000
@@ -326,6 +326,12 @@ again:
                 goto abort_transaction;
         }
 
+        err = xenbus_printf(xbt, dev->nodename, "feature-rx-notify", "%d", 1);
+        if (err) {
+                message = "writing feature-rx-notify";
+                goto abort_transaction;
+        }
+
         err = xenbus_transaction_end(xbt, 0);
         if (err) {
                 if (err == -EAGAIN)
@@ -683,6 +689,7 @@ static void network_alloc_rx_buffers(str
         np->rx.req_prod_pvt = req_prod + i;
 push:
         RING_PUSH_REQUESTS(&np->rx);
+        notify_remote_via_irq(np->irq);
 }
 
 static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
@@ -1224,7 +1231,6 @@ static void network_connect(struct net_d
          * packets.
          */
         netif_carrier_on(dev);
-        notify_remote_via_irq(np->irq);
         network_tx_buf_gc(dev);
         network_alloc_rx_buffers(dev);
 
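The same advertise-and-probe pattern recurs for every feature in this series: the frontend writes a "feature-*" key in its xenbus directory, and the backend reads it, treating an absent key as 0 so old frontends keep working. A small sketch of the backend side (xenbus_scanf is used exactly as in the series; the helper itself is invented for illustration):

#include <xen/xenbus.h>

/*
 * Illustrative helper (not part of the patch): read an optional boolean
 * "feature-*" key from the peer's xenbus directory; a missing key means
 * the peer does not support the feature.
 */
static int demo_read_feature(struct xenbus_device *dev, const char *node)
{
        int val;

        if (xenbus_scanf(XBT_NIL, dev->otherend, node, "%d", &val) < 0)
                val = 0;

        return val;
}

/*
 * Usage, mirroring connect_rings() in the next patch:
 *
 *      if (demo_read_feature(dev, "feature-rx-notify"))
 *              netif->features |= NETIF_F_LLTX;
 */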
Hi: [NET] back: Added tx queue This patch adds a tx queue to the backend if the frontend supports rx refill notification. A queue is needed because SG/TSO greatly reduces the number of packets that can be stored in the rx ring. Given an rx ring with 256 entries, a maximum TSO packet can occupy as many as 18 entries, meaning that the entire ring can only hold 14 packets. This is too small at high bandwidths with large TCP RX windows. Having a tx queue does not present a new security risk as the queue is a fixed size buffer just like rx ring. So each guest can only hold a fixed amount of memory (proportional to the tx queue length) on the host. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 2ac3f7b0e0fc -r c754083146bc linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:36:20 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:36:58 2006 +1000 @@ -76,6 +76,9 @@ typedef struct netif_st { struct vm_struct *tx_comms_area; struct vm_struct *rx_comms_area; + /* Set of features that can be turned on in dev->features. */ + int features; + /* Allow netif_be_start_xmit() to peek ahead in the rx request ring. */ RING_IDX rx_req_cons_peek; @@ -121,4 +124,10 @@ struct net_device_stats *netif_be_get_st struct net_device_stats *netif_be_get_stats(struct net_device *dev); irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs); +static inline int netbk_can_queue(struct net_device *dev) +{ + netif_t *netif = netdev_priv(dev); + return netif->features & NETIF_F_LLTX; +} + #endif /* __NETIF__BACKEND__COMMON_H__ */ diff -r 2ac3f7b0e0fc -r c754083146bc linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:36:20 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:36:58 2006 +1000 @@ -57,14 +57,12 @@ static int net_open(struct net_device *d netif_t *netif = netdev_priv(dev); if (netif->status == CONNECTED) __netif_up(netif); - netif_start_queue(dev); return 0; } static int net_close(struct net_device *dev) { netif_t *netif = netdev_priv(dev); - netif_stop_queue(dev); if (netif->status == CONNECTED) __netif_down(netif); return 0; @@ -112,9 +110,6 @@ netif_t *netif_alloc(domid_t domid, unsi dev->features = NETIF_F_IP_CSUM; SET_ETHTOOL_OPS(dev, &network_ethtool_ops); - - /* Disable queuing. 
*/ - dev->tx_queue_len = 0; for (i = 0; i < ETH_ALEN; i++) if (be_mac[i] != 0) diff -r 2ac3f7b0e0fc -r c754083146bc linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:36:20 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:36:58 2006 +1000 @@ -136,6 +136,14 @@ static inline int is_xen_skb(struct sk_b return (cp == skbuff_cachep); } +static inline int netbk_queue_full(netif_t *netif) +{ + RING_IDX peek = netif->rx_req_cons_peek; + + return netif->rx.sring->req_prod - peek <= 0 || + netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <= 0; +} + int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) { netif_t *netif = netdev_priv(dev); @@ -143,11 +151,13 @@ int netif_be_start_xmit(struct sk_buff * BUG_ON(skb->dev != dev); /* Drop the packet if the target domain has no receive buffers. */ - if (unlikely(!netif_carrier_ok(dev)) || - (netif->rx_req_cons_peek == netif->rx.sring->req_prod) || - ((netif->rx_req_cons_peek - netif->rx.rsp_prod_pvt) =- NET_RX_RING_SIZE)) + if (unlikely(!netif_carrier_ok(dev))) goto drop; + + if (unlikely(netbk_queue_full(netif))) { + BUG_ON(netbk_can_queue(dev)); + goto drop; + } /* * We do not copy the packet unless: @@ -178,6 +188,9 @@ int netif_be_start_xmit(struct sk_buff * netif->rx_req_cons_peek++; netif_get(netif); + if (netbk_can_queue(dev) && netbk_queue_full(netif)) + netif_stop_queue(dev); + skb_queue_tail(&rx_queue, skb); tasklet_schedule(&net_rx_tasklet); @@ -351,6 +364,10 @@ static void net_rx_action(unsigned long notify_list[notify_nr++] = irq; } + if (netif_queue_stopped(netif->dev) && + !netbk_queue_full(netif)) + netif_wake_queue(netif->dev); + netif_put(netif); dev_kfree_skb(skb); gop++; @@ -973,8 +990,13 @@ irqreturn_t netif_be_int(int irq, void * irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs) { netif_t *netif = dev_id; + add_to_net_schedule_list_tail(netif); maybe_schedule_tx_action(); + + if (netif_queue_stopped(netif->dev) && !netbk_queue_full(netif)) + netif_wake_queue(netif->dev); + return IRQ_HANDLED; } diff -r 2ac3f7b0e0fc -r c754083146bc linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:36:20 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:36:58 2006 +1000 @@ -353,6 +353,7 @@ static int connect_rings(struct backend_ unsigned long tx_ring_ref, rx_ring_ref; unsigned int evtchn; int err; + int val; DPRINTK(""); @@ -366,6 +367,15 @@ static int connect_rings(struct backend_ dev->otherend); return err; } + + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-rx-notify", "%d", + &val) < 0) + val = 0; + if (val) + be->netif->features |= NETIF_F_LLTX; + else + /* Must be non-zero for pfifo_fast to work. */ + be->netif->dev->tx_queue_len = 1; /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
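The sizing argument in the description above can be checked with a few lines of standalone C. The 18-slots-per-maximal-TSO-packet figure is the one quoted in the patch description; the queue-full test mirrors the shape of netbk_queue_full() in the patch, simplified, with constants assumed.

#include <stdio.h>

#define NET_RX_RING_SIZE   256  /* slots in the shared rx ring */
#define MAX_SLOTS_PER_TSO   18  /* figure quoted in the patch description */

/*
 * Queue-full test in the style of netbk_queue_full(): unsigned ring
 * counters wrap, so remaining capacity is always computed as a difference.
 */
static int demo_queue_full(unsigned int req_prod, unsigned int rsp_prod,
                           unsigned int cons_peek, unsigned int needed)
{
        return req_prod - cons_peek < needed ||
               rsp_prod + NET_RX_RING_SIZE - cons_peek < needed;
}

int main(void)
{
        printf("max TSO packets resident in a %d-slot ring: %d\n",
               NET_RX_RING_SIZE, NET_RX_RING_SIZE / MAX_SLOTS_PER_TSO);

        /* 250 requests posted, 240 answered, 245 slots already claimed. */
        printf("room for another 18-slot packet? %s\n",
               demo_queue_full(250, 240, 245, MAX_SLOTS_PER_TSO) ? "no" : "yes");
        return 0;
}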
Hi: [NET] front: Add SG support This patch adds scatter-and-gather support to the frontend. It also advertises this fact through xenbus so that the backend can detect this and send through SG requests only if it is supported. SG support is required to support skb''s larger than one page. This in turn is needed for either jumbo MTU or TSO. One of these is required to bring local networking performance up to a level that is acceptable. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r c754083146bc -r c82398c08669 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:36:58 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:37:45 2006 +1000 @@ -46,11 +46,11 @@ #include <linux/ethtool.h> #include <linux/in.h> #include <linux/if_ether.h> +#include <linux/io.h> #include <net/sock.h> #include <net/pkt_sched.h> #include <net/arp.h> #include <net/route.h> -#include <asm/io.h> #include <asm/uaccess.h> #include <xen/evtchn.h> #include <xen/xenbus.h> @@ -62,17 +62,12 @@ #include <xen/interface/grant_table.h> #include <xen/gnttab.h> +#define RX_COPY_THRESHOLD 256 + #define GRANT_INVALID_REF 0 #define NET_TX_RING_SIZE __RING_SIZE((struct netif_tx_sring *)0, PAGE_SIZE) #define NET_RX_RING_SIZE __RING_SIZE((struct netif_rx_sring *)0, PAGE_SIZE) - -static inline void init_skb_shinfo(struct sk_buff *skb) -{ - atomic_set(&(skb_shinfo(skb)->dataref), 1); - skb_shinfo(skb)->nr_frags = 0; - skb_shinfo(skb)->frag_list = NULL; -} struct netfront_info { struct list_head list; @@ -329,6 +324,12 @@ again: err = xenbus_printf(xbt, dev->nodename, "feature-rx-notify", "%d", 1); if (err) { message = "writing feature-rx-notify"; + goto abort_transaction; + } + + err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", 1); + if (err) { + message = "writing feature-sg"; goto abort_transaction; } @@ -575,10 +576,13 @@ static void network_alloc_rx_buffers(str unsigned short id; struct netfront_info *np = netdev_priv(dev); struct sk_buff *skb; + struct page *page; int i, batch_target; RING_IDX req_prod = np->rx.req_prod_pvt; struct xen_memory_reservation reservation; grant_ref_t ref; + unsigned long pfn; + void *vaddr; if (unlikely(!netif_carrier_ok(dev))) return; @@ -591,15 +595,16 @@ static void network_alloc_rx_buffers(str */ batch_target = np->rx_target - (req_prod - np->rx.rsp_cons); for (i = skb_queue_len(&np->rx_batch); i < batch_target; i++) { - /* - * Subtract dev_alloc_skb headroom (16 bytes) and shared info - * tailroom then round down to SKB_DATA_ALIGN boundary. - */ - skb = __dev_alloc_skb( - ((PAGE_SIZE - sizeof(struct skb_shared_info)) & - (-SKB_DATA_ALIGN(1))) - 16, - GFP_ATOMIC|__GFP_NOWARN); - if (skb == NULL) { + /* Allocate an skb and a page. */ + skb = __dev_alloc_skb(RX_COPY_THRESHOLD, + GFP_ATOMIC | __GFP_NOWARN); + if (unlikely(!skb)) + goto no_skb; + + page = alloc_page(GFP_ATOMIC | __GFP_NOWARN); + if (!page) { + kfree_skb(skb); +no_skb: /* Any skbuffs queued for refill? Force them out. 
*/ if (i != 0) goto refill; @@ -608,6 +613,9 @@ static void network_alloc_rx_buffers(str jiffies + (HZ/10)); break; } + + skb_shinfo(skb)->frags[0].page = page; + skb_shinfo(skb)->nr_frags = 1; __skb_queue_tail(&np->rx_batch, skb); } @@ -639,18 +647,20 @@ static void network_alloc_rx_buffers(str ref = gnttab_claim_grant_reference(&np->gref_rx_head); BUG_ON((signed short)ref < 0); np->grant_rx_ref[id] = ref; + + pfn = page_to_pfn(skb_shinfo(skb)->frags[0].page); + vaddr = page_address(skb_shinfo(skb)->frags[0].page); + gnttab_grant_foreign_transfer_ref(ref, - np->xbdev->otherend_id, - __pa(skb->head)>>PAGE_SHIFT); + np->xbdev->otherend_id, pfn); RING_GET_REQUEST(&np->rx, req_prod + i)->gref = ref; - np->rx_pfn_array[i] = virt_to_mfn(skb->head); + np->rx_pfn_array[i] = pfn_to_mfn(pfn); if (!xen_feature(XENFEAT_auto_translated_physmap)) { /* Remove this page before passing back to Xen. */ - set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT, - INVALID_P2M_ENTRY); + set_phys_to_machine(pfn, INVALID_P2M_ENTRY); MULTI_update_va_mapping(np->rx_mcl+i, - (unsigned long)skb->head, + (unsigned long)vaddr, __pte(0), 0); } } @@ -888,41 +898,29 @@ static void xennet_move_rx_slot(struct n np->rx.req_prod_pvt++; } -static int netif_poll(struct net_device *dev, int *pbudget) -{ - struct netfront_info *np = netdev_priv(dev); - struct sk_buff *skb, *nskb; - struct netif_rx_response *rx; - RING_IDX i, rp; - struct mmu_update *mmu = np->rx_mmu; - struct multicall_entry *mcl = np->rx_mcl; - int work_done, budget, more_to_do = 1; - struct sk_buff_head rxq; - unsigned long flags; - unsigned long mfn; - grant_ref_t ref; - - spin_lock(&np->rx_lock); - - if (unlikely(!netif_carrier_ok(dev))) { - spin_unlock(&np->rx_lock); - return 0; - } - - skb_queue_head_init(&rxq); - - if ((budget = *pbudget) > dev->quota) - budget = dev->quota; - rp = np->rx.sring->rsp_prod; - rmb(); /* Ensure we see queued responses up to ''rp''. */ - - for (i = np->rx.rsp_cons, work_done = 0; - (i != rp) && (work_done < budget); - i++, work_done++) { - rx = RING_GET_RESPONSE(&np->rx, i); - - skb = xennet_get_rx_skb(np, i); - ref = xennet_get_rx_ref(np, i); +static int xennet_get_responses(struct netfront_info *np, + struct netif_rx_response *rx, RING_IDX rp, + struct sk_buff_head *list, int count) +{ + struct mmu_update *mmu = np->rx_mmu + count; + struct multicall_entry *mcl = np->rx_mcl + count; + RING_IDX cons = np->rx.rsp_cons; + struct sk_buff *skb = xennet_get_rx_skb(np, cons); + grant_ref_t ref = xennet_get_rx_ref(np, cons); + int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD); + int frags = 1; + int err = 0; + + for (;;) { + unsigned long mfn; + + if (unlikely(rx->status < 0 || + rx->offset + rx->status > PAGE_SIZE)) { + if (net_ratelimit()) + WPRINTK("rx->offset: %x, size: %u\n", + rx->offset, rx->status); + err = -EINVAL; + } /* * This definitely indicates a bug, either in this driver or in @@ -931,8 +929,8 @@ static int netif_poll(struct net_device */ if (ref == GRANT_INVALID_REF) { WPRINTK("Bad rx response id %d.\n", rx->id); - work_done--; - continue; + err = -EINVAL; + goto next; } /* Memory pressure, insufficient buffer headroom, ... */ @@ -941,16 +939,161 @@ static int netif_poll(struct net_device WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n", rx->id, rx->status); xennet_move_rx_slot(np, skb, ref); + err = -ENOMEM; + goto next; + } + + gnttab_release_grant_reference(&np->gref_rx_head, ref); + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + /* Remap the page. 
*/ + struct page *page = skb_shinfo(skb)->frags[0].page; + unsigned long pfn = page_to_pfn(page); + void *vaddr = page_address(page); + + MULTI_update_va_mapping(mcl, (unsigned long)vaddr, + pfn_pte_ma(mfn, PAGE_KERNEL), + 0); + mcl++; + mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT) + | MMU_MACHPHYS_UPDATE; + mmu->val = pfn; + mmu++; + + set_phys_to_machine(pfn, mfn); + } + + __skb_queue_tail(list, skb); + +next: + if (!(rx->flags & NETRXF_more_data)) + break; + + if (cons + frags == rp) { + if (net_ratelimit()) + WPRINTK("Need more frags\n"); + err = -ENOENT; + break; + } + + rx = RING_GET_RESPONSE(&np->rx, cons + frags); + skb = xennet_get_rx_skb(np, cons + frags); + ref = xennet_get_rx_ref(np, cons + frags); + frags++; + } + + if (unlikely(frags > max)) { + if (net_ratelimit()) + WPRINTK("Too many frags\n"); + err = -E2BIG; + } + + return err; +} + +static RING_IDX xennet_fill_frags(struct netfront_info *np, + struct sk_buff *skb, + struct sk_buff_head *list) +{ + struct skb_shared_info *shinfo = skb_shinfo(skb); + int nr_frags = shinfo->nr_frags; + RING_IDX cons = np->rx.rsp_cons; + skb_frag_t *frag = shinfo->frags + nr_frags; + struct sk_buff *nskb; + + while ((nskb = __skb_dequeue(list))) { + struct netif_rx_response *rx + RING_GET_RESPONSE(&np->rx, ++cons); + + frag->page = skb_shinfo(nskb)->frags[0].page; + frag->page_offset = rx->offset; + frag->size = rx->status; + + skb->data_len += rx->status; + + skb_shinfo(nskb)->nr_frags = 0; + kfree_skb(nskb); + + frag++; + nr_frags++; + } + + shinfo->nr_frags = nr_frags; + return cons; +} + +static int netif_poll(struct net_device *dev, int *pbudget) +{ + struct netfront_info *np = netdev_priv(dev); + struct sk_buff *skb; + struct netif_rx_response *rx; + RING_IDX i, rp; + struct multicall_entry *mcl; + int work_done, budget, more_to_do = 1; + struct sk_buff_head rxq; + struct sk_buff_head errq; + struct sk_buff_head tmpq; + unsigned long flags; + unsigned int len; + int pages_done; + int err; + + spin_lock(&np->rx_lock); + + if (unlikely(!netif_carrier_ok(dev))) { + spin_unlock(&np->rx_lock); + return 0; + } + + skb_queue_head_init(&rxq); + skb_queue_head_init(&errq); + skb_queue_head_init(&tmpq); + + if ((budget = *pbudget) > dev->quota) + budget = dev->quota; + rp = np->rx.sring->rsp_prod; + rmb(); /* Ensure we see queued responses up to ''rp''. */ + + for (i = np->rx.rsp_cons, work_done = 0, pages_done = 0; + (i != rp) && (work_done < budget); + np->rx.rsp_cons = ++i, work_done++) { + rx = RING_GET_RESPONSE(&np->rx, i); + + err = xennet_get_responses(np, rx, rp, &tmpq, pages_done); + pages_done += skb_queue_len(&tmpq); + + if (unlikely(err)) { + i = np->rx.rsp_cons + skb_queue_len(&tmpq) - 1; work_done--; + while ((skb = __skb_dequeue(&tmpq))) + __skb_queue_tail(&errq, skb); + np->stats.rx_errors++; continue; } - gnttab_release_grant_reference(&np->gref_rx_head, ref); - - /* NB. We handle skb overflow later. 
*/ - skb->data = skb->head + rx->offset; - skb->len = rx->status; - skb->tail = skb->data + skb->len; + skb = __skb_dequeue(&tmpq); + + skb->nh.raw = (void *)skb_shinfo(skb)->frags[0].page; + skb->h.raw = skb->nh.raw + rx->offset; + + len = rx->status; + if (len > RX_COPY_THRESHOLD) + len = RX_COPY_THRESHOLD; + skb_put(skb, len); + + if (rx->status > len) { + skb_shinfo(skb)->frags[0].page_offset + rx->offset + len; + skb_shinfo(skb)->frags[0].size = rx->status - len; + skb->data_len = rx->status - len; + } else { + skb_shinfo(skb)->frags[0].page = NULL; + skb_shinfo(skb)->nr_frags = 0; + } + + i = xennet_fill_frags(np, skb, &tmpq); + skb->truesize += skb->data_len; + skb->len += skb->data_len; /* * Old backends do not assert data_validated but we @@ -966,96 +1109,38 @@ static int netif_poll(struct net_device skb->proto_csum_blank = !!(rx->flags & NETRXF_csum_blank); np->stats.rx_packets++; - np->stats.rx_bytes += rx->status; - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - /* Remap the page. */ - MULTI_update_va_mapping(mcl, (unsigned long)skb->head, - pfn_pte_ma(mfn, PAGE_KERNEL), - 0); - mcl++; - mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT) - | MMU_MACHPHYS_UPDATE; - mmu->val = __pa(skb->head) >> PAGE_SHIFT; - mmu++; - - set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT, - mfn); - } + np->stats.rx_bytes += skb->len; __skb_queue_tail(&rxq, skb); } /* Some pages are no longer absent... */ - balloon_update_driver_allowance(-work_done); + balloon_update_driver_allowance(-pages_done); /* Do all the remapping work, and M2P updates, in one big hypercall. */ - if (likely((mcl - np->rx_mcl) != 0)) { + if (likely(pages_done)) { + mcl = np->rx_mcl + pages_done; mcl->op = __HYPERVISOR_mmu_update; mcl->args[0] = (unsigned long)np->rx_mmu; - mcl->args[1] = mmu - np->rx_mmu; + mcl->args[1] = pages_done; mcl->args[2] = 0; mcl->args[3] = DOMID_SELF; - mcl++; - (void)HYPERVISOR_multicall(np->rx_mcl, mcl - np->rx_mcl); - } + (void)HYPERVISOR_multicall(np->rx_mcl, pages_done + 1); + } + + while ((skb = __skb_dequeue(&errq))) + kfree_skb(skb); while ((skb = __skb_dequeue(&rxq)) != NULL) { - if (skb->len > (dev->mtu + ETH_HLEN + 4)) { - if (net_ratelimit()) - printk(KERN_INFO "Received packet too big for " - "MTU (%d > %d)\n", - skb->len - ETH_HLEN - 4, dev->mtu); - skb->len = 0; - skb->tail = skb->data; - init_skb_shinfo(skb); - dev_kfree_skb(skb); - continue; - } - - /* - * Enough room in skbuff for the data we were passed? Also, - * Linux expects at least 16 bytes headroom in each rx buffer. - */ - if (unlikely(skb->tail > skb->end) || - unlikely((skb->data - skb->head) < 16)) { - if (net_ratelimit()) { - if (skb->tail > skb->end) - printk(KERN_INFO "Received packet " - "is %zd bytes beyond tail.\n", - skb->tail - skb->end); - else - printk(KERN_INFO "Received packet " - "is %zd bytes before head.\n", - 16 - (skb->data - skb->head)); - } - - nskb = __dev_alloc_skb(skb->len + 2, - GFP_ATOMIC|__GFP_NOWARN); - if (nskb != NULL) { - skb_reserve(nskb, 2); - skb_put(nskb, skb->len); - memcpy(nskb->data, skb->data, skb->len); - /* Copy any other fields we already set up. */ - nskb->dev = skb->dev; - nskb->ip_summed = skb->ip_summed; - nskb->proto_data_valid = skb->proto_data_valid; - nskb->proto_csum_blank = skb->proto_csum_blank; - } - - /* Reinitialise and then destroy the old skbuff. */ - skb->len = 0; - skb->tail = skb->data; - init_skb_shinfo(skb); - dev_kfree_skb(skb); - - /* Switch old for new, if we copied the buffer. 
*/ - if ((skb = nskb) == NULL) - continue; - } - - /* Set the shinfo area, which is hidden behind the data. */ - init_skb_shinfo(skb); + struct page *page = (struct page *)skb->nh.raw; + void *vaddr = page_address(page); + + memcpy(skb->data, vaddr + (skb->h.raw - skb->nh.raw), + skb_headlen(skb)); + + if (page != skb_shinfo(skb)->frags[0].page) + __free_page(page); + /* Ethernet work: Delayed to here as it peeks the header. */ skb->protocol = eth_type_trans(skb, dev); @@ -1063,8 +1148,6 @@ static int netif_poll(struct net_device netif_receive_skb(skb); dev->last_rx = jiffies; } - - np->rx.rsp_cons = i; /* If we get a callback with very few responses, reduce fill target. */ /* NB. Note exponential increase, linear decrease. */ @@ -1205,7 +1288,7 @@ static void network_connect(struct net_d break; gnttab_grant_foreign_transfer_ref( np->grant_rx_ref[i], np->xbdev->otherend_id, - __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); + page_to_pfn(skb_shinfo(np->rx_skbs[i])->frags->page)); RING_GET_REQUEST(&np->rx, i)->gref = np->grant_rx_ref[i]; RING_GET_REQUEST(&np->rx, i)->id = i; } @@ -1216,7 +1299,7 @@ static void network_connect(struct net_d ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i); gnttab_grant_foreign_transfer_ref( ref, np->xbdev->otherend_id, - __pa(skb->data) >> PAGE_SHIFT); + page_to_pfn(skb_shinfo(skb)->frags->page)); RING_GET_REQUEST(&np->rx, requeue_idx)->gref = ref; RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; requeue_idx++; diff -r c754083146bc -r c82398c08669 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jul 07 23:36:58 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jul 07 23:37:45 2006 +1000 @@ -123,6 +123,10 @@ typedef struct netif_rx_request netif_rx #define _NETRXF_csum_blank (1) #define NETRXF_csum_blank (1U<<_NETRXF_csum_blank) +/* Packet continues in the next request descriptor. */ +#define _NETRXF_more_data (2) +#define NETRXF_more_data (1U<<_NETRXF_more_data) + struct netif_rx_response { uint16_t id; uint16_t offset; /* Offset in page of start of received packet */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
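A quick standalone illustration of the receive path introduced by this patch: each packet arrives in a page, and only the first RX_COPY_THRESHOLD bytes (256 in the patch) are copied into the skb's linear area so the protocol headers are directly accessible; anything beyond that stays in the page as frag[0]. The arithmetic below is a simplified sketch, not the driver code.

#include <stdio.h>

#define RX_COPY_THRESHOLD 256   /* as defined in the patch */
#define DEMO_PAGE_SIZE 4096     /* assumed page size */

/*
 * For a received length "status", show how the frontend divides the data
 * between the skb's linear head (copied) and the original page (frag[0]).
 */
static void demo_rx_split(int status)
{
        int head = status > RX_COPY_THRESHOLD ? RX_COPY_THRESHOLD : status;
        int frag = status - head;

        printf("status=%4d -> head=%3d bytes copied, frag=%4d bytes left in page\n",
               status, head, frag);
}

int main(void)
{
        demo_rx_split(60);                /* small packet: nothing left in the page */
        demo_rx_split(1514);              /* full-size Ethernet frame */
        demo_rx_split(DEMO_PAGE_SIZE);    /* largest single-slot payload */
        return 0;
}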
Herbert Xu
2006-Jul-07 14:20 UTC
[Xen-devel] [9/11] [NET] back: Transmit SG packets if supported
Hi: [NET] back: Transmit SG packets if supported This patch adds scatter-and-gather transmission support to the backend. This allows the MTU to be raised right now and the potential for TSO in future. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r c82398c08669 -r 54f84d198a1d linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:37:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:38:49 2006 +1000 @@ -130,4 +130,10 @@ static inline int netbk_can_queue(struct return netif->features & NETIF_F_LLTX; } +static inline int netbk_can_sg(struct net_device *dev) +{ + netif_t *netif = netdev_priv(dev); + return netif->features & NETIF_F_SG; +} + #endif /* __NETIF__BACKEND__COMMON_H__ */ diff -r c82398c08669 -r 54f84d198a1d linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:37:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:38:49 2006 +1000 @@ -68,10 +68,34 @@ static int net_close(struct net_device * return 0; } +static int netbk_change_mtu(struct net_device *dev, int mtu) +{ + int max = netbk_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN; + + if (mtu > max) + return -EINVAL; + dev->mtu = mtu; + return 0; +} + +static int netbk_set_sg(struct net_device *dev, u32 data) +{ + if (data) { + netif_t *netif = netdev_priv(dev); + + if (!(netif->features & NETIF_F_SG)) + return -ENOSYS; + } + + return ethtool_op_set_sg(dev, data); +} + static struct ethtool_ops network_ethtool_ops { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, + .get_sg = ethtool_op_get_sg, + .set_sg = netbk_set_sg, }; netif_t *netif_alloc(domid_t domid, unsigned int handle, u8 be_mac[ETH_ALEN]) @@ -107,6 +131,7 @@ netif_t *netif_alloc(domid_t domid, unsi dev->get_stats = netif_be_get_stats; dev->open = net_open; dev->stop = net_close; + dev->change_mtu = netbk_change_mtu; dev->features = NETIF_F_IP_CSUM; SET_ETHTOOL_OPS(dev, &network_ethtool_ops); diff -r c82398c08669 -r 54f84d198a1d linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:37:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:38:49 2006 +1000 @@ -40,6 +40,11 @@ /*#define NETBE_DEBUG_INTERRUPT*/ +struct netbk_rx_meta { + skb_frag_t frag; + int id; +}; + static void netif_idx_release(u16 pending_idx); static void netif_page_release(struct page *page); static void make_tx_response(netif_t *netif, @@ -100,21 +105,27 @@ static unsigned long mfn_list[MAX_MFN_AL static unsigned long mfn_list[MAX_MFN_ALLOC]; static unsigned int alloc_index = 0; -static unsigned long alloc_mfn(void) -{ - unsigned long mfn = 0; +static inline unsigned long alloc_mfn(void) +{ + return mfn_list[--alloc_index]; +} + +static int check_mfn(int nr) +{ struct xen_memory_reservation reservation = { - .nr_extents = MAX_MFN_ALLOC, .extent_order = 0, .domid = DOMID_SELF }; - set_xen_guest_handle(reservation.extent_start, mfn_list); - if ( unlikely(alloc_index == 0) ) - alloc_index = HYPERVISOR_memory_op( - XENMEM_increase_reservation, &reservation); - if ( alloc_index != 0 ) - mfn = mfn_list[--alloc_index]; - return mfn; + + if 
(likely(alloc_index >= nr)) + return 0; + + set_xen_guest_handle(reservation.extent_start, mfn_list + alloc_index); + reservation.nr_extents = MAX_MFN_ALLOC - alloc_index; + alloc_index += HYPERVISOR_memory_op(XENMEM_increase_reservation, + &reservation); + + return alloc_index >= nr ? 0 : -ENOMEM; } static inline void maybe_schedule_tx_action(void) @@ -136,12 +147,87 @@ static inline int is_xen_skb(struct sk_b return (cp == skbuff_cachep); } +static struct sk_buff *netbk_copy_skb(struct sk_buff *skb) +{ + struct skb_shared_info *ninfo; + struct sk_buff *nskb; + unsigned long offset; + int ret; + int len; + int headlen; + + nskb = alloc_skb(SKB_MAX_HEAD(0), GFP_ATOMIC); + if (unlikely(!nskb)) + goto err; + + skb_reserve(nskb, 16); + headlen = nskb->end - nskb->data; + if (headlen > skb_headlen(skb)) + headlen = skb_headlen(skb); + ret = skb_copy_bits(skb, 0, __skb_put(nskb, headlen), headlen); + BUG_ON(ret); + + ninfo = skb_shinfo(nskb); + ninfo->gso_size = skb_shinfo(skb)->gso_size; + ninfo->gso_type = skb_shinfo(skb)->gso_type; + + offset = headlen; + len = skb->len - headlen; + + nskb->len = skb->len; + nskb->data_len = len; + nskb->truesize += len; + + while (len) { + struct page *page; + int copy; + int zero; + + if (unlikely(ninfo->nr_frags >= MAX_SKB_FRAGS)) { + dump_stack(); + goto err_free; + } + + copy = len >= PAGE_SIZE ? PAGE_SIZE : len; + zero = len >= PAGE_SIZE ? 0 : __GFP_ZERO; + + page = alloc_page(GFP_ATOMIC | zero); + if (unlikely(!page)) + goto err_free; + + ret = skb_copy_bits(skb, offset, page_address(page), copy); + BUG_ON(ret); + + ninfo->frags[ninfo->nr_frags].page = page; + ninfo->frags[ninfo->nr_frags].page_offset = 0; + ninfo->frags[ninfo->nr_frags].size = copy; + ninfo->nr_frags++; + + offset += copy; + len -= copy; + } + + offset = nskb->data - skb->data; + + nskb->h.raw = skb->h.raw + offset; + nskb->nh.raw = skb->nh.raw + offset; + nskb->mac.raw = skb->mac.raw + offset; + + return nskb; + +err_free: + kfree_skb(nskb); +err: + return NULL; +} + static inline int netbk_queue_full(netif_t *netif) { RING_IDX peek = netif->rx_req_cons_peek; - return netif->rx.sring->req_prod - peek <= 0 || - netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <= 0; + return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS || + netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <+ MAX_SKB_FRAGS; } int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) @@ -163,20 +249,12 @@ int netif_be_start_xmit(struct sk_buff * * We do not copy the packet unless: * 1. The data is shared; or * 2. The data is not allocated from our special cache. - * NB. We also couldn''t cope with fragmented packets, but we won''t get - * any because we not advertise the NETIF_F_SG feature. + * 3. The data is fragmented. */ - if (skb_shared(skb) || skb_cloned(skb) || !is_xen_skb(skb)) { - int hlen = skb->data - skb->head; - int ret; - struct sk_buff *nskb = dev_alloc_skb(hlen + skb->len); + if (skb_cloned(skb) || skb_is_nonlinear(skb) || !is_xen_skb(skb)) { + struct sk_buff *nskb = netbk_copy_skb(skb); if ( unlikely(nskb == NULL) ) goto drop; - skb_reserve(nskb, hlen); - __skb_put(nskb, skb->len); - ret = skb_copy_bits(skb, -hlen, nskb->data - hlen, - skb->len + hlen); - BUG_ON(ret); /* Copy only the header fields we use in this driver. 
*/ nskb->dev = skb->dev; nskb->ip_summed = skb->ip_summed; @@ -185,7 +263,7 @@ int netif_be_start_xmit(struct sk_buff * skb = nskb; } - netif->rx_req_cons_peek++; + netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1; netif_get(netif); if (netbk_can_queue(dev) && netbk_queue_full(netif)) @@ -221,116 +299,80 @@ int xen_network_done(void) } #endif -static void net_rx_action(unsigned long unused) -{ - netif_t *netif = NULL; - s8 status; - u16 size, id, irq, flags; - multicall_entry_t *mcl; - mmu_update_t *mmu; - gnttab_transfer_t *gop; - unsigned long vdata, old_mfn, new_mfn; - struct sk_buff_head rxq; - struct sk_buff *skb; - int notify_nr = 0; - int ret; +static u16 netbk_gop_frag(netif_t *netif, struct page *page, int count, int i) +{ + multicall_entry_t *mcl = rx_mcl + count; + mmu_update_t *mmu = rx_mmu + count; + gnttab_transfer_t *gop = grant_rx_op + count; + netif_rx_request_t *req; + unsigned long old_mfn, new_mfn; + + old_mfn = virt_to_mfn(page_address(page)); + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + new_mfn = alloc_mfn(); + + /* + * Set the new P2M table entry before reassigning + * the old data page. Heed the comment in + * pgtable-2level.h:pte_page(). :-) + */ + set_phys_to_machine(page_to_pfn(page), new_mfn); + + MULTI_update_va_mapping(mcl, (unsigned long)page_address(page), + pfn_pte_ma(new_mfn, PAGE_KERNEL), 0); + + mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) | + MMU_MACHPHYS_UPDATE; + mmu->val = page_to_pfn(page); + } + + req = RING_GET_REQUEST(&netif->rx, netif->rx.req_cons + i); + gop->mfn = old_mfn; + gop->domid = netif->domid; + gop->ref = req->gref; + return req->id; +} + +static void netbk_gop_skb(struct sk_buff *skb, struct netbk_rx_meta *meta, + int count) +{ + netif_t *netif = netdev_priv(skb->dev); + int nr_frags = skb_shinfo(skb)->nr_frags; + int i; + + for (i = 0; i < nr_frags; i++) { + meta[++count].frag = skb_shinfo(skb)->frags[i]; + meta[count].id = netbk_gop_frag(netif, meta[count].frag.page, + count, i + 1); + } + /* - * Putting hundreds of bytes on the stack is considered rude. - * Static works because a tasklet can only be on one CPU at any time. + * This must occur at the end to ensure that we don''t trash + * skb_shinfo until we''re done. */ - static u16 notify_list[NET_RX_RING_SIZE]; - - skb_queue_head_init(&rxq); - - mcl = rx_mcl; - mmu = rx_mmu; - gop = grant_rx_op; - - while ((skb = skb_dequeue(&rx_queue)) != NULL) { - netif = netdev_priv(skb->dev); - vdata = (unsigned long)skb->data; - old_mfn = virt_to_mfn(vdata); - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - /* Memory squeeze? Back off for an arbitrary while. */ - if ((new_mfn = alloc_mfn()) == 0) { - if ( net_ratelimit() ) - WPRINTK("Memory squeeze in netback " - "driver.\n"); - mod_timer(&net_timer, jiffies + HZ); - skb_queue_head(&rx_queue, skb); - break; - } - /* - * Set the new P2M table entry before reassigning - * the old data page. Heed the comment in - * pgtable-2level.h:pte_page(). :-) - */ - set_phys_to_machine( - __pa(skb->data) >> PAGE_SHIFT, - new_mfn); - - MULTI_update_va_mapping(mcl, vdata, - pfn_pte_ma(new_mfn, - PAGE_KERNEL), 0); - mcl++; - - mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) | - MMU_MACHPHYS_UPDATE; - mmu->val = __pa(vdata) >> PAGE_SHIFT; - mmu++; - } - - gop->mfn = old_mfn; - gop->domid = netif->domid; - gop->ref = RING_GET_REQUEST( - &netif->rx, netif->rx.req_cons)->gref; - netif->rx.req_cons++; - gop++; - - __skb_queue_tail(&rxq, skb); - - /* Filled the batch queue? 
*/ - if ((gop - grant_rx_op) == ARRAY_SIZE(grant_rx_op)) - break; - } - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - if (mcl == rx_mcl) - return; - - mcl[-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL; - - if (mmu - rx_mmu) { - mcl->op = __HYPERVISOR_mmu_update; - mcl->args[0] = (unsigned long)rx_mmu; - mcl->args[1] = mmu - rx_mmu; - mcl->args[2] = 0; - mcl->args[3] = DOMID_SELF; - mcl++; - } - - ret = HYPERVISOR_multicall(rx_mcl, mcl - rx_mcl); - BUG_ON(ret != 0); - } - - ret = HYPERVISOR_grant_table_op(GNTTABOP_transfer, grant_rx_op, - gop - grant_rx_op); - BUG_ON(ret != 0); - - mcl = rx_mcl; - gop = grant_rx_op; - while ((skb = __skb_dequeue(&rxq)) != NULL) { - netif = netdev_priv(skb->dev); - size = skb->tail - skb->data; - - atomic_set(&(skb_shinfo(skb)->dataref), 1); - skb_shinfo(skb)->nr_frags = 0; - skb_shinfo(skb)->frag_list = NULL; - - netif->stats.tx_bytes += size; - netif->stats.tx_packets++; - + meta[count - nr_frags].id = netbk_gop_frag(netif, + virt_to_page(skb->data), + count - nr_frags, 0); + netif->rx.req_cons += nr_frags + 1; +} + +static inline void netbk_free_pages(int nr_frags, struct netbk_rx_meta *meta) +{ + int i; + + for (i = 0; i < nr_frags; i++) + put_page(meta[i].frag.page); +} + +static int netbk_check_gop(int nr_frags, domid_t domid, int count) +{ + multicall_entry_t *mcl = rx_mcl + count; + gnttab_transfer_t *gop = grant_rx_op + count; + int status = NETIF_RSP_OKAY; + int i; + + for (i = 0; i <= nr_frags; i++) { if (!xen_feature(XENFEAT_auto_translated_physmap)) { /* The update_va_mapping() must not fail. */ BUG_ON(mcl->result != 0); @@ -338,10 +380,9 @@ static void net_rx_action(unsigned long } /* Check the reassignment error code. */ - status = NETIF_RSP_OKAY; if (gop->status != 0) { DPRINTK("Bad status %d from grant transfer to DOM%u\n", - gop->status, netif->domid); + gop->status, domid); /* * Page no longer belongs to us unless GNTST_bad_page, * but that should be a fatal error anyway. @@ -349,17 +390,128 @@ static void net_rx_action(unsigned long BUG_ON(gop->status == GNTST_bad_page); status = NETIF_RSP_ERROR; } - irq = netif->irq; - id = RING_GET_REQUEST(&netif->rx, netif->rx.rsp_prod_pvt)->id; - flags = 0; + gop++; + } + + return status; +} + +static void netbk_add_frag_responses(netif_t *netif, int status, + struct netbk_rx_meta *meta, int nr_frags) +{ + int i; + + for (i = 0; i < nr_frags; i++) { + int id = meta[i].id; + int flags = (i == nr_frags - 1) ? 0 : NETRXF_more_data; + + make_rx_response(netif, id, status, meta[i].frag.page_offset, + meta[i].frag.size, flags); + } +} + +static void net_rx_action(unsigned long unused) +{ + netif_t *netif = NULL; + s8 status; + u16 id, irq, flags; + multicall_entry_t *mcl; + struct sk_buff_head rxq; + struct sk_buff *skb; + int notify_nr = 0; + int ret; + int nr_frags; + int count; + + /* + * Putting hundreds of bytes on the stack is considered rude. + * Static works because a tasklet can only be on one CPU at any time. + */ + static u16 notify_list[NET_RX_RING_SIZE]; + static struct netbk_rx_meta meta[NET_RX_RING_SIZE]; + + skb_queue_head_init(&rxq); + + count = 0; + + while ((skb = skb_dequeue(&rx_queue)) != NULL) { + nr_frags = skb_shinfo(skb)->nr_frags; + *(int *)skb->cb = nr_frags; + + if (!xen_feature(XENFEAT_auto_translated_physmap) && + check_mfn(nr_frags + 1)) { + /* Memory squeeze? Back off for an arbitrary while. 
*/ + if ( net_ratelimit() ) + WPRINTK("Memory squeeze in netback " + "driver.\n"); + mod_timer(&net_timer, jiffies + HZ); + skb_queue_head(&rx_queue, skb); + break; + } + + netbk_gop_skb(skb, meta, count); + + count += nr_frags + 1; + + __skb_queue_tail(&rxq, skb); + + /* Filled the batch queue? */ + if (count + MAX_SKB_FRAGS >= NET_RX_RING_SIZE) + break; + } + + if (!count) + return; + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + mcl = rx_mcl + count; + + mcl[-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL; + + mcl->op = __HYPERVISOR_mmu_update; + mcl->args[0] = (unsigned long)rx_mmu; + mcl->args[1] = count; + mcl->args[2] = 0; + mcl->args[3] = DOMID_SELF; + + ret = HYPERVISOR_multicall(rx_mcl, count + 1); + BUG_ON(ret != 0); + } + + ret = HYPERVISOR_grant_table_op(GNTTABOP_transfer, grant_rx_op, count); + BUG_ON(ret != 0); + + count = 0; + while ((skb = __skb_dequeue(&rxq)) != NULL) { + nr_frags = *(int *)skb->cb; + + atomic_set(&(skb_shinfo(skb)->dataref), 1); + skb_shinfo(skb)->nr_frags = 0; + skb_shinfo(skb)->frag_list = NULL; + + netif = netdev_priv(skb->dev); + netif->stats.tx_bytes += skb->len; + netif->stats.tx_packets++; + + netbk_free_pages(nr_frags, meta + count + 1); + status = netbk_check_gop(nr_frags, netif->domid, count); + + id = meta[count].id; + flags = nr_frags ? NETRXF_more_data : 0; + if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ flags |= NETRXF_csum_blank | NETRXF_data_validated; else if (skb->proto_data_valid) /* remote but checksummed? */ flags |= NETRXF_data_validated; - if (make_rx_response(netif, id, status, - (unsigned long)skb->data & ~PAGE_MASK, - size, flags) && - (rx_notify[irq] == 0)) { + + make_rx_response(netif, id, status, offset_in_page(skb->data), + skb_headlen(skb), flags); + netbk_add_frag_responses(netif, status, meta + count + 1, + nr_frags); + + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->rx, ret); + irq = netif->irq; + if (ret && !rx_notify[irq]) { rx_notify[irq] = 1; notify_list[notify_nr++] = irq; } @@ -370,7 +522,7 @@ static void net_rx_action(unsigned long netif_put(netif); dev_kfree_skb(skb); - gop++; + count += nr_frags + 1; } while (notify_nr != 0) { @@ -1039,7 +1191,6 @@ static int make_rx_response(netif_t *net { RING_IDX i = netif->rx.rsp_prod_pvt; netif_rx_response_t *resp; - int notify; resp = RING_GET_RESPONSE(&netif->rx, i); resp->offset = offset; @@ -1050,9 +1201,8 @@ static int make_rx_response(netif_t *net resp->status = (s16)st; netif->rx.rsp_prod_pvt = ++i; - RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->rx, notify); - - return notify; + + return 0; } #ifdef NETBE_DEBUG_INTERRUPT diff -r c82398c08669 -r 54f84d198a1d linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:37:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:38:49 2006 +1000 @@ -377,6 +377,13 @@ static int connect_rings(struct backend_ /* Must be non-zero for pfifo_fast to work. */ be->netif->dev->tx_queue_len = 1; + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-sg", "%d", &val) < 0) + val = 0; + if (val) { + be->netif->features |= NETIF_F_SG; + be->netif->dev->features |= NETIF_F_SG; + } + /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); if (err) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
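For a feel of what netbk_copy_skb() in this patch does when it has to copy, here is a standalone sketch of the chunking arithmetic: after the linear head, the payload is carved into page-sized fragments, and a packet needing more than MAX_SKB_FRAGS pages is rejected. PAGE_SIZE and MAX_SKB_FRAGS values are assumptions for illustration.

#include <stdio.h>

#define DEMO_PAGE_SIZE     4096
#define DEMO_MAX_SKB_FRAGS   18 /* typical value for 4 KiB pages in this era */

/*
 * Mirror of the copy loop's accounting: after "headlen" bytes go into the
 * new skb's linear area, the rest is carved into page-sized fragments.
 * Returns the number of frags used, or -1 if the payload would not fit
 * (the real code dumps a stack trace and drops the packet).
 */
static int demo_frag_count(int total_len, int headlen)
{
        int len = total_len - headlen;
        int frags = 0;

        while (len > 0) {
                int copy = len >= DEMO_PAGE_SIZE ? DEMO_PAGE_SIZE : len;

                if (frags >= DEMO_MAX_SKB_FRAGS)
                        return -1;

                frags++;
                len -= copy;
        }
        return frags;
}

int main(void)
{
        printf("1500-byte frame, all in the head : %d frags\n", demo_frag_count(1500, 1500));
        printf("64 KiB TSO frame, 256-byte head  : %d frags\n", demo_frag_count(65536, 256));
        return 0;
}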
Hi: [NET] front: Add TSO support This patch adds TCP Segmentation Offload (TSO) support to the frontend. It also advertises this fact through xenbus so that the frontend can detect this and send through TSO requests only if it is supported. This is done using an extra request slot which is indicated by a flag in the first slot. In future checksum offload can be done in the same way. Even though only TSO is supported for now the code actually supports GSO so it can be applied to any other protocol. The only missing bit is the detection of host support for a specific GSO protocol. Once that is added we can advertise all supported protocols to the guest. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 54f84d198a1d -r 5861968091dd linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:38:49 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:38:54 2006 +1000 @@ -116,6 +116,11 @@ struct netfront_info { struct mmu_update rx_mmu[NET_RX_RING_SIZE]; }; +struct netfront_rx_info { + struct netif_rx_response rx; + struct netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX - 1]; +}; + /* * Access macros for acquiring freeing slots in tx_skbs[]. */ @@ -330,6 +335,12 @@ again: err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", 1); if (err) { message = "writing feature-sg"; + goto abort_transaction; + } + + err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv4", "%d", 1); + if (err) { + message = "writing feature-gso-tcpv4"; goto abort_transaction; } @@ -898,18 +909,65 @@ static void xennet_move_rx_slot(struct n np->rx.req_prod_pvt++; } +int xennet_get_extras(struct netfront_info *np, + struct netif_extra_info *extras, RING_IDX rp) + +{ + struct netif_extra_info *extra; + RING_IDX cons = np->rx.rsp_cons; + int err = 0; + + do { + struct sk_buff *skb; + grant_ref_t ref; + + if (unlikely(cons + 1 == rp)) { + if (net_ratelimit()) + WPRINTK("Missing extra info\n"); + err = -EBADR; + break; + } + + extra = (struct netif_extra_info *) + RING_GET_RESPONSE(&np->rx, ++cons); + + if (unlikely(!extra->type || + extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) { + if (net_ratelimit()) + WPRINTK("Invalid extra type: %d\n", + extra->type); + err = -EINVAL; + } else + memcpy(&extras[extra->type - 1], extra, sizeof(*extra)); + + skb = xennet_get_rx_skb(np, cons); + ref = xennet_get_rx_ref(np, cons); + xennet_move_rx_slot(np, skb, ref); + } while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE); + + np->rx.rsp_cons = cons; + return err; +} + static int xennet_get_responses(struct netfront_info *np, - struct netif_rx_response *rx, RING_IDX rp, + struct netfront_rx_info *rinfo, RING_IDX rp, struct sk_buff_head *list, int count) { struct mmu_update *mmu = np->rx_mmu + count; struct multicall_entry *mcl = np->rx_mcl + count; + struct netif_rx_response *rx = &rinfo->rx; + struct netif_extra_info *extras = rinfo->extras; RING_IDX cons = np->rx.rsp_cons; struct sk_buff *skb = xennet_get_rx_skb(np, cons); grant_ref_t ref = xennet_get_rx_ref(np, cons); int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD); int frags = 1; int err = 0; + + if (rx->flags & NETRXF_extra_info) { + err = xennet_get_extras(np, extras, rp); + cons = np->rx.rsp_cons; + } for (;;) { unsigned long mfn; @@ -1022,11 
+1080,38 @@ static RING_IDX xennet_fill_frags(struct return cons; } +static int xennet_set_skb_gso(struct sk_buff *skb, struct netif_extra_info *gso) +{ + if (!gso->u.gso.size) { + if (net_ratelimit()) + WPRINTK("GSO size must not be zero.\n"); + return -EINVAL; + } + + /* Currently only TCPv4 S.O. is supported. */ + if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) { + if (net_ratelimit()) + WPRINTK("Bad GSO type %d.\n", gso->u.gso.type); + return -EINVAL; + } + + skb_shinfo(skb)->gso_size = gso->u.gso.size; + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + + /* Header must be checked, and gso_segs computed. */ + skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY; + skb_shinfo(skb)->gso_segs = 0; + + return 0; +} + static int netif_poll(struct net_device *dev, int *pbudget) { struct netfront_info *np = netdev_priv(dev); struct sk_buff *skb; - struct netif_rx_response *rx; + struct netfront_rx_info rinfo; + struct netif_rx_response *rx = &rinfo.rx; + struct netif_extra_info *extras = rinfo.extras; RING_IDX i, rp; struct multicall_entry *mcl; int work_done, budget, more_to_do = 1; @@ -1057,12 +1142,14 @@ static int netif_poll(struct net_device for (i = np->rx.rsp_cons, work_done = 0, pages_done = 0; (i != rp) && (work_done < budget); np->rx.rsp_cons = ++i, work_done++) { - rx = RING_GET_RESPONSE(&np->rx, i); - - err = xennet_get_responses(np, rx, rp, &tmpq, pages_done); + memcpy(rx, RING_GET_RESPONSE(&np->rx, i), sizeof(*rx)); + memset(extras, 0, sizeof(extras)); + + err = xennet_get_responses(np, &rinfo, rp, &tmpq, pages_done); pages_done += skb_queue_len(&tmpq); if (unlikely(err)) { +err: i = np->rx.rsp_cons + skb_queue_len(&tmpq) - 1; work_done--; while ((skb = __skb_dequeue(&tmpq))) @@ -1072,6 +1159,16 @@ static int netif_poll(struct net_device } skb = __skb_dequeue(&tmpq); + + if (extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].type) { + struct netif_extra_info *gso; + gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1]; + + if (unlikely(xennet_set_skb_gso(skb, gso))) { + __skb_queue_head(&tmpq, skb); + goto err; + } + } skb->nh.raw = (void *)skb_shinfo(skb)->frags[0].page; skb->h.raw = skb->nh.raw + rx->offset; diff -r 54f84d198a1d -r 5861968091dd xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jul 07 23:38:49 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jul 07 23:38:54 2006 +1000 @@ -127,6 +127,10 @@ typedef struct netif_rx_request netif_rx #define _NETRXF_more_data (2) #define NETRXF_more_data (1U<<_NETRXF_more_data) +/* Packet to be followed by extra descriptor(s). */ +#define _NETRXF_extra_info (3) +#define NETRXF_extra_info (1U<<_NETRXF_extra_info) + struct netif_rx_response { uint16_t id; uint16_t offset; /* Offset in page of start of received packet */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
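To summarise the wire format the two ends now agree on: when the first response for a packet has NETRXF_extra_info set, the next ring slot is not a normal rx response but a netif_extra_info, and the fragment responses follow after it. A sketch of the layout for one GSO packet delivered to the frontend (an illustration assembled from the handling above, not text from the patch):

/*
 * slot 0       netif_rx_response for the header page; NETRXF_extra_info
 *              set, NETRXF_more_data set when fragments follow
 * slot 1       netif_extra_info with type = XEN_NETIF_EXTRA_TYPE_GSO and
 *              u.gso.size/u.gso.type filled in; XEN_NETIF_EXTRA_FLAG_MORE
 *              would chain further extra slots
 * slots 2..n   netif_rx_response, one per fragment, NETRXF_more_data on
 *              all but the last
 */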
Herbert Xu
2006-Jul-07 14:22 UTC
[Xen-devel] [11/11] [NET] back: Transmit TSO packets if supported
Hi: [NET] back: Transmit TSO packets if supported This patch adds TSO transmission support to the backend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 5861968091dd -r 51252bc644da linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:38:54 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:38:57 2006 +1000 @@ -90,12 +90,26 @@ static int netbk_set_sg(struct net_devic return ethtool_op_set_sg(dev, data); } +static int netbk_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + netif_t *netif = netdev_priv(dev); + + if (!(netif->features & NETIF_F_TSO)) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static struct ethtool_ops network_ethtool_ops { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = netbk_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = netbk_set_tso, }; netif_t *netif_alloc(domid_t domid, unsigned int handle, u8 be_mac[ETH_ALEN]) diff -r 5861968091dd -r 51252bc644da linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:38:54 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:38:57 2006 +1000 @@ -50,12 +50,12 @@ static void make_tx_response(netif_t *ne static void make_tx_response(netif_t *netif, netif_tx_request_t *txp, s8 st); -static int make_rx_response(netif_t *netif, - u16 id, - s8 st, - u16 offset, - u16 size, - u16 flags); +static netif_rx_response_t *make_rx_response(netif_t *netif, + u16 id, + s8 st, + u16 offset, + u16 size, + u16 flags); static void net_tx_action(unsigned long unused); static DECLARE_TASKLET(net_tx_tasklet, net_tx_action, 0); @@ -225,9 +225,9 @@ static inline int netbk_queue_full(netif { RING_IDX peek = netif->rx_req_cons_peek; - return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS || + return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS + 1 || netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <- MAX_SKB_FRAGS; + MAX_SKB_FRAGS + 1; } int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) @@ -263,7 +263,8 @@ int netif_be_start_xmit(struct sk_buff * skb = nskb; } - netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1; + netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1 + + !!skb_shinfo(skb)->gso_size; netif_get(netif); if (netbk_can_queue(dev) && netbk_queue_full(netif)) @@ -340,11 +341,16 @@ static void netbk_gop_skb(struct sk_buff netif_t *netif = netdev_priv(skb->dev); int nr_frags = skb_shinfo(skb)->nr_frags; int i; + int extra; + + meta[count].frag.page_offset = skb_shinfo(skb)->gso_type; + meta[count].frag.size = skb_shinfo(skb)->gso_size; + extra = !!meta[count].frag.size + 1; for (i = 0; i < nr_frags; i++) { meta[++count].frag = skb_shinfo(skb)->frags[i]; meta[count].id = netbk_gop_frag(netif, meta[count].frag.page, - count, i + 1); + count, i + extra); } /* @@ -354,7 +360,7 @@ static void netbk_gop_skb(struct sk_buff meta[count - nr_frags].id = netbk_gop_frag(netif, virt_to_page(skb->data), count - nr_frags, 0); - netif->rx.req_cons += nr_frags + 1; + netif->rx.req_cons += nr_frags + extra; } static inline void netbk_free_pages(int nr_frags, struct netbk_rx_meta *meta) @@ 
-415,6 +421,8 @@ static void net_rx_action(unsigned long netif_t *netif = NULL; s8 status; u16 id, irq, flags; + netif_rx_response_t *resp; + struct netif_extra_info *extra; multicall_entry_t *mcl; struct sk_buff_head rxq; struct sk_buff *skb; @@ -504,8 +512,33 @@ static void net_rx_action(unsigned long else if (skb->proto_data_valid) /* remote but checksummed? */ flags |= NETRXF_data_validated; - make_rx_response(netif, id, status, offset_in_page(skb->data), - skb_headlen(skb), flags); + resp = make_rx_response(netif, id, status, + offset_in_page(skb->data), + skb_headlen(skb), flags); + + extra = NULL; + + if (meta[count].frag.size) { + struct netif_extra_info *gso + (struct netif_extra_info *) + RING_GET_RESPONSE(&netif->rx, + netif->rx.rsp_prod_pvt++); + + if (extra) + extra->flags |= XEN_NETIF_EXTRA_FLAG_MORE; + else + resp->flags |= NETRXF_extra_info; + + gso->u.gso.size = meta[count].frag.size; + gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4; + gso->u.gso.pad = 0; + gso->u.gso.features = 0; + + gso->type = XEN_NETIF_EXTRA_TYPE_GSO; + gso->flags = 0; + extra = gso; + } + netbk_add_frag_responses(netif, status, meta + count + 1, nr_frags); @@ -1182,12 +1215,12 @@ static void make_tx_response(netif_t *ne #endif } -static int make_rx_response(netif_t *netif, - u16 id, - s8 st, - u16 offset, - u16 size, - u16 flags) +static netif_rx_response_t *make_rx_response(netif_t *netif, + u16 id, + s8 st, + u16 offset, + u16 size, + u16 flags) { RING_IDX i = netif->rx.rsp_prod_pvt; netif_rx_response_t *resp; @@ -1202,7 +1235,7 @@ static int make_rx_response(netif_t *net netif->rx.rsp_prod_pvt = ++i; - return 0; + return resp; } #ifdef NETBE_DEBUG_INTERRUPT diff -r 5861968091dd -r 51252bc644da linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:38:54 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:38:57 2006 +1000 @@ -384,6 +384,14 @@ static int connect_rings(struct backend_ be->netif->dev->features |= NETIF_F_SG; } + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4", "%d", + &val) < 0) + val = 0; + if (val) { + be->netif->features |= NETIF_F_TSO; + be->netif->dev->features |= NETIF_F_TSO; + } + /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); if (err) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
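With the extra-info slot in the picture, the per-packet budget grows by one slot whenever a GSO packet is sent, which is what the "+ !!skb_shinfo(skb)->gso_size" peek accounting and the "MAX_SKB_FRAGS + 1" queue-full test above allow for. A minimal sketch of the accounting (helper name invented for illustration):

#include <linux/skbuff.h>

/* Sketch only: rx-ring slots the backend must reserve for one skb --
 * header + fragments + one extra-info slot when the packet carries
 * GSO metadata. */
static inline int netbk_slots_needed(struct sk_buff *skb)
{
	return skb_shinfo(skb)->nr_frags + 1 +
	       !!skb_shinfo(skb)->gso_size;
}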
Some comments (inline) on this patch. I checked in the trivial first two patches. Thanks, Keir On 7 Jul 2006, at 15:18, Herbert Xu wrote:> /* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */ > - for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) { > - if ((unsigned long)np->rx_skbs[i] < PAGE_OFFSET) > - continue; > + for (i = 0; i < NET_RX_RING_SIZE; i++) { > + if (!np->rx_skbs[i]) > + break; > gnttab_grant_foreign_transfer_ref( > np->grant_rx_ref[i], np->xbdev->otherend_id, > __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); > - RING_GET_REQUEST(&np->rx, requeue_idx)->gref > - np->grant_rx_ref[i]; > - RING_GET_REQUEST(&np->rx, requeue_idx)->id = i; > + RING_GET_REQUEST(&np->rx, i)->gref = np->grant_rx_ref[i]; > + RING_GET_REQUEST(&np->rx, i)->id = i; > + } > + for (requeue_idx = i++; i < NET_RX_RING_SIZE; i++) { > + if (!np->rx_skbs[i]) > + continue; > + skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i); > + ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i); > + gnttab_grant_foreign_transfer_ref( > + ref, np->xbdev->otherend_id, > + __pa(skb->data) >> PAGE_SHIFT); > + RING_GET_REQUEST(&np->rx, requeue_idx)->gref = ref; > + RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; > requeue_idx++; > }Why two loops to fill the receive ring? The header of the second loop with the body of the first loop would seem more correct (use of destructive xennet_get_* functions to read skb/ref info looks incorrect). Have you tested this patch with ''xm save/restore'' on a domU receiving a bulk network transfer? That would be very worthwhile.> @@ -1391,11 +1433,6 @@ static struct net_device * __devinit cre > np->grant_tx_ref[i] = GRANT_INVALID_REF; > } > > - for (i = 0; i <= NET_RX_RING_SIZE; i++) { > - np->rx_skbs[i] = (void *)((unsigned long) i+1); > - np->grant_rx_ref[i] = GRANT_INVALID_REF; > - } > - > /* A grant for every tx ring slot */ > if (gnttab_alloc_grant_references(TX_MAX_TARGET, > &np->gref_tx_head) < 0) {Initialisation still would be nice for sanity. We should initialise rx_skbs[i] to NULL, but apart from that I think we should keep this initialisation loop as-is. Oh, except < NET_RX_RING_SIZE rather than <=. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
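The initialisation loop being asked for here would, with the two corrections Keir mentions applied (a NULL sentinel and a strict upper bound), look roughly like this sketch (not a hunk from the series):

	for (i = 0; i < NET_RX_RING_SIZE; i++) {
		np->rx_skbs[i] = NULL;
		np->grant_rx_ref[i] = GRANT_INVALID_REF;
	}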
Keir Fraser
2006-Jul-27 13:11 UTC
[Xen-devel] Re: [5/11] [NET] back: Replace netif->active with netif_carrier_ok
On 7 Jul 2006, at 15:19, Herbert Xu wrote:

> [NET] back: Replace netif->active with netif_carrier_ok
>
> The connection status to the frontend can be represented using
> netif_carrier_ok instead of netif->active. As a result, we delay
> the construction of the dev qdisc until the carrier comes on. This
> is a prerequisite for adding a tx queue.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

netif->active is also conditional on whether or not the interface is
open ('running'). So it's not as good a fit for netif_carrier as the
status field you similarly replaced in netfront.

Perhaps you could replace the netback status field
(CONNECTED/DISCONNECTED) instead? I think the extra value DISCONNECTING
is not really necessary any more, so it should map onto the boolean
netif_carrier okay.

Then ->active could also potentially be removed. It would correspond to
netif_running && netif_carrier_ok.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
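The equivalence Keir suggests can be written as a one-line helper on top of the generic netdev state bits. A minimal sketch (the helper name netif_is_active is invented here; this is not code from the series):

#include <linux/netdevice.h>

/* Sketch: the old netif->active test expressed purely in terms of
 * generic net_device state, as suggested above. */
static inline int netif_is_active(struct net_device *dev)
{
	return netif_running(dev) && netif_carrier_ok(dev);
}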
On Thu, Jul 27, 2006 at 01:49:51PM +0100, Keir Fraser wrote:> > Some comments (inline) on this patch. I checked in the trivial first > two patches.Thanks Keir.> On 7 Jul 2006, at 15:18, Herbert Xu wrote: > > > /* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */ > >- for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) { > >- if ((unsigned long)np->rx_skbs[i] < PAGE_OFFSET) > >- continue; > >+ for (i = 0; i < NET_RX_RING_SIZE; i++) { > >+ if (!np->rx_skbs[i]) > >+ break; > > gnttab_grant_foreign_transfer_ref( > > np->grant_rx_ref[i], np->xbdev->otherend_id, > > __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); > >- RING_GET_REQUEST(&np->rx, requeue_idx)->gref > >- np->grant_rx_ref[i]; > >- RING_GET_REQUEST(&np->rx, requeue_idx)->id = i; > >+ RING_GET_REQUEST(&np->rx, i)->gref = np->grant_rx_ref[i]; > >+ RING_GET_REQUEST(&np->rx, i)->id = i; > >+ } > >+ for (requeue_idx = i++; i < NET_RX_RING_SIZE; i++) { > >+ if (!np->rx_skbs[i]) > >+ continue; > >+ skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i); > >+ ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, > >i); > >+ gnttab_grant_foreign_transfer_ref( > >+ ref, np->xbdev->otherend_id, > >+ __pa(skb->data) >> PAGE_SHIFT); > >+ RING_GET_REQUEST(&np->rx, requeue_idx)->gref = ref; > >+ RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; > > requeue_idx++; > > } > > Why two loops to fill the receive ring? The header of the second loop > with the body of the first loop would seem more correct (use ofThe first loop deals with the initial segment of np where rx_skbs isn''t NULL. The reason it''s dealt with separately from the rest is because unlike the second loop we don''t relocate the entry in np->rx_skbs and np->grant_rx_ref. Had we used the body of the second loop for the initial segment we would end up wiping out the values of rx_skbs and grant_rx_ref when we do the assignments skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i); ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i); because for the initial segment requeue_idx is always the same as i.> destructive xennet_get_* functions to read skb/ref info looks > incorrect).We''re relocating entries from the back of rx_skbs/grant_rx_ref to the front. After relocating it we need to clear the original position as otherwise we may use the entry incorrectly if the backend gives us a bogus response. By clearing it we ensure that it is caught by the warning "Bad rx response id %d.\n". This is also needed for the sanity checks on rx_skbs before we store a new skb into it. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
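The sanity check Herbert refers to is the frontend's guard against a bogus response id; clearing the vacated slots is what lets it fire. Roughly (a paraphrase assuming the post-patch NULL sentinel, not a hunk from the series):

	skb = np->rx_skbs[id];
	if (unlikely(!skb)) {		/* slot was never (re)filled */
		if (net_ratelimit())
			WPRINTK("Bad rx response id %d.\n", id);
		/* drop/recover rather than use a stale entry */
	}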
On 7 Jul 2006, at 15:19, Herbert Xu wrote:

> [NET] back: Added tx queue
>
> This patch adds a tx queue to the backend if the frontend supports rx
> refill notification. A queue is needed because SG/TSO greatly reduces
> the number of packets that can be stored in the rx ring. Given an rx
> ring with 256 entries, a maximum TSO packet can occupy as many as 18
> entries, meaning that the entire ring can only hold 14 packets. This
> is too small at high bandwidths with large TCP RX windows.
>
> Having a tx queue does not present a new security risk as the queue is
> a fixed size buffer just like the rx ring. So each guest can only hold
> a fixed amount of memory (proportional to the tx queue length) on the
> host.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

What's the default queue size? It shouldn't be too large if each packet
can be 64kB!

Also, what does NETIF_F_LLTX have to do with whether we have a queue or
not -- couldn't we set that all the time? And you set queue length to 1
in the other case -- what's wrong with zero? Seems a saner value when
there is no queue: or do we need to commit to having *some* queue at
register_netdevice() time, and there's no way to go back from that by
the time we find out if netfront supports rx refill notifications (so
at that point we cannot change queue len to zero)?

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Jul 27, 2006 at 02:15:29PM +0100, Keir Fraser wrote:
>
> What's the default queue size? It shouldn't be too large if each packet
> can be 64kB!

The default is 1000. If 64MB is too much for each domain, then how
about 100?

> Also, what does NETIF_F_LLTX have to do with whether we have a queue or
> not -- couldn't we set that all the time? And you set queue length to 1

Sorry, I should've added a comment about that. NETIF_F_LLTX is just an
otherwise-unused bit borrowed from the feature set to indicate that
queueing is supported. I suppose I could add a new flag too.

> in the other case -- what's wrong with zero? Seems a saner value when
> there is no queue: or do we need to commit to having *some* queue at
> register_netdevice() time, and there's no way to go back from that by
> the time we find out if netfront supports rx refill notifications (so
> at that point we cannot change queue len to zero)?

Exactly. Once we've committed to having a queue, setting the queue
length to zero will cause all packets to be dropped. I even added
a comment about that :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
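A quick check of the numbers being discussed, assuming the worst case of a full 64kB TSO frame for every queued packet (pessimistic for ordinary traffic):

	1000 packets x 64kB ~= 64MB of host memory held per guest
	 100 packets x 64kB ~=  6.4MB per guest

which is why the default of 1000 looks uncomfortably large and 100 is the figure settled on later in the thread.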
On 27 Jul 2006, at 14:12, Herbert Xu wrote:

> Had we used the body of the second loop for the initial segment we
> would end up wiping out the values of rx_skbs and grant_rx_ref when
> we do the assignments
>
>	skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i);
>	ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i);
>
> because for the initial segment requeue_idx is always the same as i.

But wouldn't the xennet_get_* functions wipe the entries, only to be
immediately filled in again by the assignment to e.g.
np->rx_skbs[requeue_idx], so it would all work out okay (except a
needless wipe-then-rewrite)?

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Jul 27, 2006 at 02:28:48PM +0100, Keir Fraser wrote:
>
> > Had we used the body of the second loop for the initial segment we
> > would end up wiping out the values of rx_skbs and grant_rx_ref when
> > we do the assignments
> >
> >	skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i);
> >	ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i);
> >
> > because for the initial segment requeue_idx is always the same as i.
>
> But wouldn't the xennet_get_* functions wipe the entries, only to be
> immediately filled in again by the assignment to e.g.
> np->rx_skbs[requeue_idx], so it would all work out okay (except a
> needless wipe-then-rewrite)?

That was the reason for having two loops instead of one. In the first
loop we're not moving entries, so we don't assign rx_skbs/grant_rx_ref
at all. In the second loop we know that requeue_idx != i, so we're
always moving entries forward, in which case clearing the original
position is necessary.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 27 Jul 2006, at 14:23, Herbert Xu wrote:

>> Also, what does NETIF_F_LLTX have to do with whether we have a queue
>> or not -- couldn't we set that all the time? And you set queue length
>> to 1
>
> Sorry, I should've added a comment about that. NETIF_F_LLTX is just an
> otherwise-unused bit borrowed from the feature set to indicate that
> queueing is supported. I suppose I could add a new flag too.

That's pretty gross. NETIF_F_LLTX is used, isn't it, to indicate whether
an interface does its own locking in start_xmit()? Why not have a
private feature flag in an appropriate netback-private structure?

>> in the other case -- what's wrong with zero? Seems a saner value when
>> there is no queue: or do we need to commit to having *some* queue at
>> register_netdevice() time, and there's no way to go back from that by
>> the time we find out if netfront supports rx refill notifications (so
>> at that point we cannot change queue len to zero)?
>
> Exactly. Once we've committed to having a queue, setting the queue
> length to zero will cause all packets to be dropped. I even added
> a comment about that :)

Ok, sounds reasonable then. But I would change the queue length to 100
in your next spin of this patch. 1000 is scary big.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
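One way to read Keir's suggestion here is a per-interface boolean in netback's own netif_t instead of a borrowed generic feature bit. A rough sketch (the member name and the exact test site are hypothetical -- the respin is not shown in this thread):

	/* Hypothetical netback-private capability flag instead of
	 * overloading NETIF_F_LLTX: set when the frontend advertises
	 * rx refill notification via xenbus. */
	netif->can_queue = 1;

	/* ... and in the transmit path: */
	if (netif->can_queue && netbk_queue_full(netif))
		netif_stop_queue(dev);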
On 27 Jul 2006, at 14:30, Herbert Xu wrote:

>> But wouldn't the xennet_get_* functions wipe the entries, only to be
>> immediately filled in again by the assignment to e.g.
>> np->rx_skbs[requeue_idx], so it would all work out okay (except a
>> needless wipe-then-rewrite)?
>
> That was the reason for having two loops instead of one. In the first
> loop we're not moving entries, so we don't assign rx_skbs/grant_rx_ref
> at all. In the second loop we know that requeue_idx != i, so we're
> always moving entries forward, in which case clearing the original
> position is necessary.

It's not a time-critical piece of code. I'd much rather have a single
loop that wastes some small amount of time. That routine is confusing
enough as it is. :-)

Also, what about putting the array-initialising loop back: reasonable?

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Jul 27, 2006 at 02:34:41PM +0100, Keir Fraser wrote:
>
> It's not a time-critical piece of code. I'd much rather have a single
> loop that wastes some small amount of time. That routine is confusing
> enough as it is. :-)

Actually I split it into two loops for clarity :) I suppose now that
I've merged it back into one it doesn't look too bad.

> Also, what about putting the array-initialising loop back: reasonable?

Consider it done.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
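For readers following along, the merged restore-path loop could look roughly like this (a sketch reconstructed from the hunks quoted earlier in the thread, not Herbert's actual respin): the destructive xennet_get_* helpers clear the old slot, and when requeue_idx == i the entry is simply rewritten in place.

	for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) {
		if (!np->rx_skbs[i])
			continue;

		skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i);
		ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i);

		gnttab_grant_foreign_transfer_ref(
			ref, np->xbdev->otherend_id,
			__pa(skb->data) >> PAGE_SHIFT);

		RING_GET_REQUEST(&np->rx, requeue_idx)->gref = ref;
		RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx;
		requeue_idx++;
	}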
Herbert Xu
2006-Jul-28 08:06 UTC
[Xen-devel] Re: [5/11] [NET] back: Replace netif->active with netif_carrier_ok
On Thu, Jul 27, 2006 at 02:11:01PM +0100, Keir Fraser wrote:
>
> netif->active is also conditional on whether or not the interface is
> open ('running'). So it's not as good a fit for netif_carrier as the
> status field you similarly replaced in netfront.
>
> Perhaps you could replace the netback status field
> (CONNECTED/DISCONNECTED) instead? I think the extra value DISCONNECTING
> is not really necessary any more, so it should map onto the boolean
> netif_carrier okay.
>
> Then ->active could also potentially be removed. It would correspond to
> netif_running && netif_carrier_ok.

Good point. I've made the carrier correspond to CONNECTED/DISCONNECTED
and added the netif_running tests.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel