Hi Keir:

This series of patches adds SG and TSO support for the dom0=>domU direction. This completes the domU=>domU loop, which means that we have TSO all the way through.

It's a bit late here so I haven't done a full set of benchmarks. However, the following figures should be indicative for domU=>domU:

TSO off:    323.91Mb/s
TSO on:    1678.52Mb/s
lo(16436): 5533.93Mb/s

Of course you'll need to turn the two #ifs back on for domU=>dom0 to do this.

I've also found out the reason I was getting smaller MSSes before: the default write buffer size in Linux is too small for these speeds. So you need to raise it with something like:

echo 526808 > /proc/sys/net/core/wmem_max
netperf -t TCP_STREAM -H x.x.x.x -- -s 263404

Cheers,
Herbert
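For context, the netperf "-s 263404" option above simply asks for a larger socket send buffer, and the sysctl raises the ceiling the kernel will honour. A minimal userspace sketch of the same request (illustrative only; the buffer value is just the one quoted above):

/*
 * Illustrative only: request a larger send buffer, as netperf's
 * "-s 263404" option does, and read back what the kernel granted.
 * The kernel caps the request at net.core.wmem_max, which is why
 * the sysctl above must be raised first.
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int sndbuf = 263404;            /* value used in the netperf run above */
        socklen_t len = sizeof(sndbuf);

        if (fd < 0)
                return 1;

        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);

        /* Linux reports roughly double the requested value (bookkeeping overhead). */
        printf("effective send buffer: %d bytes\n", sndbuf);

        close(fd);
        return 0;
}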
Hi:

[NET] front: Zero gso features/pad

Set the unused gso features/pad fields to zero for forward compatibility.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

--
diff -r 8e55c5c11475 -r dd443604e775 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Wed Jul 05 18:48:41 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Fri Jul 07 23:35:42 2006 +1000
@@ -788,6 +788,8 @@ static int network_start_xmit(struct sk_
 
                 gso->u.gso.size = skb_shinfo(skb)->gso_size;
                 gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
+                gso->u.gso.pad = 0;
+                gso->u.gso.features = 0;
 
                 gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
                 gso->flags = 0;
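The reason for zeroing fields the frontend does not yet use is that a future backend may assign meaning to them. A tiny standalone illustration of that convention (the struct here is a simplified stand-in, not the real netif_extra_info layout):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified extra-info slot: fields the current sender
 * does not use are still written as zero, so a future receiver that
 * assigns meaning to them sees "no options requested", not stack garbage.
 */
struct demo_gso_extra {
        uint16_t size;          /* MSS */
        uint8_t  type;          /* GSO protocol type */
        uint8_t  pad;           /* currently unused */
        uint16_t features;      /* currently unused feature bits */
};

static void fill_gso_slot(struct demo_gso_extra *gso, uint16_t mss, uint8_t type)
{
        memset(gso, 0, sizeof(*gso));   /* zero pad/features in one go */
        gso->size = mss;
        gso->type = type;
}

int main(void)
{
        struct demo_gso_extra slot;

        fill_gso_slot(&slot, 1448, 1);
        printf("size=%u type=%u pad=%u features=%u\n",
               slot.size, slot.type, slot.pad, slot.features);
        return 0;
}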
Herbert Xu
2006-Jul-07 14:17 UTC
[Xen-devel] [2/11] [NET] back: Make use of the simplicity of tasklets in net_rx_action
Hi:

[NET] back: Make use of the simplicity of tasklets in net_rx_action

Tasklets have the property that each one runs on only one CPU at any time, so a tasklet never races against itself. Therefore any resource used by only a single tasklet does not need to be guarded by locks.

Since net_rx_action is the only user of alloc_mfn, we can remove the mfn_lock that guards it.

The notify_list array is huge by Linux standards, so placing it on the stack is unsafe. Since net_rx_action is not re-entrant, we can simply make it static.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

--
diff -r dd443604e775 -r e90f22eee9a1 linux-2.6-xen-sparse/drivers/xen/netback/netback.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c       Fri Jul 07 23:35:42 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c       Fri Jul 07 23:35:46 2006 +1000
@@ -99,24 +99,21 @@ static spinlock_t net_schedule_list_lock
 #define MAX_MFN_ALLOC 64
 static unsigned long mfn_list[MAX_MFN_ALLOC];
 static unsigned int alloc_index = 0;
-static DEFINE_SPINLOCK(mfn_lock);
 
 static unsigned long alloc_mfn(void)
 {
-        unsigned long mfn = 0, flags;
+        unsigned long mfn = 0;
         struct xen_memory_reservation reservation = {
                 .nr_extents = MAX_MFN_ALLOC,
                 .extent_order = 0,
                 .domid = DOMID_SELF
         };
         set_xen_guest_handle(reservation.extent_start, mfn_list);
-        spin_lock_irqsave(&mfn_lock, flags);
         if ( unlikely(alloc_index == 0) )
                 alloc_index = HYPERVISOR_memory_op(
                         XENMEM_increase_reservation, &reservation);
         if ( alloc_index != 0 )
                 mfn = mfn_list[--alloc_index];
-        spin_unlock_irqrestore(&mfn_lock, flags);
         return mfn;
 }
 
@@ -222,9 +219,13 @@ static void net_rx_action(unsigned long
         unsigned long vdata, old_mfn, new_mfn;
         struct sk_buff_head rxq;
         struct sk_buff *skb;
-        u16 notify_list[NET_RX_RING_SIZE];
         int notify_nr = 0;
         int ret;
+        /*
+         * Putting hundreds of bytes on the stack is considered rude.
+         * Static works because a tasklet can only be on one CPU at any time.
+         */
+        static u16 notify_list[NET_RX_RING_SIZE];
 
         skb_queue_head_init(&rxq);
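To make the locking argument concrete, here is a minimal kernel-style sketch (function and variable names invented for illustration, 2.6-era tasklet API) of a buffer that is safe to keep static and lock-free because it is touched only from one tasklet:

#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/interrupt.h>

#define DEMO_RING_SIZE 256

/*
 * Touched only from demo_rx_action(), which runs as a tasklet.  The
 * tasklet core guarantees the function is never running on two CPUs at
 * once, so no spinlock is needed, and making the (large) array static
 * keeps it off the small kernel stack.
 */
static u16 demo_notify_list[DEMO_RING_SIZE];

static void demo_rx_action(unsigned long unused)
{
        int i, n = 0;

        for (i = 0; i < DEMO_RING_SIZE; i++)
                demo_notify_list[n++] = i;      /* safe: no concurrent runner */

        printk(KERN_DEBUG "queued %d notifications\n", n);
}

static DECLARE_TASKLET(demo_rx_tasklet, demo_rx_action, 0);

/* Callers (for example an interrupt handler) just schedule the tasklet: */
static inline void demo_kick(void)
{
        tasklet_schedule(&demo_rx_tasklet);
}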
Hi: [NET] front: Stop using rx->id With the current protocol for transferring packets from dom0 to domU, the rx->id field is useless because it can be derived from the rx request ring ID. In particular, rx->id = (ring_id & NET_RX_RING_SIZE - 1) + 1; This formula works because the rx response to each request always occupies the same slot that the request arrived in. This in turn is a consequence of the fact that each packet only occupies one slot. The other important reason that this works for dom0=>domU but not domU=>dom0 is that the resource associated with the rx->id is freed immediately while in the domU=>dom0 case the resource is held until the skb is liberated by dom0. Using this formula we can essentially remove rx->id from the protocol, freeing up space that could be instead be used by things like TSO. The only constraint is that the backend must obey the rule that each id must only be used in the response that occupies the same slot as the request. The actual field of rx->id is still maintained for compatibility with older backends. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r e90f22eee9a1 -r 2ec3e15164f7 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:35:46 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:35:52 2006 +1000 @@ -99,17 +99,17 @@ struct netfront_info { struct timer_list rx_refill_timer; /* - * {tx,rx}_skbs store outstanding skbuffs. The first entry in each - * array is an index into a chain of free entries. + * {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs + * is an index into a chain of free entries. */ struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1]; - struct sk_buff *rx_skbs[NET_RX_RING_SIZE+1]; + struct sk_buff *rx_skbs[NET_RX_RING_SIZE]; #define TX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256) grant_ref_t gref_tx_head; grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1]; grant_ref_t gref_rx_head; - grant_ref_t grant_rx_ref[NET_TX_RING_SIZE + 1]; + grant_ref_t grant_rx_ref[NET_TX_RING_SIZE]; struct xenbus_device *xbdev; int tx_ring_ref; @@ -122,7 +122,7 @@ struct netfront_info { }; /* - * Access macros for acquiring freeing slots in {tx,rx}_skbs[]. + * Access macros for acquiring freeing slots in tx_skbs[]. */ static inline void add_id_to_freelist(struct sk_buff **list, unsigned short id) @@ -136,6 +136,29 @@ static inline unsigned short get_id_from unsigned int id = (unsigned int)(unsigned long)list[0]; list[0] = list[id]; return id; +} + +static inline int xennet_rxidx(RING_IDX idx) +{ + return idx & (NET_RX_RING_SIZE - 1); +} + +static inline struct sk_buff *xennet_get_rx_skb(struct netfront_info *np, + RING_IDX ri) +{ + int i = xennet_rxidx(ri); + struct sk_buff *skb = np->rx_skbs[i]; + np->rx_skbs[i] = NULL; + return skb; +} + +static inline grant_ref_t xennet_get_rx_ref(struct netfront_info *np, + RING_IDX ri) +{ + int i = xennet_rxidx(ri); + grant_ref_t ref = np->grant_rx_ref[i]; + np->grant_rx_ref[i] = GRANT_INVALID_REF; + return ref; } #define DPRINTK(fmt, args...) 
\ @@ -598,8 +621,9 @@ static void network_alloc_rx_buffers(str skb->dev = dev; - id = get_id_from_freelist(np->rx_skbs); - + id = xennet_rxidx(req_prod + i); + + BUG_ON(np->rx_skbs[id]); np->rx_skbs[id] = skb; RING_GET_REQUEST(&np->rx, req_prod + i)->id = id; @@ -840,6 +864,19 @@ static irqreturn_t netif_int(int irq, vo return IRQ_HANDLED; } +static void xennet_move_rx_slot(struct netfront_info *np, struct sk_buff *skb, + grant_ref_t ref) +{ + int new = xennet_rxidx(np->rx.req_prod_pvt); + + BUG_ON(np->rx_skbs[new]); + np->rx_skbs[new] = skb; + np->grant_rx_ref[new] = ref; + RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new; + RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref; + np->rx.req_prod_pvt++; + RING_PUSH_REQUESTS(&np->rx); +} static int netif_poll(struct net_device *dev, int *pbudget) { @@ -874,12 +911,15 @@ static int netif_poll(struct net_device i++, work_done++) { rx = RING_GET_RESPONSE(&np->rx, i); + skb = xennet_get_rx_skb(np, i); + ref = xennet_get_rx_ref(np, i); + /* * This definitely indicates a bug, either in this driver or in * the backend driver. In future this should flag the bad * situation to the system controller to reboot the backed. */ - if ((ref = np->grant_rx_ref[rx->id]) == GRANT_INVALID_REF) { + if (ref == GRANT_INVALID_REF) { WPRINTK("Bad rx response id %d.\n", rx->id); work_done--; continue; @@ -890,21 +930,12 @@ static int netif_poll(struct net_device if (net_ratelimit()) WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n", rx->id, rx->status); - RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id - rx->id; - RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref - ref; - np->rx.req_prod_pvt++; - RING_PUSH_REQUESTS(&np->rx); + xennet_move_rx_slot(np, skb, ref); work_done--; continue; } gnttab_release_grant_reference(&np->gref_rx_head, ref); - np->grant_rx_ref[rx->id] = GRANT_INVALID_REF; - - skb = np->rx_skbs[rx->id]; - add_id_to_freelist(np->rx_skbs, rx->id); /* NB. We handle skb overflow later. */ skb->data = skb->head + rx->offset; @@ -1128,6 +1159,7 @@ static void network_connect(struct net_d struct netfront_info *np = netdev_priv(dev); int i, requeue_idx; struct sk_buff *skb; + grant_ref_t ref; xennet_set_features(dev); @@ -1158,15 +1190,25 @@ static void network_connect(struct net_d } /* Step 2: Rebuild the RX buffer freelist and the RX ring itself. 
*/ - for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) { - if ((unsigned long)np->rx_skbs[i] < PAGE_OFFSET) - continue; + for (i = 0; i < NET_RX_RING_SIZE; i++) { + if (!np->rx_skbs[i]) + break; gnttab_grant_foreign_transfer_ref( np->grant_rx_ref[i], np->xbdev->otherend_id, __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); - RING_GET_REQUEST(&np->rx, requeue_idx)->gref - np->grant_rx_ref[i]; - RING_GET_REQUEST(&np->rx, requeue_idx)->id = i; + RING_GET_REQUEST(&np->rx, i)->gref = np->grant_rx_ref[i]; + RING_GET_REQUEST(&np->rx, i)->id = i; + } + for (requeue_idx = i++; i < NET_RX_RING_SIZE; i++) { + if (!np->rx_skbs[i]) + continue; + skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i); + ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i); + gnttab_grant_foreign_transfer_ref( + ref, np->xbdev->otherend_id, + __pa(skb->data) >> PAGE_SHIFT); + RING_GET_REQUEST(&np->rx, requeue_idx)->gref = ref; + RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; requeue_idx++; } @@ -1391,11 +1433,6 @@ static struct net_device * __devinit cre np->grant_tx_ref[i] = GRANT_INVALID_REF; } - for (i = 0; i <= NET_RX_RING_SIZE; i++) { - np->rx_skbs[i] = (void *)((unsigned long) i+1); - np->grant_rx_ref[i] = GRANT_INVALID_REF; - } - /* A grant for every tx ring slot */ if (gnttab_alloc_grant_references(TX_MAX_TARGET, &np->gref_tx_head) < 0) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
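To illustrate the derivation in the patch above, a small standalone sketch of the index calculation: because each packet takes exactly one slot and the response is written into the slot its request occupied, the ring counter alone identifies the buffer. The ring size of 256 is an assumption for a 4 KiB shared ring.

#include <stdio.h>

#define NET_RX_RING_SIZE 256    /* assumed power-of-two ring size */

/* Mirror of xennet_rxidx() in the patch: mask the ring counter. */
static int rxidx(unsigned int ring_idx)
{
        return ring_idx & (NET_RX_RING_SIZE - 1);
}

int main(void)
{
        unsigned int req_prod;

        /*
         * The slot index derived from the ring counter is enough to find
         * the matching skb, so the id field on the wire carries no extra
         * information and its space can be reclaimed for things like TSO.
         */
        for (req_prod = 254; req_prod < 259; req_prod++)
                printf("ring counter %u -> buffer slot %d\n",
                       req_prod, rxidx(req_prod));

        return 0;
}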
Herbert Xu
2006-Jul-07 14:18 UTC
[Xen-devel] [4/11] [NET] front: Move rx req pushing to one spot
Hi:

[NET] front: Move rx req pushing to one spot

This patch moves all rx request pushing to network_alloc_rx_buffers. This is needed to reduce churn for TSO. More importantly, it makes it easier to send notifications when adding rx requests, which is required for having a queue in dom0.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

--
diff -r 2ec3e15164f7 -r d398fb29e84a linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Fri Jul 07 23:35:52 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Fri Jul 07 23:36:12 2006 +1000
@@ -600,14 +600,17 @@ static void network_alloc_rx_buffers(str
                         /* Could not allocate any skbuffs. Try again later. */
                         mod_timer(&np->rx_refill_timer,
                                   jiffies + (HZ/10));
-                        return;
+                        break;
                 }
                 __skb_queue_tail(&np->rx_batch, skb);
         }
 
         /* Is the batch large enough to be worthwhile? */
-        if (i < (np->rx_target/2))
+        if (i < (np->rx_target/2)) {
+                if (req_prod > np->rx.sring->req_prod)
+                        goto push;
                 return;
+        }
 
         /* Adjust our fill target if we risked running out of buffers. */
         if (((req_prod - np->rx.sring->rsp_prod) < (np->rx_target / 4)) &&
@@ -678,6 +681,7 @@ static void network_alloc_rx_buffers(str
 
         /* Above is a suitable barrier to ensure backend will see requests. */
         np->rx.req_prod_pvt = req_prod + i;
+push:
         RING_PUSH_REQUESTS(&np->rx);
 }
 
@@ -875,7 +879,6 @@ static void xennet_move_rx_slot(struct n
         RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new;
         RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref;
         np->rx.req_prod_pvt++;
-        RING_PUSH_REQUESTS(&np->rx);
 }
 
 static int netif_poll(struct net_device *dev, int *pbudget)
@@ -1213,7 +1216,6 @@ static void network_connect(struct net_d
         }
 
         np->rx.req_prod_pvt = requeue_idx;
-        RING_PUSH_REQUESTS(&np->rx);
 
         /*
          * Step 3: All public and private state should now be sane.  Get
Herbert Xu
2006-Jul-07 14:19 UTC
[Xen-devel] [5/11] [NET] back: Replace netif->active with netif_carrier_ok
Hi:

[NET] back: Replace netif->active with netif_carrier_ok

The connection status to the frontend can be represented using netif_carrier_ok instead of netif->active. As a result, we delay the construction of the dev qdisc until the carrier comes on.

This is a prerequisite for adding a tx queue.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

--
diff -r d398fb29e84a -r 9fa8689fc0e9 linux-2.6-xen-sparse/drivers/xen/netback/common.h
--- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:36:12 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:36:16 2006 +1000
@@ -87,7 +87,7 @@ typedef struct netif_st {
 
         /* Miscellaneous private stuff. */
         enum { DISCONNECTED, DISCONNECTING, CONNECTED } status;
-        int active;
+
         struct list_head list;  /* scheduling list */
         atomic_t refcnt;
         struct net_device *dev;
diff -r d398fb29e84a -r 9fa8689fc0e9 linux-2.6-xen-sparse/drivers/xen/netback/interface.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c      Fri Jul 07 23:36:12 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c      Fri Jul 07 23:36:16 2006 +1000
@@ -37,9 +37,7 @@ static void __netif_up(netif_t *netif)
 static void __netif_up(netif_t *netif)
 {
         struct net_device *dev = netif->dev;
-        netif_tx_lock_bh(dev);
-        netif->active = 1;
-        netif_tx_unlock_bh(dev);
+        netif_carrier_on(dev);
         enable_irq(netif->irq);
         netif_schedule_work(netif);
 }
@@ -49,7 +47,7 @@ static void __netif_down(netif_t *netif)
         struct net_device *dev = netif->dev;
         disable_irq(netif->irq);
         netif_tx_lock_bh(dev);
-        netif->active = 0;
+        netif_carrier_off(dev);
         netif_tx_unlock_bh(dev);
         netif_deschedule_work(netif);
 }
@@ -91,6 +89,8 @@ netif_t *netif_alloc(domid_t domid, unsi
                 DPRINTK("Could not create netif: out of memory\n");
                 return ERR_PTR(-ENOMEM);
         }
+
+        netif_carrier_off(dev);
 
         netif = netdev_priv(dev);
         memset(netif, 0, sizeof(*netif));
diff -r d398fb29e84a -r 9fa8689fc0e9 linux-2.6-xen-sparse/drivers/xen/netback/netback.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c        Fri Jul 07 23:36:12 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c        Fri Jul 07 23:36:16 2006 +1000
@@ -143,7 +143,7 @@ int netif_be_start_xmit(struct sk_buff *
         BUG_ON(skb->dev != dev);
 
         /* Drop the packet if the target domain has no receive buffers. */
-        if (!netif->active ||
+        if (unlikely(!netif_carrier_ok(dev)) ||
             (netif->rx_req_cons_peek == netif->rx.sring->req_prod) ||
             ((netif->rx_req_cons_peek - netif->rx.rsp_prod_pvt) ==
              NET_RX_RING_SIZE))
@@ -404,7 +404,8 @@ static void add_to_net_schedule_list_tai
                 return;
 
         spin_lock_irq(&net_schedule_list_lock);
-        if (!__on_net_schedule_list(netif) && netif->active) {
+        if (!__on_net_schedule_list(netif) &&
+            likely(netif_carrier_ok(netif->dev))) {
                 list_add_tail(&netif->list, &net_schedule_list);
                 netif_get(netif);
         }
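A hedged sketch of the resulting pattern (driver and function names here are invented; only the netif_carrier_* calls, which appear in the patch, are real APIs): carrier off at allocation, carrier on when the frontend connects, and transmission gated on the carrier state rather than a private flag.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

static int demo_probe(struct net_device *dev)
{
        /* Not connected to a frontend yet: no carrier, no qdisc activity. */
        netif_carrier_off(dev);
        return 0;
}

static void demo_frontend_connected(struct net_device *dev)
{
        netif_carrier_on(dev);  /* queuing may start from here on */
}

static int demo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        if (unlikely(!netif_carrier_ok(dev))) {
                dev_kfree_skb(skb);     /* drop: peer not connected */
                return 0;               /* 0 == NETDEV_TX_OK in this kernel era */
        }
        /* ... hand the packet to the connected peer ... */
        return 0;
}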
Hi:

[NET] front: Added feature-rx-notify

This patch adds support to the frontend for notifying the backend whenever the rx ring is refilled. This is required in order for the backend to get a tx queue.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

--
diff -r 9fa8689fc0e9 -r 2ac3f7b0e0fc linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Fri Jul 07 23:36:16 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c     Fri Jul 07 23:36:20 2006 +1000
@@ -326,6 +326,12 @@ again:
                 goto abort_transaction;
         }
 
+        err = xenbus_printf(xbt, dev->nodename, "feature-rx-notify", "%d", 1);
+        if (err) {
+                message = "writing feature-rx-notify";
+                goto abort_transaction;
+        }
+
         err = xenbus_transaction_end(xbt, 0);
         if (err) {
                 if (err == -EAGAIN)
@@ -683,6 +689,7 @@ static void network_alloc_rx_buffers(str
         np->rx.req_prod_pvt = req_prod + i;
 push:
         RING_PUSH_REQUESTS(&np->rx);
+        notify_remote_via_irq(np->irq);
 }
 
 static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
@@ -1224,7 +1231,6 @@ static void network_connect(struct net_d
          * packets.
          */
         netif_carrier_on(dev);
-        notify_remote_via_irq(np->irq);
         network_tx_buf_gc(dev);
         network_alloc_rx_buffers(dev);
 
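The same advertise-and-probe pattern recurs for every feature in this series: the frontend writes a "feature-*" key in its xenbus directory, and the backend reads it, treating an absent key as 0 so old frontends keep working. A small sketch of the backend side (xenbus_scanf is used exactly as in the series; the helper itself is invented for illustration):

#include <xen/xenbus.h>

/*
 * Illustrative helper (not part of the patch): read an optional boolean
 * "feature-*" key from the peer's xenbus directory; a missing key means
 * the peer does not support the feature.
 */
static int demo_read_feature(struct xenbus_device *dev, const char *node)
{
        int val;

        if (xenbus_scanf(XBT_NIL, dev->otherend, node, "%d", &val) < 0)
                val = 0;

        return val;
}

/*
 * Usage, mirroring connect_rings() in the next patch:
 *
 *      if (demo_read_feature(dev, "feature-rx-notify"))
 *              netif->features |= NETIF_F_LLTX;
 */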
Hi: [NET] back: Added tx queue This patch adds a tx queue to the backend if the frontend supports rx refill notification. A queue is needed because SG/TSO greatly reduces the number of packets that can be stored in the rx ring. Given an rx ring with 256 entries, a maximum TSO packet can occupy as many as 18 entries, meaning that the entire ring can only hold 14 packets. This is too small at high bandwidths with large TCP RX windows. Having a tx queue does not present a new security risk as the queue is a fixed size buffer just like rx ring. So each guest can only hold a fixed amount of memory (proportional to the tx queue length) on the host. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 2ac3f7b0e0fc -r c754083146bc linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:36:20 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:36:58 2006 +1000 @@ -76,6 +76,9 @@ typedef struct netif_st { struct vm_struct *tx_comms_area; struct vm_struct *rx_comms_area; + /* Set of features that can be turned on in dev->features. */ + int features; + /* Allow netif_be_start_xmit() to peek ahead in the rx request ring. */ RING_IDX rx_req_cons_peek; @@ -121,4 +124,10 @@ struct net_device_stats *netif_be_get_st struct net_device_stats *netif_be_get_stats(struct net_device *dev); irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs); +static inline int netbk_can_queue(struct net_device *dev) +{ + netif_t *netif = netdev_priv(dev); + return netif->features & NETIF_F_LLTX; +} + #endif /* __NETIF__BACKEND__COMMON_H__ */ diff -r 2ac3f7b0e0fc -r c754083146bc linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:36:20 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:36:58 2006 +1000 @@ -57,14 +57,12 @@ static int net_open(struct net_device *d netif_t *netif = netdev_priv(dev); if (netif->status == CONNECTED) __netif_up(netif); - netif_start_queue(dev); return 0; } static int net_close(struct net_device *dev) { netif_t *netif = netdev_priv(dev); - netif_stop_queue(dev); if (netif->status == CONNECTED) __netif_down(netif); return 0; @@ -112,9 +110,6 @@ netif_t *netif_alloc(domid_t domid, unsi dev->features = NETIF_F_IP_CSUM; SET_ETHTOOL_OPS(dev, &network_ethtool_ops); - - /* Disable queuing. 
*/ - dev->tx_queue_len = 0; for (i = 0; i < ETH_ALEN; i++) if (be_mac[i] != 0) diff -r 2ac3f7b0e0fc -r c754083146bc linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:36:20 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:36:58 2006 +1000 @@ -136,6 +136,14 @@ static inline int is_xen_skb(struct sk_b return (cp == skbuff_cachep); } +static inline int netbk_queue_full(netif_t *netif) +{ + RING_IDX peek = netif->rx_req_cons_peek; + + return netif->rx.sring->req_prod - peek <= 0 || + netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <= 0; +} + int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) { netif_t *netif = netdev_priv(dev); @@ -143,11 +151,13 @@ int netif_be_start_xmit(struct sk_buff * BUG_ON(skb->dev != dev); /* Drop the packet if the target domain has no receive buffers. */ - if (unlikely(!netif_carrier_ok(dev)) || - (netif->rx_req_cons_peek == netif->rx.sring->req_prod) || - ((netif->rx_req_cons_peek - netif->rx.rsp_prod_pvt) =- NET_RX_RING_SIZE)) + if (unlikely(!netif_carrier_ok(dev))) goto drop; + + if (unlikely(netbk_queue_full(netif))) { + BUG_ON(netbk_can_queue(dev)); + goto drop; + } /* * We do not copy the packet unless: @@ -178,6 +188,9 @@ int netif_be_start_xmit(struct sk_buff * netif->rx_req_cons_peek++; netif_get(netif); + if (netbk_can_queue(dev) && netbk_queue_full(netif)) + netif_stop_queue(dev); + skb_queue_tail(&rx_queue, skb); tasklet_schedule(&net_rx_tasklet); @@ -351,6 +364,10 @@ static void net_rx_action(unsigned long notify_list[notify_nr++] = irq; } + if (netif_queue_stopped(netif->dev) && + !netbk_queue_full(netif)) + netif_wake_queue(netif->dev); + netif_put(netif); dev_kfree_skb(skb); gop++; @@ -973,8 +990,13 @@ irqreturn_t netif_be_int(int irq, void * irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs) { netif_t *netif = dev_id; + add_to_net_schedule_list_tail(netif); maybe_schedule_tx_action(); + + if (netif_queue_stopped(netif->dev) && !netbk_queue_full(netif)) + netif_wake_queue(netif->dev); + return IRQ_HANDLED; } diff -r 2ac3f7b0e0fc -r c754083146bc linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:36:20 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:36:58 2006 +1000 @@ -353,6 +353,7 @@ static int connect_rings(struct backend_ unsigned long tx_ring_ref, rx_ring_ref; unsigned int evtchn; int err; + int val; DPRINTK(""); @@ -366,6 +367,15 @@ static int connect_rings(struct backend_ dev->otherend); return err; } + + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-rx-notify", "%d", + &val) < 0) + val = 0; + if (val) + be->netif->features |= NETIF_F_LLTX; + else + /* Must be non-zero for pfifo_fast to work. */ + be->netif->dev->tx_queue_len = 1; /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
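The sizing argument in the description above can be checked with a few lines of standalone C. The 18-slots-per-maximal-TSO-packet figure is the one quoted in the patch description; the queue-full test mirrors the shape of netbk_queue_full() in the patch, simplified, with constants assumed.

#include <stdio.h>

#define NET_RX_RING_SIZE   256  /* slots in the shared rx ring */
#define MAX_SLOTS_PER_TSO   18  /* figure quoted in the patch description */

/*
 * Queue-full test in the style of netbk_queue_full(): unsigned ring
 * counters wrap, so remaining capacity is always computed as a difference.
 */
static int demo_queue_full(unsigned int req_prod, unsigned int rsp_prod,
                           unsigned int cons_peek, unsigned int needed)
{
        return req_prod - cons_peek < needed ||
               rsp_prod + NET_RX_RING_SIZE - cons_peek < needed;
}

int main(void)
{
        printf("max TSO packets resident in a %d-slot ring: %d\n",
               NET_RX_RING_SIZE, NET_RX_RING_SIZE / MAX_SLOTS_PER_TSO);

        /* 250 requests posted, 240 answered, 245 slots already claimed. */
        printf("room for another 18-slot packet? %s\n",
               demo_queue_full(250, 240, 245, MAX_SLOTS_PER_TSO) ? "no" : "yes");
        return 0;
}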
Hi: [NET] front: Add SG support This patch adds scatter-and-gather support to the frontend. It also advertises this fact through xenbus so that the backend can detect this and send through SG requests only if it is supported. SG support is required to support skb''s larger than one page. This in turn is needed for either jumbo MTU or TSO. One of these is required to bring local networking performance up to a level that is acceptable. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r c754083146bc -r c82398c08669 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:36:58 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:37:45 2006 +1000 @@ -46,11 +46,11 @@ #include <linux/ethtool.h> #include <linux/in.h> #include <linux/if_ether.h> +#include <linux/io.h> #include <net/sock.h> #include <net/pkt_sched.h> #include <net/arp.h> #include <net/route.h> -#include <asm/io.h> #include <asm/uaccess.h> #include <xen/evtchn.h> #include <xen/xenbus.h> @@ -62,17 +62,12 @@ #include <xen/interface/grant_table.h> #include <xen/gnttab.h> +#define RX_COPY_THRESHOLD 256 + #define GRANT_INVALID_REF 0 #define NET_TX_RING_SIZE __RING_SIZE((struct netif_tx_sring *)0, PAGE_SIZE) #define NET_RX_RING_SIZE __RING_SIZE((struct netif_rx_sring *)0, PAGE_SIZE) - -static inline void init_skb_shinfo(struct sk_buff *skb) -{ - atomic_set(&(skb_shinfo(skb)->dataref), 1); - skb_shinfo(skb)->nr_frags = 0; - skb_shinfo(skb)->frag_list = NULL; -} struct netfront_info { struct list_head list; @@ -329,6 +324,12 @@ again: err = xenbus_printf(xbt, dev->nodename, "feature-rx-notify", "%d", 1); if (err) { message = "writing feature-rx-notify"; + goto abort_transaction; + } + + err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", 1); + if (err) { + message = "writing feature-sg"; goto abort_transaction; } @@ -575,10 +576,13 @@ static void network_alloc_rx_buffers(str unsigned short id; struct netfront_info *np = netdev_priv(dev); struct sk_buff *skb; + struct page *page; int i, batch_target; RING_IDX req_prod = np->rx.req_prod_pvt; struct xen_memory_reservation reservation; grant_ref_t ref; + unsigned long pfn; + void *vaddr; if (unlikely(!netif_carrier_ok(dev))) return; @@ -591,15 +595,16 @@ static void network_alloc_rx_buffers(str */ batch_target = np->rx_target - (req_prod - np->rx.rsp_cons); for (i = skb_queue_len(&np->rx_batch); i < batch_target; i++) { - /* - * Subtract dev_alloc_skb headroom (16 bytes) and shared info - * tailroom then round down to SKB_DATA_ALIGN boundary. - */ - skb = __dev_alloc_skb( - ((PAGE_SIZE - sizeof(struct skb_shared_info)) & - (-SKB_DATA_ALIGN(1))) - 16, - GFP_ATOMIC|__GFP_NOWARN); - if (skb == NULL) { + /* Allocate an skb and a page. */ + skb = __dev_alloc_skb(RX_COPY_THRESHOLD, + GFP_ATOMIC | __GFP_NOWARN); + if (unlikely(!skb)) + goto no_skb; + + page = alloc_page(GFP_ATOMIC | __GFP_NOWARN); + if (!page) { + kfree_skb(skb); +no_skb: /* Any skbuffs queued for refill? Force them out. 
*/ if (i != 0) goto refill; @@ -608,6 +613,9 @@ static void network_alloc_rx_buffers(str jiffies + (HZ/10)); break; } + + skb_shinfo(skb)->frags[0].page = page; + skb_shinfo(skb)->nr_frags = 1; __skb_queue_tail(&np->rx_batch, skb); } @@ -639,18 +647,20 @@ static void network_alloc_rx_buffers(str ref = gnttab_claim_grant_reference(&np->gref_rx_head); BUG_ON((signed short)ref < 0); np->grant_rx_ref[id] = ref; + + pfn = page_to_pfn(skb_shinfo(skb)->frags[0].page); + vaddr = page_address(skb_shinfo(skb)->frags[0].page); + gnttab_grant_foreign_transfer_ref(ref, - np->xbdev->otherend_id, - __pa(skb->head)>>PAGE_SHIFT); + np->xbdev->otherend_id, pfn); RING_GET_REQUEST(&np->rx, req_prod + i)->gref = ref; - np->rx_pfn_array[i] = virt_to_mfn(skb->head); + np->rx_pfn_array[i] = pfn_to_mfn(pfn); if (!xen_feature(XENFEAT_auto_translated_physmap)) { /* Remove this page before passing back to Xen. */ - set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT, - INVALID_P2M_ENTRY); + set_phys_to_machine(pfn, INVALID_P2M_ENTRY); MULTI_update_va_mapping(np->rx_mcl+i, - (unsigned long)skb->head, + (unsigned long)vaddr, __pte(0), 0); } } @@ -888,41 +898,29 @@ static void xennet_move_rx_slot(struct n np->rx.req_prod_pvt++; } -static int netif_poll(struct net_device *dev, int *pbudget) -{ - struct netfront_info *np = netdev_priv(dev); - struct sk_buff *skb, *nskb; - struct netif_rx_response *rx; - RING_IDX i, rp; - struct mmu_update *mmu = np->rx_mmu; - struct multicall_entry *mcl = np->rx_mcl; - int work_done, budget, more_to_do = 1; - struct sk_buff_head rxq; - unsigned long flags; - unsigned long mfn; - grant_ref_t ref; - - spin_lock(&np->rx_lock); - - if (unlikely(!netif_carrier_ok(dev))) { - spin_unlock(&np->rx_lock); - return 0; - } - - skb_queue_head_init(&rxq); - - if ((budget = *pbudget) > dev->quota) - budget = dev->quota; - rp = np->rx.sring->rsp_prod; - rmb(); /* Ensure we see queued responses up to ''rp''. */ - - for (i = np->rx.rsp_cons, work_done = 0; - (i != rp) && (work_done < budget); - i++, work_done++) { - rx = RING_GET_RESPONSE(&np->rx, i); - - skb = xennet_get_rx_skb(np, i); - ref = xennet_get_rx_ref(np, i); +static int xennet_get_responses(struct netfront_info *np, + struct netif_rx_response *rx, RING_IDX rp, + struct sk_buff_head *list, int count) +{ + struct mmu_update *mmu = np->rx_mmu + count; + struct multicall_entry *mcl = np->rx_mcl + count; + RING_IDX cons = np->rx.rsp_cons; + struct sk_buff *skb = xennet_get_rx_skb(np, cons); + grant_ref_t ref = xennet_get_rx_ref(np, cons); + int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD); + int frags = 1; + int err = 0; + + for (;;) { + unsigned long mfn; + + if (unlikely(rx->status < 0 || + rx->offset + rx->status > PAGE_SIZE)) { + if (net_ratelimit()) + WPRINTK("rx->offset: %x, size: %u\n", + rx->offset, rx->status); + err = -EINVAL; + } /* * This definitely indicates a bug, either in this driver or in @@ -931,8 +929,8 @@ static int netif_poll(struct net_device */ if (ref == GRANT_INVALID_REF) { WPRINTK("Bad rx response id %d.\n", rx->id); - work_done--; - continue; + err = -EINVAL; + goto next; } /* Memory pressure, insufficient buffer headroom, ... */ @@ -941,16 +939,161 @@ static int netif_poll(struct net_device WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n", rx->id, rx->status); xennet_move_rx_slot(np, skb, ref); + err = -ENOMEM; + goto next; + } + + gnttab_release_grant_reference(&np->gref_rx_head, ref); + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + /* Remap the page. 
*/ + struct page *page = skb_shinfo(skb)->frags[0].page; + unsigned long pfn = page_to_pfn(page); + void *vaddr = page_address(page); + + MULTI_update_va_mapping(mcl, (unsigned long)vaddr, + pfn_pte_ma(mfn, PAGE_KERNEL), + 0); + mcl++; + mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT) + | MMU_MACHPHYS_UPDATE; + mmu->val = pfn; + mmu++; + + set_phys_to_machine(pfn, mfn); + } + + __skb_queue_tail(list, skb); + +next: + if (!(rx->flags & NETRXF_more_data)) + break; + + if (cons + frags == rp) { + if (net_ratelimit()) + WPRINTK("Need more frags\n"); + err = -ENOENT; + break; + } + + rx = RING_GET_RESPONSE(&np->rx, cons + frags); + skb = xennet_get_rx_skb(np, cons + frags); + ref = xennet_get_rx_ref(np, cons + frags); + frags++; + } + + if (unlikely(frags > max)) { + if (net_ratelimit()) + WPRINTK("Too many frags\n"); + err = -E2BIG; + } + + return err; +} + +static RING_IDX xennet_fill_frags(struct netfront_info *np, + struct sk_buff *skb, + struct sk_buff_head *list) +{ + struct skb_shared_info *shinfo = skb_shinfo(skb); + int nr_frags = shinfo->nr_frags; + RING_IDX cons = np->rx.rsp_cons; + skb_frag_t *frag = shinfo->frags + nr_frags; + struct sk_buff *nskb; + + while ((nskb = __skb_dequeue(list))) { + struct netif_rx_response *rx + RING_GET_RESPONSE(&np->rx, ++cons); + + frag->page = skb_shinfo(nskb)->frags[0].page; + frag->page_offset = rx->offset; + frag->size = rx->status; + + skb->data_len += rx->status; + + skb_shinfo(nskb)->nr_frags = 0; + kfree_skb(nskb); + + frag++; + nr_frags++; + } + + shinfo->nr_frags = nr_frags; + return cons; +} + +static int netif_poll(struct net_device *dev, int *pbudget) +{ + struct netfront_info *np = netdev_priv(dev); + struct sk_buff *skb; + struct netif_rx_response *rx; + RING_IDX i, rp; + struct multicall_entry *mcl; + int work_done, budget, more_to_do = 1; + struct sk_buff_head rxq; + struct sk_buff_head errq; + struct sk_buff_head tmpq; + unsigned long flags; + unsigned int len; + int pages_done; + int err; + + spin_lock(&np->rx_lock); + + if (unlikely(!netif_carrier_ok(dev))) { + spin_unlock(&np->rx_lock); + return 0; + } + + skb_queue_head_init(&rxq); + skb_queue_head_init(&errq); + skb_queue_head_init(&tmpq); + + if ((budget = *pbudget) > dev->quota) + budget = dev->quota; + rp = np->rx.sring->rsp_prod; + rmb(); /* Ensure we see queued responses up to ''rp''. */ + + for (i = np->rx.rsp_cons, work_done = 0, pages_done = 0; + (i != rp) && (work_done < budget); + np->rx.rsp_cons = ++i, work_done++) { + rx = RING_GET_RESPONSE(&np->rx, i); + + err = xennet_get_responses(np, rx, rp, &tmpq, pages_done); + pages_done += skb_queue_len(&tmpq); + + if (unlikely(err)) { + i = np->rx.rsp_cons + skb_queue_len(&tmpq) - 1; work_done--; + while ((skb = __skb_dequeue(&tmpq))) + __skb_queue_tail(&errq, skb); + np->stats.rx_errors++; continue; } - gnttab_release_grant_reference(&np->gref_rx_head, ref); - - /* NB. We handle skb overflow later. 
*/ - skb->data = skb->head + rx->offset; - skb->len = rx->status; - skb->tail = skb->data + skb->len; + skb = __skb_dequeue(&tmpq); + + skb->nh.raw = (void *)skb_shinfo(skb)->frags[0].page; + skb->h.raw = skb->nh.raw + rx->offset; + + len = rx->status; + if (len > RX_COPY_THRESHOLD) + len = RX_COPY_THRESHOLD; + skb_put(skb, len); + + if (rx->status > len) { + skb_shinfo(skb)->frags[0].page_offset + rx->offset + len; + skb_shinfo(skb)->frags[0].size = rx->status - len; + skb->data_len = rx->status - len; + } else { + skb_shinfo(skb)->frags[0].page = NULL; + skb_shinfo(skb)->nr_frags = 0; + } + + i = xennet_fill_frags(np, skb, &tmpq); + skb->truesize += skb->data_len; + skb->len += skb->data_len; /* * Old backends do not assert data_validated but we @@ -966,96 +1109,38 @@ static int netif_poll(struct net_device skb->proto_csum_blank = !!(rx->flags & NETRXF_csum_blank); np->stats.rx_packets++; - np->stats.rx_bytes += rx->status; - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - /* Remap the page. */ - MULTI_update_va_mapping(mcl, (unsigned long)skb->head, - pfn_pte_ma(mfn, PAGE_KERNEL), - 0); - mcl++; - mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT) - | MMU_MACHPHYS_UPDATE; - mmu->val = __pa(skb->head) >> PAGE_SHIFT; - mmu++; - - set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT, - mfn); - } + np->stats.rx_bytes += skb->len; __skb_queue_tail(&rxq, skb); } /* Some pages are no longer absent... */ - balloon_update_driver_allowance(-work_done); + balloon_update_driver_allowance(-pages_done); /* Do all the remapping work, and M2P updates, in one big hypercall. */ - if (likely((mcl - np->rx_mcl) != 0)) { + if (likely(pages_done)) { + mcl = np->rx_mcl + pages_done; mcl->op = __HYPERVISOR_mmu_update; mcl->args[0] = (unsigned long)np->rx_mmu; - mcl->args[1] = mmu - np->rx_mmu; + mcl->args[1] = pages_done; mcl->args[2] = 0; mcl->args[3] = DOMID_SELF; - mcl++; - (void)HYPERVISOR_multicall(np->rx_mcl, mcl - np->rx_mcl); - } + (void)HYPERVISOR_multicall(np->rx_mcl, pages_done + 1); + } + + while ((skb = __skb_dequeue(&errq))) + kfree_skb(skb); while ((skb = __skb_dequeue(&rxq)) != NULL) { - if (skb->len > (dev->mtu + ETH_HLEN + 4)) { - if (net_ratelimit()) - printk(KERN_INFO "Received packet too big for " - "MTU (%d > %d)\n", - skb->len - ETH_HLEN - 4, dev->mtu); - skb->len = 0; - skb->tail = skb->data; - init_skb_shinfo(skb); - dev_kfree_skb(skb); - continue; - } - - /* - * Enough room in skbuff for the data we were passed? Also, - * Linux expects at least 16 bytes headroom in each rx buffer. - */ - if (unlikely(skb->tail > skb->end) || - unlikely((skb->data - skb->head) < 16)) { - if (net_ratelimit()) { - if (skb->tail > skb->end) - printk(KERN_INFO "Received packet " - "is %zd bytes beyond tail.\n", - skb->tail - skb->end); - else - printk(KERN_INFO "Received packet " - "is %zd bytes before head.\n", - 16 - (skb->data - skb->head)); - } - - nskb = __dev_alloc_skb(skb->len + 2, - GFP_ATOMIC|__GFP_NOWARN); - if (nskb != NULL) { - skb_reserve(nskb, 2); - skb_put(nskb, skb->len); - memcpy(nskb->data, skb->data, skb->len); - /* Copy any other fields we already set up. */ - nskb->dev = skb->dev; - nskb->ip_summed = skb->ip_summed; - nskb->proto_data_valid = skb->proto_data_valid; - nskb->proto_csum_blank = skb->proto_csum_blank; - } - - /* Reinitialise and then destroy the old skbuff. */ - skb->len = 0; - skb->tail = skb->data; - init_skb_shinfo(skb); - dev_kfree_skb(skb); - - /* Switch old for new, if we copied the buffer. 
*/ - if ((skb = nskb) == NULL) - continue; - } - - /* Set the shinfo area, which is hidden behind the data. */ - init_skb_shinfo(skb); + struct page *page = (struct page *)skb->nh.raw; + void *vaddr = page_address(page); + + memcpy(skb->data, vaddr + (skb->h.raw - skb->nh.raw), + skb_headlen(skb)); + + if (page != skb_shinfo(skb)->frags[0].page) + __free_page(page); + /* Ethernet work: Delayed to here as it peeks the header. */ skb->protocol = eth_type_trans(skb, dev); @@ -1063,8 +1148,6 @@ static int netif_poll(struct net_device netif_receive_skb(skb); dev->last_rx = jiffies; } - - np->rx.rsp_cons = i; /* If we get a callback with very few responses, reduce fill target. */ /* NB. Note exponential increase, linear decrease. */ @@ -1205,7 +1288,7 @@ static void network_connect(struct net_d break; gnttab_grant_foreign_transfer_ref( np->grant_rx_ref[i], np->xbdev->otherend_id, - __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); + page_to_pfn(skb_shinfo(np->rx_skbs[i])->frags->page)); RING_GET_REQUEST(&np->rx, i)->gref = np->grant_rx_ref[i]; RING_GET_REQUEST(&np->rx, i)->id = i; } @@ -1216,7 +1299,7 @@ static void network_connect(struct net_d ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i); gnttab_grant_foreign_transfer_ref( ref, np->xbdev->otherend_id, - __pa(skb->data) >> PAGE_SHIFT); + page_to_pfn(skb_shinfo(skb)->frags->page)); RING_GET_REQUEST(&np->rx, requeue_idx)->gref = ref; RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; requeue_idx++; diff -r c754083146bc -r c82398c08669 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jul 07 23:36:58 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jul 07 23:37:45 2006 +1000 @@ -123,6 +123,10 @@ typedef struct netif_rx_request netif_rx #define _NETRXF_csum_blank (1) #define NETRXF_csum_blank (1U<<_NETRXF_csum_blank) +/* Packet continues in the next request descriptor. */ +#define _NETRXF_more_data (2) +#define NETRXF_more_data (1U<<_NETRXF_more_data) + struct netif_rx_response { uint16_t id; uint16_t offset; /* Offset in page of start of received packet */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
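A quick standalone illustration of the receive path introduced by this patch: each packet arrives in a page, and only the first RX_COPY_THRESHOLD bytes (256 in the patch) are copied into the skb's linear area so the protocol headers are directly accessible; anything beyond that stays in the page as frag[0]. The arithmetic below is a simplified sketch, not the driver code.

#include <stdio.h>

#define RX_COPY_THRESHOLD 256   /* as defined in the patch */
#define DEMO_PAGE_SIZE 4096     /* assumed page size */

/*
 * For a received length "status", show how the frontend divides the data
 * between the skb's linear head (copied) and the original page (frag[0]).
 */
static void demo_rx_split(int status)
{
        int head = status > RX_COPY_THRESHOLD ? RX_COPY_THRESHOLD : status;
        int frag = status - head;

        printf("status=%4d -> head=%3d bytes copied, frag=%4d bytes left in page\n",
               status, head, frag);
}

int main(void)
{
        demo_rx_split(60);                /* small packet: nothing left in the page */
        demo_rx_split(1514);              /* full-size Ethernet frame */
        demo_rx_split(DEMO_PAGE_SIZE);    /* largest single-slot payload */
        return 0;
}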
Herbert Xu
2006-Jul-07 14:20 UTC
[Xen-devel] [9/11] [NET] back: Transmit SG packets if supported
Hi: [NET] back: Transmit SG packets if supported This patch adds scatter-and-gather transmission support to the backend. This allows the MTU to be raised right now and the potential for TSO in future. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r c82398c08669 -r 54f84d198a1d linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:37:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 07 23:38:49 2006 +1000 @@ -130,4 +130,10 @@ static inline int netbk_can_queue(struct return netif->features & NETIF_F_LLTX; } +static inline int netbk_can_sg(struct net_device *dev) +{ + netif_t *netif = netdev_priv(dev); + return netif->features & NETIF_F_SG; +} + #endif /* __NETIF__BACKEND__COMMON_H__ */ diff -r c82398c08669 -r 54f84d198a1d linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:37:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:38:49 2006 +1000 @@ -68,10 +68,34 @@ static int net_close(struct net_device * return 0; } +static int netbk_change_mtu(struct net_device *dev, int mtu) +{ + int max = netbk_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN; + + if (mtu > max) + return -EINVAL; + dev->mtu = mtu; + return 0; +} + +static int netbk_set_sg(struct net_device *dev, u32 data) +{ + if (data) { + netif_t *netif = netdev_priv(dev); + + if (!(netif->features & NETIF_F_SG)) + return -ENOSYS; + } + + return ethtool_op_set_sg(dev, data); +} + static struct ethtool_ops network_ethtool_ops { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, + .get_sg = ethtool_op_get_sg, + .set_sg = netbk_set_sg, }; netif_t *netif_alloc(domid_t domid, unsigned int handle, u8 be_mac[ETH_ALEN]) @@ -107,6 +131,7 @@ netif_t *netif_alloc(domid_t domid, unsi dev->get_stats = netif_be_get_stats; dev->open = net_open; dev->stop = net_close; + dev->change_mtu = netbk_change_mtu; dev->features = NETIF_F_IP_CSUM; SET_ETHTOOL_OPS(dev, &network_ethtool_ops); diff -r c82398c08669 -r 54f84d198a1d linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:37:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:38:49 2006 +1000 @@ -40,6 +40,11 @@ /*#define NETBE_DEBUG_INTERRUPT*/ +struct netbk_rx_meta { + skb_frag_t frag; + int id; +}; + static void netif_idx_release(u16 pending_idx); static void netif_page_release(struct page *page); static void make_tx_response(netif_t *netif, @@ -100,21 +105,27 @@ static unsigned long mfn_list[MAX_MFN_AL static unsigned long mfn_list[MAX_MFN_ALLOC]; static unsigned int alloc_index = 0; -static unsigned long alloc_mfn(void) -{ - unsigned long mfn = 0; +static inline unsigned long alloc_mfn(void) +{ + return mfn_list[--alloc_index]; +} + +static int check_mfn(int nr) +{ struct xen_memory_reservation reservation = { - .nr_extents = MAX_MFN_ALLOC, .extent_order = 0, .domid = DOMID_SELF }; - set_xen_guest_handle(reservation.extent_start, mfn_list); - if ( unlikely(alloc_index == 0) ) - alloc_index = HYPERVISOR_memory_op( - XENMEM_increase_reservation, &reservation); - if ( alloc_index != 0 ) - mfn = mfn_list[--alloc_index]; - return mfn; + + if 
(likely(alloc_index >= nr)) + return 0; + + set_xen_guest_handle(reservation.extent_start, mfn_list + alloc_index); + reservation.nr_extents = MAX_MFN_ALLOC - alloc_index; + alloc_index += HYPERVISOR_memory_op(XENMEM_increase_reservation, + &reservation); + + return alloc_index >= nr ? 0 : -ENOMEM; } static inline void maybe_schedule_tx_action(void) @@ -136,12 +147,87 @@ static inline int is_xen_skb(struct sk_b return (cp == skbuff_cachep); } +static struct sk_buff *netbk_copy_skb(struct sk_buff *skb) +{ + struct skb_shared_info *ninfo; + struct sk_buff *nskb; + unsigned long offset; + int ret; + int len; + int headlen; + + nskb = alloc_skb(SKB_MAX_HEAD(0), GFP_ATOMIC); + if (unlikely(!nskb)) + goto err; + + skb_reserve(nskb, 16); + headlen = nskb->end - nskb->data; + if (headlen > skb_headlen(skb)) + headlen = skb_headlen(skb); + ret = skb_copy_bits(skb, 0, __skb_put(nskb, headlen), headlen); + BUG_ON(ret); + + ninfo = skb_shinfo(nskb); + ninfo->gso_size = skb_shinfo(skb)->gso_size; + ninfo->gso_type = skb_shinfo(skb)->gso_type; + + offset = headlen; + len = skb->len - headlen; + + nskb->len = skb->len; + nskb->data_len = len; + nskb->truesize += len; + + while (len) { + struct page *page; + int copy; + int zero; + + if (unlikely(ninfo->nr_frags >= MAX_SKB_FRAGS)) { + dump_stack(); + goto err_free; + } + + copy = len >= PAGE_SIZE ? PAGE_SIZE : len; + zero = len >= PAGE_SIZE ? 0 : __GFP_ZERO; + + page = alloc_page(GFP_ATOMIC | zero); + if (unlikely(!page)) + goto err_free; + + ret = skb_copy_bits(skb, offset, page_address(page), copy); + BUG_ON(ret); + + ninfo->frags[ninfo->nr_frags].page = page; + ninfo->frags[ninfo->nr_frags].page_offset = 0; + ninfo->frags[ninfo->nr_frags].size = copy; + ninfo->nr_frags++; + + offset += copy; + len -= copy; + } + + offset = nskb->data - skb->data; + + nskb->h.raw = skb->h.raw + offset; + nskb->nh.raw = skb->nh.raw + offset; + nskb->mac.raw = skb->mac.raw + offset; + + return nskb; + +err_free: + kfree_skb(nskb); +err: + return NULL; +} + static inline int netbk_queue_full(netif_t *netif) { RING_IDX peek = netif->rx_req_cons_peek; - return netif->rx.sring->req_prod - peek <= 0 || - netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <= 0; + return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS || + netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <+ MAX_SKB_FRAGS; } int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) @@ -163,20 +249,12 @@ int netif_be_start_xmit(struct sk_buff * * We do not copy the packet unless: * 1. The data is shared; or * 2. The data is not allocated from our special cache. - * NB. We also couldn''t cope with fragmented packets, but we won''t get - * any because we not advertise the NETIF_F_SG feature. + * 3. The data is fragmented. */ - if (skb_shared(skb) || skb_cloned(skb) || !is_xen_skb(skb)) { - int hlen = skb->data - skb->head; - int ret; - struct sk_buff *nskb = dev_alloc_skb(hlen + skb->len); + if (skb_cloned(skb) || skb_is_nonlinear(skb) || !is_xen_skb(skb)) { + struct sk_buff *nskb = netbk_copy_skb(skb); if ( unlikely(nskb == NULL) ) goto drop; - skb_reserve(nskb, hlen); - __skb_put(nskb, skb->len); - ret = skb_copy_bits(skb, -hlen, nskb->data - hlen, - skb->len + hlen); - BUG_ON(ret); /* Copy only the header fields we use in this driver. 
*/ nskb->dev = skb->dev; nskb->ip_summed = skb->ip_summed; @@ -185,7 +263,7 @@ int netif_be_start_xmit(struct sk_buff * skb = nskb; } - netif->rx_req_cons_peek++; + netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1; netif_get(netif); if (netbk_can_queue(dev) && netbk_queue_full(netif)) @@ -221,116 +299,80 @@ int xen_network_done(void) } #endif -static void net_rx_action(unsigned long unused) -{ - netif_t *netif = NULL; - s8 status; - u16 size, id, irq, flags; - multicall_entry_t *mcl; - mmu_update_t *mmu; - gnttab_transfer_t *gop; - unsigned long vdata, old_mfn, new_mfn; - struct sk_buff_head rxq; - struct sk_buff *skb; - int notify_nr = 0; - int ret; +static u16 netbk_gop_frag(netif_t *netif, struct page *page, int count, int i) +{ + multicall_entry_t *mcl = rx_mcl + count; + mmu_update_t *mmu = rx_mmu + count; + gnttab_transfer_t *gop = grant_rx_op + count; + netif_rx_request_t *req; + unsigned long old_mfn, new_mfn; + + old_mfn = virt_to_mfn(page_address(page)); + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + new_mfn = alloc_mfn(); + + /* + * Set the new P2M table entry before reassigning + * the old data page. Heed the comment in + * pgtable-2level.h:pte_page(). :-) + */ + set_phys_to_machine(page_to_pfn(page), new_mfn); + + MULTI_update_va_mapping(mcl, (unsigned long)page_address(page), + pfn_pte_ma(new_mfn, PAGE_KERNEL), 0); + + mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) | + MMU_MACHPHYS_UPDATE; + mmu->val = page_to_pfn(page); + } + + req = RING_GET_REQUEST(&netif->rx, netif->rx.req_cons + i); + gop->mfn = old_mfn; + gop->domid = netif->domid; + gop->ref = req->gref; + return req->id; +} + +static void netbk_gop_skb(struct sk_buff *skb, struct netbk_rx_meta *meta, + int count) +{ + netif_t *netif = netdev_priv(skb->dev); + int nr_frags = skb_shinfo(skb)->nr_frags; + int i; + + for (i = 0; i < nr_frags; i++) { + meta[++count].frag = skb_shinfo(skb)->frags[i]; + meta[count].id = netbk_gop_frag(netif, meta[count].frag.page, + count, i + 1); + } + /* - * Putting hundreds of bytes on the stack is considered rude. - * Static works because a tasklet can only be on one CPU at any time. + * This must occur at the end to ensure that we don''t trash + * skb_shinfo until we''re done. */ - static u16 notify_list[NET_RX_RING_SIZE]; - - skb_queue_head_init(&rxq); - - mcl = rx_mcl; - mmu = rx_mmu; - gop = grant_rx_op; - - while ((skb = skb_dequeue(&rx_queue)) != NULL) { - netif = netdev_priv(skb->dev); - vdata = (unsigned long)skb->data; - old_mfn = virt_to_mfn(vdata); - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - /* Memory squeeze? Back off for an arbitrary while. */ - if ((new_mfn = alloc_mfn()) == 0) { - if ( net_ratelimit() ) - WPRINTK("Memory squeeze in netback " - "driver.\n"); - mod_timer(&net_timer, jiffies + HZ); - skb_queue_head(&rx_queue, skb); - break; - } - /* - * Set the new P2M table entry before reassigning - * the old data page. Heed the comment in - * pgtable-2level.h:pte_page(). :-) - */ - set_phys_to_machine( - __pa(skb->data) >> PAGE_SHIFT, - new_mfn); - - MULTI_update_va_mapping(mcl, vdata, - pfn_pte_ma(new_mfn, - PAGE_KERNEL), 0); - mcl++; - - mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) | - MMU_MACHPHYS_UPDATE; - mmu->val = __pa(vdata) >> PAGE_SHIFT; - mmu++; - } - - gop->mfn = old_mfn; - gop->domid = netif->domid; - gop->ref = RING_GET_REQUEST( - &netif->rx, netif->rx.req_cons)->gref; - netif->rx.req_cons++; - gop++; - - __skb_queue_tail(&rxq, skb); - - /* Filled the batch queue? 
*/ - if ((gop - grant_rx_op) == ARRAY_SIZE(grant_rx_op)) - break; - } - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - if (mcl == rx_mcl) - return; - - mcl[-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL; - - if (mmu - rx_mmu) { - mcl->op = __HYPERVISOR_mmu_update; - mcl->args[0] = (unsigned long)rx_mmu; - mcl->args[1] = mmu - rx_mmu; - mcl->args[2] = 0; - mcl->args[3] = DOMID_SELF; - mcl++; - } - - ret = HYPERVISOR_multicall(rx_mcl, mcl - rx_mcl); - BUG_ON(ret != 0); - } - - ret = HYPERVISOR_grant_table_op(GNTTABOP_transfer, grant_rx_op, - gop - grant_rx_op); - BUG_ON(ret != 0); - - mcl = rx_mcl; - gop = grant_rx_op; - while ((skb = __skb_dequeue(&rxq)) != NULL) { - netif = netdev_priv(skb->dev); - size = skb->tail - skb->data; - - atomic_set(&(skb_shinfo(skb)->dataref), 1); - skb_shinfo(skb)->nr_frags = 0; - skb_shinfo(skb)->frag_list = NULL; - - netif->stats.tx_bytes += size; - netif->stats.tx_packets++; - + meta[count - nr_frags].id = netbk_gop_frag(netif, + virt_to_page(skb->data), + count - nr_frags, 0); + netif->rx.req_cons += nr_frags + 1; +} + +static inline void netbk_free_pages(int nr_frags, struct netbk_rx_meta *meta) +{ + int i; + + for (i = 0; i < nr_frags; i++) + put_page(meta[i].frag.page); +} + +static int netbk_check_gop(int nr_frags, domid_t domid, int count) +{ + multicall_entry_t *mcl = rx_mcl + count; + gnttab_transfer_t *gop = grant_rx_op + count; + int status = NETIF_RSP_OKAY; + int i; + + for (i = 0; i <= nr_frags; i++) { if (!xen_feature(XENFEAT_auto_translated_physmap)) { /* The update_va_mapping() must not fail. */ BUG_ON(mcl->result != 0); @@ -338,10 +380,9 @@ static void net_rx_action(unsigned long } /* Check the reassignment error code. */ - status = NETIF_RSP_OKAY; if (gop->status != 0) { DPRINTK("Bad status %d from grant transfer to DOM%u\n", - gop->status, netif->domid); + gop->status, domid); /* * Page no longer belongs to us unless GNTST_bad_page, * but that should be a fatal error anyway. @@ -349,17 +390,128 @@ static void net_rx_action(unsigned long BUG_ON(gop->status == GNTST_bad_page); status = NETIF_RSP_ERROR; } - irq = netif->irq; - id = RING_GET_REQUEST(&netif->rx, netif->rx.rsp_prod_pvt)->id; - flags = 0; + gop++; + } + + return status; +} + +static void netbk_add_frag_responses(netif_t *netif, int status, + struct netbk_rx_meta *meta, int nr_frags) +{ + int i; + + for (i = 0; i < nr_frags; i++) { + int id = meta[i].id; + int flags = (i == nr_frags - 1) ? 0 : NETRXF_more_data; + + make_rx_response(netif, id, status, meta[i].frag.page_offset, + meta[i].frag.size, flags); + } +} + +static void net_rx_action(unsigned long unused) +{ + netif_t *netif = NULL; + s8 status; + u16 id, irq, flags; + multicall_entry_t *mcl; + struct sk_buff_head rxq; + struct sk_buff *skb; + int notify_nr = 0; + int ret; + int nr_frags; + int count; + + /* + * Putting hundreds of bytes on the stack is considered rude. + * Static works because a tasklet can only be on one CPU at any time. + */ + static u16 notify_list[NET_RX_RING_SIZE]; + static struct netbk_rx_meta meta[NET_RX_RING_SIZE]; + + skb_queue_head_init(&rxq); + + count = 0; + + while ((skb = skb_dequeue(&rx_queue)) != NULL) { + nr_frags = skb_shinfo(skb)->nr_frags; + *(int *)skb->cb = nr_frags; + + if (!xen_feature(XENFEAT_auto_translated_physmap) && + check_mfn(nr_frags + 1)) { + /* Memory squeeze? Back off for an arbitrary while. 
*/ + if ( net_ratelimit() ) + WPRINTK("Memory squeeze in netback " + "driver.\n"); + mod_timer(&net_timer, jiffies + HZ); + skb_queue_head(&rx_queue, skb); + break; + } + + netbk_gop_skb(skb, meta, count); + + count += nr_frags + 1; + + __skb_queue_tail(&rxq, skb); + + /* Filled the batch queue? */ + if (count + MAX_SKB_FRAGS >= NET_RX_RING_SIZE) + break; + } + + if (!count) + return; + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + mcl = rx_mcl + count; + + mcl[-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL; + + mcl->op = __HYPERVISOR_mmu_update; + mcl->args[0] = (unsigned long)rx_mmu; + mcl->args[1] = count; + mcl->args[2] = 0; + mcl->args[3] = DOMID_SELF; + + ret = HYPERVISOR_multicall(rx_mcl, count + 1); + BUG_ON(ret != 0); + } + + ret = HYPERVISOR_grant_table_op(GNTTABOP_transfer, grant_rx_op, count); + BUG_ON(ret != 0); + + count = 0; + while ((skb = __skb_dequeue(&rxq)) != NULL) { + nr_frags = *(int *)skb->cb; + + atomic_set(&(skb_shinfo(skb)->dataref), 1); + skb_shinfo(skb)->nr_frags = 0; + skb_shinfo(skb)->frag_list = NULL; + + netif = netdev_priv(skb->dev); + netif->stats.tx_bytes += skb->len; + netif->stats.tx_packets++; + + netbk_free_pages(nr_frags, meta + count + 1); + status = netbk_check_gop(nr_frags, netif->domid, count); + + id = meta[count].id; + flags = nr_frags ? NETRXF_more_data : 0; + if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ flags |= NETRXF_csum_blank | NETRXF_data_validated; else if (skb->proto_data_valid) /* remote but checksummed? */ flags |= NETRXF_data_validated; - if (make_rx_response(netif, id, status, - (unsigned long)skb->data & ~PAGE_MASK, - size, flags) && - (rx_notify[irq] == 0)) { + + make_rx_response(netif, id, status, offset_in_page(skb->data), + skb_headlen(skb), flags); + netbk_add_frag_responses(netif, status, meta + count + 1, + nr_frags); + + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->rx, ret); + irq = netif->irq; + if (ret && !rx_notify[irq]) { rx_notify[irq] = 1; notify_list[notify_nr++] = irq; } @@ -370,7 +522,7 @@ static void net_rx_action(unsigned long netif_put(netif); dev_kfree_skb(skb); - gop++; + count += nr_frags + 1; } while (notify_nr != 0) { @@ -1039,7 +1191,6 @@ static int make_rx_response(netif_t *net { RING_IDX i = netif->rx.rsp_prod_pvt; netif_rx_response_t *resp; - int notify; resp = RING_GET_RESPONSE(&netif->rx, i); resp->offset = offset; @@ -1050,9 +1201,8 @@ static int make_rx_response(netif_t *net resp->status = (s16)st; netif->rx.rsp_prod_pvt = ++i; - RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->rx, notify); - - return notify; + + return 0; } #ifdef NETBE_DEBUG_INTERRUPT diff -r c82398c08669 -r 54f84d198a1d linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:37:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:38:49 2006 +1000 @@ -377,6 +377,13 @@ static int connect_rings(struct backend_ /* Must be non-zero for pfifo_fast to work. */ be->netif->dev->tx_queue_len = 1; + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-sg", "%d", &val) < 0) + val = 0; + if (val) { + be->netif->features |= NETIF_F_SG; + be->netif->dev->features |= NETIF_F_SG; + } + /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); if (err) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
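For a feel of what netbk_copy_skb() in this patch does when it has to copy, here is a standalone sketch of the chunking arithmetic: after the linear head, the payload is carved into page-sized fragments, and a packet needing more than MAX_SKB_FRAGS pages is rejected. PAGE_SIZE and MAX_SKB_FRAGS values are assumptions for illustration.

#include <stdio.h>

#define DEMO_PAGE_SIZE     4096
#define DEMO_MAX_SKB_FRAGS   18 /* typical value for 4 KiB pages in this era */

/*
 * Mirror of the copy loop's accounting: after "headlen" bytes go into the
 * new skb's linear area, the rest is carved into page-sized fragments.
 * Returns the number of frags used, or -1 if the payload would not fit
 * (the real code dumps a stack trace and drops the packet).
 */
static int demo_frag_count(int total_len, int headlen)
{
        int len = total_len - headlen;
        int frags = 0;

        while (len > 0) {
                int copy = len >= DEMO_PAGE_SIZE ? DEMO_PAGE_SIZE : len;

                if (frags >= DEMO_MAX_SKB_FRAGS)
                        return -1;

                frags++;
                len -= copy;
        }
        return frags;
}

int main(void)
{
        printf("1500-byte frame, all in the head : %d frags\n", demo_frag_count(1500, 1500));
        printf("64 KiB TSO frame, 256-byte head  : %d frags\n", demo_frag_count(65536, 256));
        return 0;
}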
Hi: [NET] front: Add TSO support This patch adds TCP Segmentation Offload (TSO) support to the frontend. It also advertises this fact through xenbus so that the frontend can detect this and send through TSO requests only if it is supported. This is done using an extra request slot which is indicated by a flag in the first slot. In future checksum offload can be done in the same way. Even though only TSO is supported for now the code actually supports GSO so it can be applied to any other protocol. The only missing bit is the detection of host support for a specific GSO protocol. Once that is added we can advertise all supported protocols to the guest. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 54f84d198a1d -r 5861968091dd linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:38:49 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 07 23:38:54 2006 +1000 @@ -116,6 +116,11 @@ struct netfront_info { struct mmu_update rx_mmu[NET_RX_RING_SIZE]; }; +struct netfront_rx_info { + struct netif_rx_response rx; + struct netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX - 1]; +}; + /* * Access macros for acquiring freeing slots in tx_skbs[]. */ @@ -330,6 +335,12 @@ again: err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", 1); if (err) { message = "writing feature-sg"; + goto abort_transaction; + } + + err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv4", "%d", 1); + if (err) { + message = "writing feature-gso-tcpv4"; goto abort_transaction; } @@ -898,18 +909,65 @@ static void xennet_move_rx_slot(struct n np->rx.req_prod_pvt++; } +int xennet_get_extras(struct netfront_info *np, + struct netif_extra_info *extras, RING_IDX rp) + +{ + struct netif_extra_info *extra; + RING_IDX cons = np->rx.rsp_cons; + int err = 0; + + do { + struct sk_buff *skb; + grant_ref_t ref; + + if (unlikely(cons + 1 == rp)) { + if (net_ratelimit()) + WPRINTK("Missing extra info\n"); + err = -EBADR; + break; + } + + extra = (struct netif_extra_info *) + RING_GET_RESPONSE(&np->rx, ++cons); + + if (unlikely(!extra->type || + extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) { + if (net_ratelimit()) + WPRINTK("Invalid extra type: %d\n", + extra->type); + err = -EINVAL; + } else + memcpy(&extras[extra->type - 1], extra, sizeof(*extra)); + + skb = xennet_get_rx_skb(np, cons); + ref = xennet_get_rx_ref(np, cons); + xennet_move_rx_slot(np, skb, ref); + } while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE); + + np->rx.rsp_cons = cons; + return err; +} + static int xennet_get_responses(struct netfront_info *np, - struct netif_rx_response *rx, RING_IDX rp, + struct netfront_rx_info *rinfo, RING_IDX rp, struct sk_buff_head *list, int count) { struct mmu_update *mmu = np->rx_mmu + count; struct multicall_entry *mcl = np->rx_mcl + count; + struct netif_rx_response *rx = &rinfo->rx; + struct netif_extra_info *extras = rinfo->extras; RING_IDX cons = np->rx.rsp_cons; struct sk_buff *skb = xennet_get_rx_skb(np, cons); grant_ref_t ref = xennet_get_rx_ref(np, cons); int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD); int frags = 1; int err = 0; + + if (rx->flags & NETRXF_extra_info) { + err = xennet_get_extras(np, extras, rp); + cons = np->rx.rsp_cons; + } for (;;) { unsigned long mfn; @@ -1022,11 
+1080,38 @@ static RING_IDX xennet_fill_frags(struct return cons; } +static int xennet_set_skb_gso(struct sk_buff *skb, struct netif_extra_info *gso) +{ + if (!gso->u.gso.size) { + if (net_ratelimit()) + WPRINTK("GSO size must not be zero.\n"); + return -EINVAL; + } + + /* Currently only TCPv4 S.O. is supported. */ + if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) { + if (net_ratelimit()) + WPRINTK("Bad GSO type %d.\n", gso->u.gso.type); + return -EINVAL; + } + + skb_shinfo(skb)->gso_size = gso->u.gso.size; + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + + /* Header must be checked, and gso_segs computed. */ + skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY; + skb_shinfo(skb)->gso_segs = 0; + + return 0; +} + static int netif_poll(struct net_device *dev, int *pbudget) { struct netfront_info *np = netdev_priv(dev); struct sk_buff *skb; - struct netif_rx_response *rx; + struct netfront_rx_info rinfo; + struct netif_rx_response *rx = &rinfo.rx; + struct netif_extra_info *extras = rinfo.extras; RING_IDX i, rp; struct multicall_entry *mcl; int work_done, budget, more_to_do = 1; @@ -1057,12 +1142,14 @@ static int netif_poll(struct net_device for (i = np->rx.rsp_cons, work_done = 0, pages_done = 0; (i != rp) && (work_done < budget); np->rx.rsp_cons = ++i, work_done++) { - rx = RING_GET_RESPONSE(&np->rx, i); - - err = xennet_get_responses(np, rx, rp, &tmpq, pages_done); + memcpy(rx, RING_GET_RESPONSE(&np->rx, i), sizeof(*rx)); + memset(extras, 0, sizeof(extras)); + + err = xennet_get_responses(np, &rinfo, rp, &tmpq, pages_done); pages_done += skb_queue_len(&tmpq); if (unlikely(err)) { +err: i = np->rx.rsp_cons + skb_queue_len(&tmpq) - 1; work_done--; while ((skb = __skb_dequeue(&tmpq))) @@ -1072,6 +1159,16 @@ static int netif_poll(struct net_device } skb = __skb_dequeue(&tmpq); + + if (extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].type) { + struct netif_extra_info *gso; + gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1]; + + if (unlikely(xennet_set_skb_gso(skb, gso))) { + __skb_queue_head(&tmpq, skb); + goto err; + } + } skb->nh.raw = (void *)skb_shinfo(skb)->frags[0].page; skb->h.raw = skb->nh.raw + rx->offset; diff -r 54f84d198a1d -r 5861968091dd xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jul 07 23:38:49 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jul 07 23:38:54 2006 +1000 @@ -127,6 +127,10 @@ typedef struct netif_rx_request netif_rx #define _NETRXF_more_data (2) #define NETRXF_more_data (1U<<_NETRXF_more_data) +/* Packet to be followed by extra descriptor(s). */ +#define _NETRXF_extra_info (3) +#define NETRXF_extra_info (1U<<_NETRXF_extra_info) + struct netif_rx_response { uint16_t id; uint16_t offset; /* Offset in page of start of received packet */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
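To summarise the wire format the two ends now agree on: when the first response for a packet has NETRXF_extra_info set, the next ring slot is not a normal rx response but a netif_extra_info, and the fragment responses follow after it. A sketch of the layout for one GSO packet delivered to the frontend (an illustration assembled from the handling above, not text from the patch):

/*
 * slot 0       netif_rx_response for the header page; NETRXF_extra_info
 *              set, NETRXF_more_data set when fragments follow
 * slot 1       netif_extra_info with type = XEN_NETIF_EXTRA_TYPE_GSO and
 *              u.gso.size/u.gso.type filled in; XEN_NETIF_EXTRA_FLAG_MORE
 *              would chain further extra slots
 * slots 2..n   netif_rx_response, one per fragment, NETRXF_more_data on
 *              all but the last
 */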
Herbert Xu
2006-Jul-07 14:22 UTC
[Xen-devel] [11/11] [NET] back: Transmit TSO packets if supported
Hi: [NET] back: Transmit TSO packets if supported This patch adds TSO transmission support to the backend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 5861968091dd -r 51252bc644da linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:38:54 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 07 23:38:57 2006 +1000 @@ -90,12 +90,26 @@ static int netbk_set_sg(struct net_devic return ethtool_op_set_sg(dev, data); } +static int netbk_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + netif_t *netif = netdev_priv(dev); + + if (!(netif->features & NETIF_F_TSO)) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static struct ethtool_ops network_ethtool_ops { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = netbk_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = netbk_set_tso, }; netif_t *netif_alloc(domid_t domid, unsigned int handle, u8 be_mac[ETH_ALEN]) diff -r 5861968091dd -r 51252bc644da linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:38:54 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 07 23:38:57 2006 +1000 @@ -50,12 +50,12 @@ static void make_tx_response(netif_t *ne static void make_tx_response(netif_t *netif, netif_tx_request_t *txp, s8 st); -static int make_rx_response(netif_t *netif, - u16 id, - s8 st, - u16 offset, - u16 size, - u16 flags); +static netif_rx_response_t *make_rx_response(netif_t *netif, + u16 id, + s8 st, + u16 offset, + u16 size, + u16 flags); static void net_tx_action(unsigned long unused); static DECLARE_TASKLET(net_tx_tasklet, net_tx_action, 0); @@ -225,9 +225,9 @@ static inline int netbk_queue_full(netif { RING_IDX peek = netif->rx_req_cons_peek; - return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS || + return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS + 1 || netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <- MAX_SKB_FRAGS; + MAX_SKB_FRAGS + 1; } int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) @@ -263,7 +263,8 @@ int netif_be_start_xmit(struct sk_buff * skb = nskb; } - netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1; + netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1 + + !!skb_shinfo(skb)->gso_size; netif_get(netif); if (netbk_can_queue(dev) && netbk_queue_full(netif)) @@ -340,11 +341,16 @@ static void netbk_gop_skb(struct sk_buff netif_t *netif = netdev_priv(skb->dev); int nr_frags = skb_shinfo(skb)->nr_frags; int i; + int extra; + + meta[count].frag.page_offset = skb_shinfo(skb)->gso_type; + meta[count].frag.size = skb_shinfo(skb)->gso_size; + extra = !!meta[count].frag.size + 1; for (i = 0; i < nr_frags; i++) { meta[++count].frag = skb_shinfo(skb)->frags[i]; meta[count].id = netbk_gop_frag(netif, meta[count].frag.page, - count, i + 1); + count, i + extra); } /* @@ -354,7 +360,7 @@ static void netbk_gop_skb(struct sk_buff meta[count - nr_frags].id = netbk_gop_frag(netif, virt_to_page(skb->data), count - nr_frags, 0); - netif->rx.req_cons += nr_frags + 1; + netif->rx.req_cons += nr_frags + extra; } static inline void netbk_free_pages(int nr_frags, struct netbk_rx_meta *meta) @@ 
-415,6 +421,8 @@ static void net_rx_action(unsigned long netif_t *netif = NULL; s8 status; u16 id, irq, flags; + netif_rx_response_t *resp; + struct netif_extra_info *extra; multicall_entry_t *mcl; struct sk_buff_head rxq; struct sk_buff *skb; @@ -504,8 +512,33 @@ static void net_rx_action(unsigned long else if (skb->proto_data_valid) /* remote but checksummed? */ flags |= NETRXF_data_validated; - make_rx_response(netif, id, status, offset_in_page(skb->data), - skb_headlen(skb), flags); + resp = make_rx_response(netif, id, status, + offset_in_page(skb->data), + skb_headlen(skb), flags); + + extra = NULL; + + if (meta[count].frag.size) { + struct netif_extra_info *gso + (struct netif_extra_info *) + RING_GET_RESPONSE(&netif->rx, + netif->rx.rsp_prod_pvt++); + + if (extra) + extra->flags |= XEN_NETIF_EXTRA_FLAG_MORE; + else + resp->flags |= NETRXF_extra_info; + + gso->u.gso.size = meta[count].frag.size; + gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4; + gso->u.gso.pad = 0; + gso->u.gso.features = 0; + + gso->type = XEN_NETIF_EXTRA_TYPE_GSO; + gso->flags = 0; + extra = gso; + } + netbk_add_frag_responses(netif, status, meta + count + 1, nr_frags); @@ -1182,12 +1215,12 @@ static void make_tx_response(netif_t *ne #endif } -static int make_rx_response(netif_t *netif, - u16 id, - s8 st, - u16 offset, - u16 size, - u16 flags) +static netif_rx_response_t *make_rx_response(netif_t *netif, + u16 id, + s8 st, + u16 offset, + u16 size, + u16 flags) { RING_IDX i = netif->rx.rsp_prod_pvt; netif_rx_response_t *resp; @@ -1202,7 +1235,7 @@ static int make_rx_response(netif_t *net netif->rx.rsp_prod_pvt = ++i; - return 0; + return resp; } #ifdef NETBE_DEBUG_INTERRUPT diff -r 5861968091dd -r 51252bc644da linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:38:54 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 07 23:38:57 2006 +1000 @@ -384,6 +384,14 @@ static int connect_rings(struct backend_ be->netif->dev->features |= NETIF_F_SG; } + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4", "%d", + &val) < 0) + val = 0; + if (val) { + be->netif->features |= NETIF_F_TSO; + be->netif->dev->features |= NETIF_F_TSO; + } + /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); if (err) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
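With the extra-info slot in the picture, the per-packet budget grows by one slot whenever a GSO packet is sent, which is what the "+ !!skb_shinfo(skb)->gso_size" peek accounting and the "MAX_SKB_FRAGS + 1" queue-full test above allow for. A minimal sketch of the accounting (helper name invented for illustration):

#include <linux/skbuff.h>

/* Sketch only: rx-ring slots the backend must reserve for one skb --
 * header + fragments + one extra-info slot when the packet carries
 * GSO metadata. */
static inline int netbk_slots_needed(struct sk_buff *skb)
{
	return skb_shinfo(skb)->nr_frags + 1 +
	       !!skb_shinfo(skb)->gso_size;
}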
Some comments (inline) on this patch. I checked in the trivial first two patches. Thanks, Keir On 7 Jul 2006, at 15:18, Herbert Xu wrote:> /* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */ > - for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) { > - if ((unsigned long)np->rx_skbs[i] < PAGE_OFFSET) > - continue; > + for (i = 0; i < NET_RX_RING_SIZE; i++) { > + if (!np->rx_skbs[i]) > + break; > gnttab_grant_foreign_transfer_ref( > np->grant_rx_ref[i], np->xbdev->otherend_id, > __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); > - RING_GET_REQUEST(&np->rx, requeue_idx)->gref > - np->grant_rx_ref[i]; > - RING_GET_REQUEST(&np->rx, requeue_idx)->id = i; > + RING_GET_REQUEST(&np->rx, i)->gref = np->grant_rx_ref[i]; > + RING_GET_REQUEST(&np->rx, i)->id = i; > + } > + for (requeue_idx = i++; i < NET_RX_RING_SIZE; i++) { > + if (!np->rx_skbs[i]) > + continue; > + skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i); > + ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i); > + gnttab_grant_foreign_transfer_ref( > + ref, np->xbdev->otherend_id, > + __pa(skb->data) >> PAGE_SHIFT); > + RING_GET_REQUEST(&np->rx, requeue_idx)->gref = ref; > + RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; > requeue_idx++; > }Why two loops to fill the receive ring? The header of the second loop with the body of the first loop would seem more correct (use of destructive xennet_get_* functions to read skb/ref info looks incorrect). Have you tested this patch with ''xm save/restore'' on a domU receiving a bulk network transfer? That would be very worthwhile.> @@ -1391,11 +1433,6 @@ static struct net_device * __devinit cre > np->grant_tx_ref[i] = GRANT_INVALID_REF; > } > > - for (i = 0; i <= NET_RX_RING_SIZE; i++) { > - np->rx_skbs[i] = (void *)((unsigned long) i+1); > - np->grant_rx_ref[i] = GRANT_INVALID_REF; > - } > - > /* A grant for every tx ring slot */ > if (gnttab_alloc_grant_references(TX_MAX_TARGET, > &np->gref_tx_head) < 0) {Initialisation still would be nice for sanity. We should initialise rx_skbs[i] to NULL, but apart from that I think we should keep this initialisation loop as-is. Oh, except < NET_RX_RING_SIZE rather than <=. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
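The initialisation loop being asked for here would, with the two corrections Keir mentions applied (a NULL sentinel and a strict upper bound), look roughly like this sketch (not a hunk from the series):

	for (i = 0; i < NET_RX_RING_SIZE; i++) {
		np->rx_skbs[i] = NULL;
		np->grant_rx_ref[i] = GRANT_INVALID_REF;
	}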
Keir Fraser
2006-Jul-27 13:11 UTC
[Xen-devel] Re: [5/11] [NET] back: Replace netif->active with netif_carrier_ok
On 7 Jul 2006, at 15:19, Herbert Xu wrote:

> [NET] back: Replace netif->active with netif_carrier_ok
>
> The connection status to the frontend can be represented using
> netif_carrier_ok instead of netif->active. As a result, we delay
> the construction of the dev qdisc until the carrier comes on. This
> is a prerequisite for adding a tx queue.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

netif->active is also conditional on whether or not the interface is
open ('running'). So it's not as good a fit for netif_carrier as the
status field you similarly replaced in netfront.

Perhaps you could replace the netback status field
(CONNECTED/DISCONNECTED) instead? I think the extra value DISCONNECTING
is not really necessary any more, so it should map onto the boolean
netif_carrier okay.

Then ->active could also potentially be removed. It would correspond to
netif_running && netif_carrier_ok.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
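The equivalence Keir suggests can be written as a one-line helper on top of the generic netdev state bits. A minimal sketch (the helper name netif_is_active is invented here; this is not code from the series):

#include <linux/netdevice.h>

/* Sketch: the old netif->active test expressed purely in terms of
 * generic net_device state, as suggested above. */
static inline int netif_is_active(struct net_device *dev)
{
	return netif_running(dev) && netif_carrier_ok(dev);
}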
On Thu, Jul 27, 2006 at 01:49:51PM +0100, Keir Fraser wrote:> > Some comments (inline) on this patch. I checked in the trivial first > two patches.Thanks Keir.> On 7 Jul 2006, at 15:18, Herbert Xu wrote: > > > /* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */ > >- for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) { > >- if ((unsigned long)np->rx_skbs[i] < PAGE_OFFSET) > >- continue; > >+ for (i = 0; i < NET_RX_RING_SIZE; i++) { > >+ if (!np->rx_skbs[i]) > >+ break; > > gnttab_grant_foreign_transfer_ref( > > np->grant_rx_ref[i], np->xbdev->otherend_id, > > __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); > >- RING_GET_REQUEST(&np->rx, requeue_idx)->gref > >- np->grant_rx_ref[i]; > >- RING_GET_REQUEST(&np->rx, requeue_idx)->id = i; > >+ RING_GET_REQUEST(&np->rx, i)->gref = np->grant_rx_ref[i]; > >+ RING_GET_REQUEST(&np->rx, i)->id = i; > >+ } > >+ for (requeue_idx = i++; i < NET_RX_RING_SIZE; i++) { > >+ if (!np->rx_skbs[i]) > >+ continue; > >+ skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i); > >+ ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, > >i); > >+ gnttab_grant_foreign_transfer_ref( > >+ ref, np->xbdev->otherend_id, > >+ __pa(skb->data) >> PAGE_SHIFT); > >+ RING_GET_REQUEST(&np->rx, requeue_idx)->gref = ref; > >+ RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; > > requeue_idx++; > > } > > Why two loops to fill the receive ring? The header of the second loop > with the body of the first loop would seem more correct (use ofThe first loop deals with the initial segment of np where rx_skbs isn''t NULL. The reason it''s dealt with separately from the rest is because unlike the second loop we don''t relocate the entry in np->rx_skbs and np->grant_rx_ref. Had we used the body of the second loop for the initial segment we would end up wiping out the values of rx_skbs and grant_rx_ref when we do the assignments skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i); ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i); because for the initial segment requeue_idx is always the same as i.> destructive xennet_get_* functions to read skb/ref info looks > incorrect).We''re relocating entries from the back of rx_skbs/grant_rx_ref to the front. After relocating it we need to clear the original position as otherwise we may use the entry incorrectly if the backend gives us a bogus response. By clearing it we ensure that it is caught by the warning "Bad rx response id %d.\n". This is also needed for the sanity checks on rx_skbs before we store a new skb into it. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
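The sanity check Herbert refers to is the frontend's guard against a bogus response id; clearing the vacated slots is what lets it fire. Roughly (a paraphrase assuming the post-patch NULL sentinel, not a hunk from the series):

	skb = np->rx_skbs[id];
	if (unlikely(!skb)) {		/* slot was never (re)filled */
		if (net_ratelimit())
			WPRINTK("Bad rx response id %d.\n", id);
		/* drop/recover rather than use a stale entry */
	}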
On 7 Jul 2006, at 15:19, Herbert Xu wrote:

> [NET] back: Added tx queue
>
> This patch adds a tx queue to the backend if the frontend supports rx
> refill notification. A queue is needed because SG/TSO greatly reduces
> the number of packets that can be stored in the rx ring. Given an rx
> ring with 256 entries, a maximum TSO packet can occupy as many as 18
> entries, meaning that the entire ring can only hold 14 packets. This
> is too small at high bandwidths with large TCP RX windows.
>
> Having a tx queue does not present a new security risk as the queue is
> a fixed size buffer just like the rx ring. So each guest can only hold
> a fixed amount of memory (proportional to the tx queue length) on the
> host.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

What's the default queue size? It shouldn't be too large if each packet
can be 64kB!

Also, what does NETIF_F_LLTX have to do with whether we have a queue or
not -- couldn't we set that all the time? And you set queue length to 1
in the other case -- what's wrong with zero? Seems a saner value when
there is no queue: or do we need to commit to having *some* queue at
register_netdevice() time, and there's no way to go back from that by
the time we find out if netfront supports rx refill notifications (so
at that point we cannot change queue len to zero)?

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Jul 27, 2006 at 02:15:29PM +0100, Keir Fraser wrote:
>
> What's the default queue size? It shouldn't be too large if each packet
> can be 64kB!

The default is 1000. If 64MB is too much for each domain, then how
about 100?

> Also, what does NETIF_F_LLTX have to do with whether we have a queue or
> not -- couldn't we set that all the time? And you set queue length to 1

Sorry, I should've added a comment about that. NETIF_F_LLTX is just an
otherwise-unused bit borrowed from the feature set to indicate that
queueing is supported. I suppose I could add a new flag too.

> in the other case -- what's wrong with zero? Seems a saner value when
> there is no queue: or do we need to commit to having *some* queue at
> register_netdevice() time, and there's no way to go back from that by
> the time we find out if netfront supports rx refill notifications (so
> at that point we cannot change queue len to zero)?

Exactly. Once we've committed to having a queue, setting the queue
length to zero will cause all packets to be dropped. I even added
a comment about that :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
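A quick check of the numbers being discussed, assuming the worst case of a full 64kB TSO frame for every queued packet (pessimistic for ordinary traffic):

	1000 packets x 64kB ~= 64MB of host memory held per guest
	 100 packets x 64kB ~=  6.4MB per guest

which is why the default of 1000 looks uncomfortably large and 100 is the figure settled on later in the thread.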
On 27 Jul 2006, at 14:12, Herbert Xu wrote:

> Had we used the body of the second loop for the initial segment we
> would end up wiping out the values of rx_skbs and grant_rx_ref when
> we do the assignments
>
>	skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i);
>	ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i);
>
> because for the initial segment requeue_idx is always the same as i.

But wouldn't the xennet_get_* functions wipe the entries, only to be
immediately filled in again by the assignment to e.g.
np->rx_skbs[requeue_idx], so it would all work out okay (except a
needless wipe-then-rewrite)?

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Jul 27, 2006 at 02:28:48PM +0100, Keir Fraser wrote:
>
> > Had we used the body of the second loop for the initial segment we
> > would end up wiping out the values of rx_skbs and grant_rx_ref when
> > we do the assignments
> >
> >	skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i);
> >	ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i);
> >
> > because for the initial segment requeue_idx is always the same as i.
>
> But wouldn't the xennet_get_* functions wipe the entries, only to be
> immediately filled in again by the assignment to e.g.
> np->rx_skbs[requeue_idx], so it would all work out okay (except a
> needless wipe-then-rewrite)?

That was the reason for having two loops instead of one. In the first
loop we're not moving entries, so we don't assign rx_skbs/grant_rx_ref
at all. In the second loop we know that requeue_idx != i, so we're
always moving entries forward, in which case clearing the original
position is necessary.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 27 Jul 2006, at 14:23, Herbert Xu wrote:

>> Also, what does NETIF_F_LLTX have to do with whether we have a queue
>> or not -- couldn't we set that all the time? And you set queue length
>> to 1
>
> Sorry, I should've added a comment about that. NETIF_F_LLTX is just an
> otherwise-unused bit borrowed from the feature set to indicate that
> queueing is supported. I suppose I could add a new flag too.

That's pretty gross. NETIF_F_LLTX is used, isn't it, to indicate whether
an interface does its own locking in start_xmit()? Why not have a
private feature flag in an appropriate netback-private structure?

>> in the other case -- what's wrong with zero? Seems a saner value when
>> there is no queue: or do we need to commit to having *some* queue at
>> register_netdevice() time, and there's no way to go back from that by
>> the time we find out if netfront supports rx refill notifications (so
>> at that point we cannot change queue len to zero)?
>
> Exactly. Once we've committed to having a queue, setting the queue
> length to zero will cause all packets to be dropped. I even added
> a comment about that :)

Ok, sounds reasonable then. But I would change the queue length to 100
in your next spin of this patch. 1000 is scary big.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
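One way to read Keir's suggestion here is a per-interface boolean in netback's own netif_t instead of a borrowed generic feature bit. A rough sketch (the member name and the exact test site are hypothetical -- the respin is not shown in this thread):

	/* Hypothetical netback-private capability flag instead of
	 * overloading NETIF_F_LLTX: set when the frontend advertises
	 * rx refill notification via xenbus. */
	netif->can_queue = 1;

	/* ... and in the transmit path: */
	if (netif->can_queue && netbk_queue_full(netif))
		netif_stop_queue(dev);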
On 27 Jul 2006, at 14:30, Herbert Xu wrote:

>> But wouldn't the xennet_get_* functions wipe the entries, only to be
>> immediately filled in again by the assignment to e.g.
>> np->rx_skbs[requeue_idx], so it would all work out okay (except a
>> needless wipe-then-rewrite)?
>
> That was the reason for having two loops instead of one. In the first
> loop we're not moving entries, so we don't assign rx_skbs/grant_rx_ref
> at all. In the second loop we know that requeue_idx != i, so we're
> always moving entries forward, in which case clearing the original
> position is necessary.

It's not a time-critical piece of code. I'd much rather have a single
loop that wastes some small amount of time. That routine is confusing
enough as it is. :-)

Also, what about putting the array-initialising loop back: reasonable?

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Thu, Jul 27, 2006 at 02:34:41PM +0100, Keir Fraser wrote:
>
> It's not a time-critical piece of code. I'd much rather have a single
> loop that wastes some small amount of time. That routine is confusing
> enough as it is. :-)

Actually I split it into two loops for clarity :) I suppose now that
I've merged it back into one it doesn't look too bad.

> Also, what about putting the array-initialising loop back: reasonable?

Consider it done.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
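For readers following along, the merged restore-path loop could look roughly like this (a sketch reconstructed from the hunks quoted earlier in the thread, not Herbert's actual respin): the destructive xennet_get_* helpers clear the old slot, and when requeue_idx == i the entry is simply rewritten in place.

	for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) {
		if (!np->rx_skbs[i])
			continue;

		skb = np->rx_skbs[requeue_idx] = xennet_get_rx_skb(np, i);
		ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i);

		gnttab_grant_foreign_transfer_ref(
			ref, np->xbdev->otherend_id,
			__pa(skb->data) >> PAGE_SHIFT);

		RING_GET_REQUEST(&np->rx, requeue_idx)->gref = ref;
		RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx;
		requeue_idx++;
	}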
Herbert Xu
2006-Jul-28 08:06 UTC
[Xen-devel] Re: [5/11] [NET] back: Replace netif->active with netif_carrier_ok
On Thu, Jul 27, 2006 at 02:11:01PM +0100, Keir Fraser wrote:
>
> netif->active is also conditional on whether or not the interface is
> open ('running'). So it's not as good a fit for netif_carrier as the
> status field you similarly replaced in netfront.
>
> Perhaps you could replace the netback status field
> (CONNECTED/DISCONNECTED) instead? I think the extra value DISCONNECTING
> is not really necessary any more, so it should map onto the boolean
> netif_carrier okay.
>
> Then ->active could also potentially be removed. It would correspond to
> netif_running && netif_carrier_ok.

Good point. I've made the carrier correspond to CONNECTED/DISCONNECTED
and added the netif_running tests.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel