Hi Keir:

This is a respin of the dom0=>domU TSO patch series. Original description:

This series of patches adds SG and TSO support for the dom0=>domU direction. This completes the domU=>domU loop, which means that we have TSO all the way through.

It's a bit late here so I haven't done a full set of benchmarks, but these two figures should be indicative for domU=>domU:

TSO off:    323.91Mb/s
TSO on:    1678.52Mb/s
lo(16436): 5533.93Mb/s

Of course you'll need to turn the two #if's back on for domU=>dom0 to do this.

I've also found the reason I was getting smaller MSSes before: the default write buffer size in Linux is too small for these speeds, so it needs to be raised with something like:

echo 526808 > /proc/sys/net/core/wmem_max
netperf -t TCP_STREAM -H x.x.x.x -- -s 263404

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
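As an aside for anyone reproducing the benchmark: the same send-buffer tuning can be requested from a program instead of via netperf's -s flag. The snippet below is a minimal, illustrative userspace sketch (not part of the patch series); it assumes the usual Linux behaviour that SO_SNDBUF requests are clamped to net.core.wmem_max and roughly doubled by the kernel, which is why wmem_max has to be raised first.

/* Illustrative only: request a large send buffer and see what the
 * kernel actually grants. Error handling is elided for brevity. */
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int req = 263404, got = 0;
	socklen_t len = sizeof(got);

	setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &req, sizeof(req));
	getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &got, &len);

	/* With the default wmem_max this prints a clamped value; after
	 * "echo 526808 > /proc/sys/net/core/wmem_max" it reports ~2*req. */
	printf("requested %d, kernel granted %d\n", req, got);
	return 0;
}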
Hi: [NET] front: Stop using rx->id With the current protocol for transferring packets from dom0 to domU, the rx->id field is useless because it can be derived from the rx request ring ID. In particular, rx->id = (ring_id & NET_RX_RING_SIZE - 1) + 1; This formula works because the rx response to each request always occupies the same slot that the request arrived in. This in turn is a consequence of the fact that each packet only occupies one slot. The other important reason that this works for dom0=>domU but not domU=>dom0 is that the resource associated with the rx->id is freed immediately while in the domU=>dom0 case the resource is held until the skb is liberated by dom0. Using this formula we can essentially remove rx->id from the protocol, freeing up space that could be instead be used by things like TSO. The only constraint is that the backend must obey the rule that each id must only be used in the response that occupies the same slot as the request. The actual field of rx->id is still maintained for compatibility with older backends. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 5848356af8da -r 65ff65fb76ec linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Thu Jul 27 14:06:15 2006 +0100 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 17:51:16 2006 +1000 @@ -99,17 +99,17 @@ struct netfront_info { struct timer_list rx_refill_timer; /* - * {tx,rx}_skbs store outstanding skbuffs. The first entry in each - * array is an index into a chain of free entries. + * {tx,rx}_skbs store outstanding skbuffs. The first entry in tx_skbs + * is an index into a chain of free entries. */ struct sk_buff *tx_skbs[NET_TX_RING_SIZE+1]; - struct sk_buff *rx_skbs[NET_RX_RING_SIZE+1]; + struct sk_buff *rx_skbs[NET_RX_RING_SIZE]; #define TX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256) grant_ref_t gref_tx_head; grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1]; grant_ref_t gref_rx_head; - grant_ref_t grant_rx_ref[NET_TX_RING_SIZE + 1]; + grant_ref_t grant_rx_ref[NET_TX_RING_SIZE]; struct xenbus_device *xbdev; int tx_ring_ref; @@ -122,7 +122,7 @@ struct netfront_info { }; /* - * Access macros for acquiring freeing slots in {tx,rx}_skbs[]. + * Access macros for acquiring freeing slots in tx_skbs[]. */ static inline void add_id_to_freelist(struct sk_buff **list, unsigned short id) @@ -136,6 +136,29 @@ static inline unsigned short get_id_from unsigned int id = (unsigned int)(unsigned long)list[0]; list[0] = list[id]; return id; +} + +static inline int xennet_rxidx(RING_IDX idx) +{ + return idx & (NET_RX_RING_SIZE - 1); +} + +static inline struct sk_buff *xennet_get_rx_skb(struct netfront_info *np, + RING_IDX ri) +{ + int i = xennet_rxidx(ri); + struct sk_buff *skb = np->rx_skbs[i]; + np->rx_skbs[i] = NULL; + return skb; +} + +static inline grant_ref_t xennet_get_rx_ref(struct netfront_info *np, + RING_IDX ri) +{ + int i = xennet_rxidx(ri); + grant_ref_t ref = np->grant_rx_ref[i]; + np->grant_rx_ref[i] = GRANT_INVALID_REF; + return ref; } #define DPRINTK(fmt, args...) 
\ @@ -598,8 +621,9 @@ static void network_alloc_rx_buffers(str skb->dev = dev; - id = get_id_from_freelist(np->rx_skbs); - + id = xennet_rxidx(req_prod + i); + + BUG_ON(np->rx_skbs[id]); np->rx_skbs[id] = skb; RING_GET_REQUEST(&np->rx, req_prod + i)->id = id; @@ -840,6 +864,19 @@ static irqreturn_t netif_int(int irq, vo return IRQ_HANDLED; } +static void xennet_move_rx_slot(struct netfront_info *np, struct sk_buff *skb, + grant_ref_t ref) +{ + int new = xennet_rxidx(np->rx.req_prod_pvt); + + BUG_ON(np->rx_skbs[new]); + np->rx_skbs[new] = skb; + np->grant_rx_ref[new] = ref; + RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new; + RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref; + np->rx.req_prod_pvt++; + RING_PUSH_REQUESTS(&np->rx); +} static int netif_poll(struct net_device *dev, int *pbudget) { @@ -874,12 +911,15 @@ static int netif_poll(struct net_device i++, work_done++) { rx = RING_GET_RESPONSE(&np->rx, i); + skb = xennet_get_rx_skb(np, i); + ref = xennet_get_rx_ref(np, i); + /* * This definitely indicates a bug, either in this driver or in * the backend driver. In future this should flag the bad * situation to the system controller to reboot the backed. */ - if ((ref = np->grant_rx_ref[rx->id]) == GRANT_INVALID_REF) { + if (ref == GRANT_INVALID_REF) { WPRINTK("Bad rx response id %d.\n", rx->id); work_done--; continue; @@ -890,21 +930,12 @@ static int netif_poll(struct net_device if (net_ratelimit()) WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n", rx->id, rx->status); - RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id - rx->id; - RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref - ref; - np->rx.req_prod_pvt++; - RING_PUSH_REQUESTS(&np->rx); + xennet_move_rx_slot(np, skb, ref); work_done--; continue; } gnttab_release_grant_reference(&np->gref_rx_head, ref); - np->grant_rx_ref[rx->id] = GRANT_INVALID_REF; - - skb = np->rx_skbs[rx->id]; - add_id_to_freelist(np->rx_skbs, rx->id); /* NB. We handle skb overflow later. */ skb->data = skb->head + rx->offset; @@ -1158,15 +1189,23 @@ static void network_connect(struct net_d } /* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */ - for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) { - if ((unsigned long)np->rx_skbs[i] < PAGE_OFFSET) + for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) { + if (!np->rx_skbs[i]) continue; + gnttab_grant_foreign_transfer_ref( np->grant_rx_ref[i], np->xbdev->otherend_id, __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); RING_GET_REQUEST(&np->rx, requeue_idx)->gref np->grant_rx_ref[i]; - RING_GET_REQUEST(&np->rx, requeue_idx)->id = i; + RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; + + if (requeue_idx < i) { + np->rx_skbs[requeue_idx] = np->rx_skbs[i]; + np->grant_rx_ref[requeue_idx] = np->grant_rx_ref[i]; + np->rx_skbs[i] = NULL; + np->grant_rx_ref[i] = GRANT_INVALID_REF; + } requeue_idx++; } @@ -1392,8 +1431,8 @@ static struct net_device * __devinit cre np->grant_tx_ref[i] = GRANT_INVALID_REF; } - for (i = 0; i <= NET_RX_RING_SIZE; i++) { - np->rx_skbs[i] = (void *)((unsigned long) i+1); + for (i = 0; i < NET_RX_RING_SIZE; i++) { + np->rx_skbs[i] = NULL; np->grant_rx_ref[i] = GRANT_INVALID_REF; } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
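To make the index arithmetic in patch 1/9 concrete, here is a small self-contained C model (not the driver itself) of the invariant it relies on: because every packet occupies exactly one rx slot and the backend answers each request in the slot it arrived in, the id can be recomputed from the response's ring position with the same mask used by the new xennet_rxidx() helper. NET_RX_RING_SIZE and xennet_rxidx() come from the patch; the main() harness is purely illustrative.

/* Standalone model of the ring-index math; not kernel code. */
#include <stdio.h>

#define NET_RX_RING_SIZE 256   /* must be a power of two */

/* Same mask as the patch: slot index for a given ring position. */
static int xennet_rxidx(unsigned int ring_idx)
{
	return ring_idx & (NET_RX_RING_SIZE - 1);
}

int main(void)
{
	unsigned int pos;

	/* A request produced at ring position N is answered in the same
	 * slot, so the frontend can reconstruct the id from N alone and
	 * the rx->id field becomes redundant. */
	for (pos = 254; pos < 259; pos++)
		printf("ring position %u -> skb/grant slot %d\n",
		       pos, xennet_rxidx(pos));
	return 0;
}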
Herbert Xu
2006-Jul-28 08:43 UTC
[Xen-devel] [2/9] [NET] front: Move rx req pushing to one spot
Hi: [NET] front: Move rx req pushing to one spot This patch moves all rx request pushing to network_alloc_rx_buffers. This is needed to reduce churn for TSO. More importantly, this makes it easier to send notifications when adding rx requests which is required for having a queue in dom0. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 65ff65fb76ec -r 7bf12da8a6ed linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 17:51:16 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 17:51:32 2006 +1000 @@ -600,14 +600,17 @@ static void network_alloc_rx_buffers(str /* Could not allocate any skbuffs. Try again later. */ mod_timer(&np->rx_refill_timer, jiffies + (HZ/10)); - return; + break; } __skb_queue_tail(&np->rx_batch, skb); } /* Is the batch large enough to be worthwhile? */ - if (i < (np->rx_target/2)) + if (i < (np->rx_target/2)) { + if (req_prod > np->rx.sring->req_prod) + goto push; return; + } /* Adjust our fill target if we risked running out of buffers. */ if (((req_prod - np->rx.sring->rsp_prod) < (np->rx_target / 4)) && @@ -678,6 +681,7 @@ static void network_alloc_rx_buffers(str /* Above is a suitable barrier to ensure backend will see requests. */ np->rx.req_prod_pvt = req_prod + i; +push: RING_PUSH_REQUESTS(&np->rx); } @@ -875,7 +879,6 @@ static void xennet_move_rx_slot(struct n RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new; RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref; np->rx.req_prod_pvt++; - RING_PUSH_REQUESTS(&np->rx); } static int netif_poll(struct net_device *dev, int *pbudget) @@ -1210,7 +1213,6 @@ static void network_connect(struct net_d } np->rx.req_prod_pvt = requeue_idx; - RING_PUSH_REQUESTS(&np->rx); /* * Step 3: All public and private state should now be sane. Get _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
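The pattern patch 2/9 consolidates can be summarised as "stage requests against a private producer, publish them once". The following standalone C sketch models that flow; the field names loosely follow the Xen ring macros (req_prod_pvt, the shared-ring req_prod), but the struct and functions are invented for illustration and are not the shared-ring implementation.

/* Userspace model of the "stage privately, push once" pattern. */
#include <stdio.h>

struct ring_front {
	unsigned int req_prod_pvt;   /* requests written, not yet visible */
	unsigned int sring_req_prod; /* producer index the backend sees   */
};

static void stage_request(struct ring_front *r, int id)
{
	/* In the driver this fills RING_GET_REQUEST(...); here we only
	 * advance the private producer. */
	printf("staged rx request id=%d at pvt=%u\n", id, r->req_prod_pvt);
	r->req_prod_pvt++;
}

static void push_requests(struct ring_front *r)
{
	/* Single publication point: after this, a backend notification
	 * (feature-rx-notify, patch 4/9) can be hooked in at one place. */
	if (r->req_prod_pvt != r->sring_req_prod) {
		r->sring_req_prod = r->req_prod_pvt;
		printf("pushed: backend now sees req_prod=%u\n",
		       r->sring_req_prod);
	}
}

int main(void)
{
	struct ring_front r = { 0, 0 };
	int id;

	for (id = 0; id < 3; id++)
		stage_request(&r, id);
	push_requests(&r);   /* one push for the whole batch */
	return 0;
}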
Herbert Xu
2006-Jul-28 08:44 UTC
[Xen-devel] [3/9] [NET] back: Replace netif->status with netif_carrier_ok
Hi: [NET] back: Replace netif->status with netif_carrier_ok The connection status to the frontend can be represented using netif_carrier_ok instead of netif->status. As a result, we delay the construction of the dev qdisc until the carrier comes on. This is a prerequisite for adding a tx queue. By the same token, netif->active is now simply the conjunction of netif_running and netif_carrier_ok so it too can be removed. Because netif_carrier_off/netif_carrier_on and rtnl_lock all entail memory barriers, there is no need to have extra memory barriers around them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 7bf12da8a6ed -r c42c1f4bd148 linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 17:51:32 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 18:11:09 2006 +1000 @@ -86,8 +86,6 @@ typedef struct netif_st { struct timer_list credit_timeout; /* Miscellaneous private stuff. */ - enum { DISCONNECTED, DISCONNECTING, CONNECTED } status; - int active; struct list_head list; /* scheduling list */ atomic_t refcnt; struct net_device *dev; diff -r 7bf12da8a6ed -r c42c1f4bd148 linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 17:51:32 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:11:09 2006 +1000 @@ -36,28 +36,20 @@ static void __netif_up(netif_t *netif) { - struct net_device *dev = netif->dev; - netif_tx_lock_bh(dev); - netif->active = 1; - netif_tx_unlock_bh(dev); enable_irq(netif->irq); netif_schedule_work(netif); } static void __netif_down(netif_t *netif) { - struct net_device *dev = netif->dev; disable_irq(netif->irq); - netif_tx_lock_bh(dev); - netif->active = 0; - netif_tx_unlock_bh(dev); netif_deschedule_work(netif); } static int net_open(struct net_device *dev) { netif_t *netif = netdev_priv(dev); - if (netif->status == CONNECTED) + if (netif_carrier_ok(dev)) __netif_up(netif); netif_start_queue(dev); return 0; @@ -67,7 +59,7 @@ static int net_close(struct net_device * { netif_t *netif = netdev_priv(dev); netif_stop_queue(dev); - if (netif->status == CONNECTED) + if (netif_carrier_ok(dev)) __netif_down(netif); return 0; } @@ -93,11 +85,12 @@ netif_t *netif_alloc(domid_t domid, unsi return ERR_PTR(-ENOMEM); } + netif_carrier_off(dev); + netif = netdev_priv(dev); memset(netif, 0, sizeof(*netif)); netif->domid = domid; netif->handle = handle; - netif->status = DISCONNECTED; atomic_set(&netif->refcnt, 1); init_waitqueue_head(&netif->waiting_to_free); netif->dev = dev; @@ -256,11 +249,9 @@ int netif_map(netif_t *netif, unsigned l netif->rx_req_cons_peek = 0; netif_get(netif); - wmb(); /* Other CPUs see new state before interface is started. 
*/ rtnl_lock(); - netif->status = CONNECTED; - wmb(); + netif_carrier_on(netif->dev); if (netif_running(netif->dev)) __netif_up(netif); rtnl_unlock(); @@ -296,20 +287,13 @@ static void netif_free(netif_t *netif) void netif_disconnect(netif_t *netif) { - switch (netif->status) { - case CONNECTED: + if (netif_carrier_ok(netif->dev)) { rtnl_lock(); - netif->status = DISCONNECTING; - wmb(); + netif_carrier_off(netif->dev); if (netif_running(netif->dev)) __netif_down(netif); rtnl_unlock(); netif_put(netif); - /* fall through */ - case DISCONNECTED: - netif_free(netif); - break; - default: - BUG(); - } -} + } + netif_free(netif); +} diff -r 7bf12da8a6ed -r c42c1f4bd148 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 17:51:32 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:11:09 2006 +1000 @@ -143,7 +143,7 @@ int netif_be_start_xmit(struct sk_buff * BUG_ON(skb->dev != dev); /* Drop the packet if the target domain has no receive buffers. */ - if (!netif->active || + if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)) || (netif->rx_req_cons_peek == netif->rx.sring->req_prod) || ((netif->rx_req_cons_peek - netif->rx.rsp_prod_pvt) = NET_RX_RING_SIZE)) @@ -404,7 +404,9 @@ static void add_to_net_schedule_list_tai return; spin_lock_irq(&net_schedule_list_lock); - if (!__on_net_schedule_list(netif) && netif->active) { + if (!__on_net_schedule_list(netif) && + likely(netif_running(netif->dev) && + netif_carrier_ok(netif->dev))) { list_add_tail(&netif->list, &net_schedule_list); netif_get(netif); } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
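The state simplification in patch 3/9 boils down to this: the three-valued netif->status plus the separate active flag collapse into the single carrier bit, with "active" becoming netif_running() && netif_carrier_ok(). The toy model below (invented types, not kernel code) just walks through that equivalence.

/* Toy model of the status/active -> carrier collapse. Illustrative only. */
#include <stdbool.h>
#include <stdio.h>

struct model_netif {
	bool running;  /* netif_running(): interface administratively up */
	bool carrier;  /* netif_carrier_ok(): frontend connected         */
};

static bool model_active(const struct model_netif *n)
{
	return n->running && n->carrier;
}

int main(void)
{
	struct model_netif n = { .running = false, .carrier = false };

	n.running = true;              /* net_open()               */
	printf("active=%d\n", model_active(&n));
	n.carrier = true;              /* netif_map(): carrier on  */
	printf("active=%d\n", model_active(&n));
	n.carrier = false;             /* netif_disconnect()       */
	printf("active=%d\n", model_active(&n));
	return 0;
}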
Hi: [NET] front: Added feature-rx-notify This patch adds support to the frontend for notifying the backend whenever the rx ring is refilled. This is required in order for the backend to get a tx queue. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r c42c1f4bd148 -r b41da4470aa6 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:11:09 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:11:21 2006 +1000 @@ -326,6 +326,12 @@ again: goto abort_transaction; } + err = xenbus_printf(xbt, dev->nodename, "feature-rx-notify", "%d", 1); + if (err) { + message = "writing feature-rx-notify"; + goto abort_transaction; + } + err = xenbus_transaction_end(xbt, 0); if (err) { if (err == -EAGAIN) @@ -683,6 +689,7 @@ static void network_alloc_rx_buffers(str np->rx.req_prod_pvt = req_prod + i; push: RING_PUSH_REQUESTS(&np->rx); + notify_remote_via_irq(np->irq); } static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, @@ -1221,7 +1228,6 @@ static void network_connect(struct net_d * packets. */ netif_carrier_on(dev); - notify_remote_via_irq(np->irq); network_tx_buf_gc(dev); network_alloc_rx_buffers(dev); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
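Patch 4/9 follows the usual xenbus feature-negotiation pattern: the frontend advertises a key, and the backend reads it with a default of 0 so older frontends keep working. The sketch below models that handshake with an invented in-memory table and helper (xs_entry, xs_read_int); the real code uses xenbus_printf() and xenbus_scanf() as shown in the diffs.

/* Model of the feature-rx-notify negotiation; not the xenbus API. */
#include <stdio.h>
#include <string.h>

struct xs_entry { const char *key; int val; };

/* What a new frontend would have written into its xenstore directory. */
static const struct xs_entry frontend_dir[] = {
	{ "feature-rx-notify", 1 },
};

static int xs_read_int(const struct xs_entry *dir, int n,
		       const char *key, int dflt)
{
	int i;
	for (i = 0; i < n; i++)
		if (!strcmp(dir[i].key, key))
			return dir[i].val;
	return dflt;   /* missing key => feature not supported */
}

int main(void)
{
	int notify = xs_read_int(frontend_dir, 1, "feature-rx-notify", 0);

	/* Backend policy from patch 5/9: only queue packets if the
	 * frontend will poke us when it refills the rx ring. */
	printf("frontend notifies on refill: %d -> backend %s\n",
	       notify, notify ? "enables its tx queue"
			      : "keeps tx_queue_len minimal");
	return 0;
}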
Hi: [NET] back: Added tx queue This patch adds a tx queue to the backend if the frontend supports rx refill notification. A queue is needed because SG/TSO greatly reduces the number of packets that can be stored in the rx ring. Given an rx ring with 256 entries, a maximum TSO packet can occupy as many as 18 entries, meaning that the entire ring can only hold 14 packets. This is too small at high bandwidths with large TCP RX windows. Having a tx queue does not present a new security risk as the queue is a fixed size buffer just like the rx ring. So each guest can only hold a fixed amount of memory (proportional to the tx queue length) on the host. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r b41da4470aa6 -r 29673e3aab05 linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 18:11:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 18:18:51 2006 +1000 @@ -76,6 +76,10 @@ typedef struct netif_st { struct vm_struct *tx_comms_area; struct vm_struct *rx_comms_area; + /* Set of features that can be turned on in dev->features. */ + int features; + int can_queue; + /* Allow netif_be_start_xmit() to peek ahead in the rx request ring. */ RING_IDX rx_req_cons_peek; @@ -119,4 +123,10 @@ struct net_device_stats *netif_be_get_st struct net_device_stats *netif_be_get_stats(struct net_device *dev); irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs); +static inline int netbk_can_queue(struct net_device *dev) +{ + netif_t *netif = netdev_priv(dev); + return netif->can_queue; +} + #endif /* __NETIF__BACKEND__COMMON_H__ */ diff -r b41da4470aa6 -r 29673e3aab05 linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:11:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:18:51 2006 +1000 @@ -51,14 +51,12 @@ static int net_open(struct net_device *d netif_t *netif = netdev_priv(dev); if (netif_carrier_ok(dev)) __netif_up(netif); - netif_start_queue(dev); return 0; } static int net_close(struct net_device *dev) { netif_t *netif = netdev_priv(dev); - netif_stop_queue(dev); if (netif_carrier_ok(dev)) __netif_down(netif); return 0; @@ -107,8 +105,11 @@ netif_t *netif_alloc(domid_t domid, unsi SET_ETHTOOL_OPS(dev, &network_ethtool_ops); - /* Disable queuing. */ - dev->tx_queue_len = 0; + /* + * Reduce default TX queuelen so that each guest interface only + * allows it to eat around 6.4MB of host memory. 
+ */ + dev->tx_queue_len = 100; for (i = 0; i < ETH_ALEN; i++) if (be_mac[i] != 0) diff -r b41da4470aa6 -r 29673e3aab05 linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:11:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:18:51 2006 +1000 @@ -136,6 +136,14 @@ static inline int is_xen_skb(struct sk_b return (cp == skbuff_cachep); } +static inline int netbk_queue_full(netif_t *netif) +{ + RING_IDX peek = netif->rx_req_cons_peek; + + return netif->rx.sring->req_prod - peek <= 0 || + netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <= 0; +} + int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) { netif_t *netif = netdev_priv(dev); @@ -143,11 +151,13 @@ int netif_be_start_xmit(struct sk_buff * BUG_ON(skb->dev != dev); /* Drop the packet if the target domain has no receive buffers. */ - if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)) || - (netif->rx_req_cons_peek == netif->rx.sring->req_prod) || - ((netif->rx_req_cons_peek - netif->rx.rsp_prod_pvt) =- NET_RX_RING_SIZE)) + if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev))) goto drop; + + if (unlikely(netbk_queue_full(netif))) { + BUG_ON(netbk_can_queue(dev)); + goto drop; + } /* * We do not copy the packet unless: @@ -178,6 +188,9 @@ int netif_be_start_xmit(struct sk_buff * netif->rx_req_cons_peek++; netif_get(netif); + if (netbk_can_queue(dev) && netbk_queue_full(netif)) + netif_stop_queue(dev); + skb_queue_tail(&rx_queue, skb); tasklet_schedule(&net_rx_tasklet); @@ -351,6 +364,10 @@ static void net_rx_action(unsigned long notify_list[notify_nr++] = irq; } + if (netif_queue_stopped(netif->dev) && + !netbk_queue_full(netif)) + netif_wake_queue(netif->dev); + netif_put(netif); dev_kfree_skb(skb); gop++; @@ -974,8 +991,13 @@ irqreturn_t netif_be_int(int irq, void * irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs) { netif_t *netif = dev_id; + add_to_net_schedule_list_tail(netif); maybe_schedule_tx_action(); + + if (netif_queue_stopped(netif->dev) && !netbk_queue_full(netif)) + netif_wake_queue(netif->dev); + return IRQ_HANDLED; } diff -r b41da4470aa6 -r 29673e3aab05 linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:11:21 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:18:51 2006 +1000 @@ -353,6 +353,7 @@ static int connect_rings(struct backend_ unsigned long tx_ring_ref, rx_ring_ref; unsigned int evtchn; int err; + int val; DPRINTK(""); @@ -366,6 +367,15 @@ static int connect_rings(struct backend_ dev->otherend); return err; } + + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-rx-notify", "%d", + &val) < 0) + val = 0; + if (val) + be->netif->can_queue = 1; + else + /* Must be non-zero for pfifo_fast to work. */ + be->netif->dev->tx_queue_len = 1; /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
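The sizing argument in patch 5/9 is easy to check numerically: with a 256-entry rx ring and a worst-case TSO packet taking up to 18 slots, only 14 such packets fit, hence the need for a tx queue and a queue-full test. The standalone sketch below reproduces that arithmetic and a simplified queue-full check parameterised by the per-packet slot count; the test in this patch only checks for a single free slot, and patches 7 and 9 widen the margin.

/* Back-of-the-envelope model of the rx-ring capacity argument. */
#include <stdio.h>

#define NET_RX_RING_SIZE 256
#define MAX_TSO_SLOTS    18   /* worst case quoted in the patch mail */

/* Simplified queue-full test: stop if fewer slots remain than one
 * packet might need (the real test also tracks rsp_prod_pvt). */
static int queue_full(unsigned int req_prod, unsigned int req_cons_peek,
		      unsigned int slots_per_pkt)
{
	return req_prod - req_cons_peek < slots_per_pkt;
}

int main(void)
{
	printf("max TSO packets resident in the rx ring: %d\n",
	       NET_RX_RING_SIZE / MAX_TSO_SLOTS);   /* 256/18 = 14 */

	/* With 20 free slots an 18-slot packet still fits; with 10 free
	 * slots the device queue is stopped instead of dropping. */
	printf("20 free: full=%d, 10 free: full=%d\n",
	       queue_full(276, 256, MAX_TSO_SLOTS),
	       queue_full(266, 256, MAX_TSO_SLOTS));
	return 0;
}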
Hi: [NET] front: Add SG support This patch adds scatter-and-gather support to the frontend. It also advertises this fact through xenbus so that the backend can detect this and send through SG requests only if it is supported. SG support is required to support skb''s larger than one page. This in turn is needed for either jumbo MTU or TSO. One of these is required to bring local networking performance up to a level that is acceptable. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 29673e3aab05 -r 05d6393e9d18 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:18:51 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:20:45 2006 +1000 @@ -46,11 +46,11 @@ #include <linux/ethtool.h> #include <linux/in.h> #include <linux/if_ether.h> +#include <linux/io.h> #include <net/sock.h> #include <net/pkt_sched.h> #include <net/arp.h> #include <net/route.h> -#include <asm/io.h> #include <asm/uaccess.h> #include <xen/evtchn.h> #include <xen/xenbus.h> @@ -62,17 +62,12 @@ #include <xen/interface/grant_table.h> #include <xen/gnttab.h> +#define RX_COPY_THRESHOLD 256 + #define GRANT_INVALID_REF 0 #define NET_TX_RING_SIZE __RING_SIZE((struct netif_tx_sring *)0, PAGE_SIZE) #define NET_RX_RING_SIZE __RING_SIZE((struct netif_rx_sring *)0, PAGE_SIZE) - -static inline void init_skb_shinfo(struct sk_buff *skb) -{ - atomic_set(&(skb_shinfo(skb)->dataref), 1); - skb_shinfo(skb)->nr_frags = 0; - skb_shinfo(skb)->frag_list = NULL; -} struct netfront_info { struct list_head list; @@ -329,6 +324,12 @@ again: err = xenbus_printf(xbt, dev->nodename, "feature-rx-notify", "%d", 1); if (err) { message = "writing feature-rx-notify"; + goto abort_transaction; + } + + err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", 1); + if (err) { + message = "writing feature-sg"; goto abort_transaction; } @@ -575,10 +576,13 @@ static void network_alloc_rx_buffers(str unsigned short id; struct netfront_info *np = netdev_priv(dev); struct sk_buff *skb; + struct page *page; int i, batch_target; RING_IDX req_prod = np->rx.req_prod_pvt; struct xen_memory_reservation reservation; grant_ref_t ref; + unsigned long pfn; + void *vaddr; if (unlikely(!netif_carrier_ok(dev))) return; @@ -591,15 +595,16 @@ static void network_alloc_rx_buffers(str */ batch_target = np->rx_target - (req_prod - np->rx.rsp_cons); for (i = skb_queue_len(&np->rx_batch); i < batch_target; i++) { - /* - * Subtract dev_alloc_skb headroom (16 bytes) and shared info - * tailroom then round down to SKB_DATA_ALIGN boundary. - */ - skb = __dev_alloc_skb( - ((PAGE_SIZE - sizeof(struct skb_shared_info)) & - (-SKB_DATA_ALIGN(1))) - 16, - GFP_ATOMIC|__GFP_NOWARN); - if (skb == NULL) { + /* Allocate an skb and a page. */ + skb = __dev_alloc_skb(RX_COPY_THRESHOLD, + GFP_ATOMIC | __GFP_NOWARN); + if (unlikely(!skb)) + goto no_skb; + + page = alloc_page(GFP_ATOMIC | __GFP_NOWARN); + if (!page) { + kfree_skb(skb); +no_skb: /* Any skbuffs queued for refill? Force them out. 
*/ if (i != 0) goto refill; @@ -608,6 +613,9 @@ static void network_alloc_rx_buffers(str jiffies + (HZ/10)); break; } + + skb_shinfo(skb)->frags[0].page = page; + skb_shinfo(skb)->nr_frags = 1; __skb_queue_tail(&np->rx_batch, skb); } @@ -639,18 +647,20 @@ static void network_alloc_rx_buffers(str ref = gnttab_claim_grant_reference(&np->gref_rx_head); BUG_ON((signed short)ref < 0); np->grant_rx_ref[id] = ref; + + pfn = page_to_pfn(skb_shinfo(skb)->frags[0].page); + vaddr = page_address(skb_shinfo(skb)->frags[0].page); + gnttab_grant_foreign_transfer_ref(ref, - np->xbdev->otherend_id, - __pa(skb->head)>>PAGE_SHIFT); + np->xbdev->otherend_id, pfn); RING_GET_REQUEST(&np->rx, req_prod + i)->gref = ref; - np->rx_pfn_array[i] = virt_to_mfn(skb->head); + np->rx_pfn_array[i] = pfn_to_mfn(pfn); if (!xen_feature(XENFEAT_auto_translated_physmap)) { /* Remove this page before passing back to Xen. */ - set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT, - INVALID_P2M_ENTRY); + set_phys_to_machine(pfn, INVALID_P2M_ENTRY); MULTI_update_va_mapping(np->rx_mcl+i, - (unsigned long)skb->head, + (unsigned long)vaddr, __pte(0), 0); } } @@ -888,41 +898,29 @@ static void xennet_move_rx_slot(struct n np->rx.req_prod_pvt++; } -static int netif_poll(struct net_device *dev, int *pbudget) -{ - struct netfront_info *np = netdev_priv(dev); - struct sk_buff *skb, *nskb; - struct netif_rx_response *rx; - RING_IDX i, rp; - struct mmu_update *mmu = np->rx_mmu; - struct multicall_entry *mcl = np->rx_mcl; - int work_done, budget, more_to_do = 1; - struct sk_buff_head rxq; - unsigned long flags; - unsigned long mfn; - grant_ref_t ref; - - spin_lock(&np->rx_lock); - - if (unlikely(!netif_carrier_ok(dev))) { - spin_unlock(&np->rx_lock); - return 0; - } - - skb_queue_head_init(&rxq); - - if ((budget = *pbudget) > dev->quota) - budget = dev->quota; - rp = np->rx.sring->rsp_prod; - rmb(); /* Ensure we see queued responses up to ''rp''. */ - - for (i = np->rx.rsp_cons, work_done = 0; - (i != rp) && (work_done < budget); - i++, work_done++) { - rx = RING_GET_RESPONSE(&np->rx, i); - - skb = xennet_get_rx_skb(np, i); - ref = xennet_get_rx_ref(np, i); +static int xennet_get_responses(struct netfront_info *np, + struct netif_rx_response *rx, RING_IDX rp, + struct sk_buff_head *list, int count) +{ + struct mmu_update *mmu = np->rx_mmu + count; + struct multicall_entry *mcl = np->rx_mcl + count; + RING_IDX cons = np->rx.rsp_cons; + struct sk_buff *skb = xennet_get_rx_skb(np, cons); + grant_ref_t ref = xennet_get_rx_ref(np, cons); + int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD); + int frags = 1; + int err = 0; + + for (;;) { + unsigned long mfn; + + if (unlikely(rx->status < 0 || + rx->offset + rx->status > PAGE_SIZE)) { + if (net_ratelimit()) + WPRINTK("rx->offset: %x, size: %u\n", + rx->offset, rx->status); + err = -EINVAL; + } /* * This definitely indicates a bug, either in this driver or in @@ -931,8 +929,8 @@ static int netif_poll(struct net_device */ if (ref == GRANT_INVALID_REF) { WPRINTK("Bad rx response id %d.\n", rx->id); - work_done--; - continue; + err = -EINVAL; + goto next; } /* Memory pressure, insufficient buffer headroom, ... */ @@ -941,16 +939,161 @@ static int netif_poll(struct net_device WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n", rx->id, rx->status); xennet_move_rx_slot(np, skb, ref); + err = -ENOMEM; + goto next; + } + + gnttab_release_grant_reference(&np->gref_rx_head, ref); + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + /* Remap the page. 
*/ + struct page *page = skb_shinfo(skb)->frags[0].page; + unsigned long pfn = page_to_pfn(page); + void *vaddr = page_address(page); + + MULTI_update_va_mapping(mcl, (unsigned long)vaddr, + pfn_pte_ma(mfn, PAGE_KERNEL), + 0); + mcl++; + mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT) + | MMU_MACHPHYS_UPDATE; + mmu->val = pfn; + mmu++; + + set_phys_to_machine(pfn, mfn); + } + + __skb_queue_tail(list, skb); + +next: + if (!(rx->flags & NETRXF_more_data)) + break; + + if (cons + frags == rp) { + if (net_ratelimit()) + WPRINTK("Need more frags\n"); + err = -ENOENT; + break; + } + + rx = RING_GET_RESPONSE(&np->rx, cons + frags); + skb = xennet_get_rx_skb(np, cons + frags); + ref = xennet_get_rx_ref(np, cons + frags); + frags++; + } + + if (unlikely(frags > max)) { + if (net_ratelimit()) + WPRINTK("Too many frags\n"); + err = -E2BIG; + } + + return err; +} + +static RING_IDX xennet_fill_frags(struct netfront_info *np, + struct sk_buff *skb, + struct sk_buff_head *list) +{ + struct skb_shared_info *shinfo = skb_shinfo(skb); + int nr_frags = shinfo->nr_frags; + RING_IDX cons = np->rx.rsp_cons; + skb_frag_t *frag = shinfo->frags + nr_frags; + struct sk_buff *nskb; + + while ((nskb = __skb_dequeue(list))) { + struct netif_rx_response *rx + RING_GET_RESPONSE(&np->rx, ++cons); + + frag->page = skb_shinfo(nskb)->frags[0].page; + frag->page_offset = rx->offset; + frag->size = rx->status; + + skb->data_len += rx->status; + + skb_shinfo(nskb)->nr_frags = 0; + kfree_skb(nskb); + + frag++; + nr_frags++; + } + + shinfo->nr_frags = nr_frags; + return cons; +} + +static int netif_poll(struct net_device *dev, int *pbudget) +{ + struct netfront_info *np = netdev_priv(dev); + struct sk_buff *skb; + struct netif_rx_response *rx; + RING_IDX i, rp; + struct multicall_entry *mcl; + int work_done, budget, more_to_do = 1; + struct sk_buff_head rxq; + struct sk_buff_head errq; + struct sk_buff_head tmpq; + unsigned long flags; + unsigned int len; + int pages_done; + int err; + + spin_lock(&np->rx_lock); + + if (unlikely(!netif_carrier_ok(dev))) { + spin_unlock(&np->rx_lock); + return 0; + } + + skb_queue_head_init(&rxq); + skb_queue_head_init(&errq); + skb_queue_head_init(&tmpq); + + if ((budget = *pbudget) > dev->quota) + budget = dev->quota; + rp = np->rx.sring->rsp_prod; + rmb(); /* Ensure we see queued responses up to ''rp''. */ + + for (i = np->rx.rsp_cons, work_done = 0, pages_done = 0; + (i != rp) && (work_done < budget); + np->rx.rsp_cons = ++i, work_done++) { + rx = RING_GET_RESPONSE(&np->rx, i); + + err = xennet_get_responses(np, rx, rp, &tmpq, pages_done); + pages_done += skb_queue_len(&tmpq); + + if (unlikely(err)) { + i = np->rx.rsp_cons + skb_queue_len(&tmpq) - 1; work_done--; + while ((skb = __skb_dequeue(&tmpq))) + __skb_queue_tail(&errq, skb); + np->stats.rx_errors++; continue; } - gnttab_release_grant_reference(&np->gref_rx_head, ref); - - /* NB. We handle skb overflow later. 
*/ - skb->data = skb->head + rx->offset; - skb->len = rx->status; - skb->tail = skb->data + skb->len; + skb = __skb_dequeue(&tmpq); + + skb->nh.raw = (void *)skb_shinfo(skb)->frags[0].page; + skb->h.raw = skb->nh.raw + rx->offset; + + len = rx->status; + if (len > RX_COPY_THRESHOLD) + len = RX_COPY_THRESHOLD; + skb_put(skb, len); + + if (rx->status > len) { + skb_shinfo(skb)->frags[0].page_offset + rx->offset + len; + skb_shinfo(skb)->frags[0].size = rx->status - len; + skb->data_len = rx->status - len; + } else { + skb_shinfo(skb)->frags[0].page = NULL; + skb_shinfo(skb)->nr_frags = 0; + } + + i = xennet_fill_frags(np, skb, &tmpq); + skb->truesize += skb->data_len; + skb->len += skb->data_len; /* * Old backends do not assert data_validated but we @@ -966,96 +1109,38 @@ static int netif_poll(struct net_device skb->proto_csum_blank = !!(rx->flags & NETRXF_csum_blank); np->stats.rx_packets++; - np->stats.rx_bytes += rx->status; - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - /* Remap the page. */ - MULTI_update_va_mapping(mcl, (unsigned long)skb->head, - pfn_pte_ma(mfn, PAGE_KERNEL), - 0); - mcl++; - mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT) - | MMU_MACHPHYS_UPDATE; - mmu->val = __pa(skb->head) >> PAGE_SHIFT; - mmu++; - - set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT, - mfn); - } + np->stats.rx_bytes += skb->len; __skb_queue_tail(&rxq, skb); } /* Some pages are no longer absent... */ - balloon_update_driver_allowance(-work_done); + balloon_update_driver_allowance(-pages_done); /* Do all the remapping work, and M2P updates, in one big hypercall. */ - if (likely((mcl - np->rx_mcl) != 0)) { + if (likely(pages_done)) { + mcl = np->rx_mcl + pages_done; mcl->op = __HYPERVISOR_mmu_update; mcl->args[0] = (unsigned long)np->rx_mmu; - mcl->args[1] = mmu - np->rx_mmu; + mcl->args[1] = pages_done; mcl->args[2] = 0; mcl->args[3] = DOMID_SELF; - mcl++; - (void)HYPERVISOR_multicall(np->rx_mcl, mcl - np->rx_mcl); - } + (void)HYPERVISOR_multicall(np->rx_mcl, pages_done + 1); + } + + while ((skb = __skb_dequeue(&errq))) + kfree_skb(skb); while ((skb = __skb_dequeue(&rxq)) != NULL) { - if (skb->len > (dev->mtu + ETH_HLEN + 4)) { - if (net_ratelimit()) - printk(KERN_INFO "Received packet too big for " - "MTU (%d > %d)\n", - skb->len - ETH_HLEN - 4, dev->mtu); - skb->len = 0; - skb->tail = skb->data; - init_skb_shinfo(skb); - dev_kfree_skb(skb); - continue; - } - - /* - * Enough room in skbuff for the data we were passed? Also, - * Linux expects at least 16 bytes headroom in each rx buffer. - */ - if (unlikely(skb->tail > skb->end) || - unlikely((skb->data - skb->head) < 16)) { - if (net_ratelimit()) { - if (skb->tail > skb->end) - printk(KERN_INFO "Received packet " - "is %zd bytes beyond tail.\n", - skb->tail - skb->end); - else - printk(KERN_INFO "Received packet " - "is %zd bytes before head.\n", - 16 - (skb->data - skb->head)); - } - - nskb = __dev_alloc_skb(skb->len + 2, - GFP_ATOMIC|__GFP_NOWARN); - if (nskb != NULL) { - skb_reserve(nskb, 2); - skb_put(nskb, skb->len); - memcpy(nskb->data, skb->data, skb->len); - /* Copy any other fields we already set up. */ - nskb->dev = skb->dev; - nskb->ip_summed = skb->ip_summed; - nskb->proto_data_valid = skb->proto_data_valid; - nskb->proto_csum_blank = skb->proto_csum_blank; - } - - /* Reinitialise and then destroy the old skbuff. */ - skb->len = 0; - skb->tail = skb->data; - init_skb_shinfo(skb); - dev_kfree_skb(skb); - - /* Switch old for new, if we copied the buffer. 
*/ - if ((skb = nskb) == NULL) - continue; - } - - /* Set the shinfo area, which is hidden behind the data. */ - init_skb_shinfo(skb); + struct page *page = (struct page *)skb->nh.raw; + void *vaddr = page_address(page); + + memcpy(skb->data, vaddr + (skb->h.raw - skb->nh.raw), + skb_headlen(skb)); + + if (page != skb_shinfo(skb)->frags[0].page) + __free_page(page); + /* Ethernet work: Delayed to here as it peeks the header. */ skb->protocol = eth_type_trans(skb, dev); @@ -1063,8 +1148,6 @@ static int netif_poll(struct net_device netif_receive_skb(skb); dev->last_rx = jiffies; } - - np->rx.rsp_cons = i; /* If we get a callback with very few responses, reduce fill target. */ /* NB. Note exponential increase, linear decrease. */ @@ -1205,7 +1288,7 @@ static void network_connect(struct net_d gnttab_grant_foreign_transfer_ref( np->grant_rx_ref[i], np->xbdev->otherend_id, - __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT); + page_to_pfn(skb_shinfo(np->rx_skbs[i])->frags->page)); RING_GET_REQUEST(&np->rx, requeue_idx)->gref np->grant_rx_ref[i]; RING_GET_REQUEST(&np->rx, requeue_idx)->id = requeue_idx; diff -r 29673e3aab05 -r 05d6393e9d18 xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jul 28 18:18:51 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jul 28 18:20:45 2006 +1000 @@ -123,6 +123,10 @@ typedef struct netif_rx_request netif_rx #define _NETRXF_csum_blank (1) #define NETRXF_csum_blank (1U<<_NETRXF_csum_blank) +/* Packet continues in the next request descriptor. */ +#define _NETRXF_more_data (2) +#define NETRXF_more_data (1U<<_NETRXF_more_data) + struct netif_rx_response { uint16_t id; uint16_t offset; /* Offset in page of start of received packet */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
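Patch 6/9 changes the rx protocol so that one packet may span several response slots chained by NETRXF_more_data: the first slot fills the skb's linear area up to RX_COPY_THRESHOLD and the remainder of every page becomes a fragment. The self-contained walk-through below models that reassembly with simplified stand-in structures; only RX_COPY_THRESHOLD, NETRXF_more_data and the general shape come from the patch.

/* Model of multi-slot rx reassembly; structures are stand-ins for the
 * netif_rx_response ring, not the real layout. */
#include <stdio.h>

#define RX_COPY_THRESHOLD 256
#define NETRXF_more_data  (1U << 2)

struct rx_resp {
	int status;          /* bytes delivered in this slot */
	unsigned int flags;
};

int main(void)
{
	/* A 3000-byte packet delivered in three one-page slots. */
	struct rx_resp ring[] = {
		{ 1500, NETRXF_more_data },
		{ 1200, NETRXF_more_data },
		{  300, 0 },
	};
	int headlen, total = 0, frags = 0, i = 0;

	/* Only the first RX_COPY_THRESHOLD bytes go into the linear area. */
	headlen = ring[0].status < RX_COPY_THRESHOLD ?
		  ring[0].status : RX_COPY_THRESHOLD;

	do {
		total += ring[i].status;
		if (i == 0 && ring[i].status > headlen)
			frags++;            /* leftover of the first page */
		else if (i > 0)
			frags++;            /* each extra slot is a frag  */
	} while (ring[i++].flags & NETRXF_more_data);

	printf("skb: headlen=%d nr_frags=%d len=%d\n", headlen, frags, total);
	return 0;
}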
Herbert Xu
2006-Jul-28 08:45 UTC
[Xen-devel] [7/9] [NET] back: Transmit SG packets if supported
Hi: [NET] back: Transmit SG packets if supported This patch adds scatter-and-gather transmission support to the backend. This allows the MTU to be raised right now and the potential for TSO in future. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 05d6393e9d18 -r 4f008474675a linux-2.6-xen-sparse/drivers/xen/netback/common.h --- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 18:20:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Fri Jul 28 18:22:22 2006 +1000 @@ -129,4 +129,10 @@ static inline int netbk_can_queue(struct return netif->can_queue; } +static inline int netbk_can_sg(struct net_device *dev) +{ + netif_t *netif = netdev_priv(dev); + return netif->features & NETIF_F_SG; +} + #endif /* __NETIF__BACKEND__COMMON_H__ */ diff -r 05d6393e9d18 -r 4f008474675a linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:20:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:22:22 2006 +1000 @@ -62,10 +62,34 @@ static int net_close(struct net_device * return 0; } +static int netbk_change_mtu(struct net_device *dev, int mtu) +{ + int max = netbk_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN; + + if (mtu > max) + return -EINVAL; + dev->mtu = mtu; + return 0; +} + +static int netbk_set_sg(struct net_device *dev, u32 data) +{ + if (data) { + netif_t *netif = netdev_priv(dev); + + if (!(netif->features & NETIF_F_SG)) + return -ENOSYS; + } + + return ethtool_op_set_sg(dev, data); +} + static struct ethtool_ops network_ethtool_ops { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, + .get_sg = ethtool_op_get_sg, + .set_sg = netbk_set_sg, .get_link = ethtool_op_get_link, }; @@ -101,6 +125,7 @@ netif_t *netif_alloc(domid_t domid, unsi dev->get_stats = netif_be_get_stats; dev->open = net_open; dev->stop = net_close; + dev->change_mtu = netbk_change_mtu; dev->features = NETIF_F_IP_CSUM; SET_ETHTOOL_OPS(dev, &network_ethtool_ops); diff -r 05d6393e9d18 -r 4f008474675a linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:20:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:22:22 2006 +1000 @@ -40,6 +40,11 @@ /*#define NETBE_DEBUG_INTERRUPT*/ +struct netbk_rx_meta { + skb_frag_t frag; + int id; +}; + static void netif_idx_release(u16 pending_idx); static void netif_page_release(struct page *page); static void make_tx_response(netif_t *netif, @@ -100,21 +105,27 @@ static unsigned long mfn_list[MAX_MFN_AL static unsigned long mfn_list[MAX_MFN_ALLOC]; static unsigned int alloc_index = 0; -static unsigned long alloc_mfn(void) -{ - unsigned long mfn = 0; +static inline unsigned long alloc_mfn(void) +{ + return mfn_list[--alloc_index]; +} + +static int check_mfn(int nr) +{ struct xen_memory_reservation reservation = { - .nr_extents = MAX_MFN_ALLOC, .extent_order = 0, .domid = DOMID_SELF }; - set_xen_guest_handle(reservation.extent_start, mfn_list); - if ( unlikely(alloc_index == 0) ) - alloc_index = HYPERVISOR_memory_op( - XENMEM_increase_reservation, &reservation); - if ( alloc_index != 0 ) - mfn = mfn_list[--alloc_index]; - return mfn; + + if (likely(alloc_index >= nr)) + return 0; + + 
set_xen_guest_handle(reservation.extent_start, mfn_list + alloc_index); + reservation.nr_extents = MAX_MFN_ALLOC - alloc_index; + alloc_index += HYPERVISOR_memory_op(XENMEM_increase_reservation, + &reservation); + + return alloc_index >= nr ? 0 : -ENOMEM; } static inline void maybe_schedule_tx_action(void) @@ -136,12 +147,87 @@ static inline int is_xen_skb(struct sk_b return (cp == skbuff_cachep); } +static struct sk_buff *netbk_copy_skb(struct sk_buff *skb) +{ + struct skb_shared_info *ninfo; + struct sk_buff *nskb; + unsigned long offset; + int ret; + int len; + int headlen; + + nskb = alloc_skb(SKB_MAX_HEAD(0), GFP_ATOMIC); + if (unlikely(!nskb)) + goto err; + + skb_reserve(nskb, 16); + headlen = nskb->end - nskb->data; + if (headlen > skb_headlen(skb)) + headlen = skb_headlen(skb); + ret = skb_copy_bits(skb, 0, __skb_put(nskb, headlen), headlen); + BUG_ON(ret); + + ninfo = skb_shinfo(nskb); + ninfo->gso_size = skb_shinfo(skb)->gso_size; + ninfo->gso_type = skb_shinfo(skb)->gso_type; + + offset = headlen; + len = skb->len - headlen; + + nskb->len = skb->len; + nskb->data_len = len; + nskb->truesize += len; + + while (len) { + struct page *page; + int copy; + int zero; + + if (unlikely(ninfo->nr_frags >= MAX_SKB_FRAGS)) { + dump_stack(); + goto err_free; + } + + copy = len >= PAGE_SIZE ? PAGE_SIZE : len; + zero = len >= PAGE_SIZE ? 0 : __GFP_ZERO; + + page = alloc_page(GFP_ATOMIC | zero); + if (unlikely(!page)) + goto err_free; + + ret = skb_copy_bits(skb, offset, page_address(page), copy); + BUG_ON(ret); + + ninfo->frags[ninfo->nr_frags].page = page; + ninfo->frags[ninfo->nr_frags].page_offset = 0; + ninfo->frags[ninfo->nr_frags].size = copy; + ninfo->nr_frags++; + + offset += copy; + len -= copy; + } + + offset = nskb->data - skb->data; + + nskb->h.raw = skb->h.raw + offset; + nskb->nh.raw = skb->nh.raw + offset; + nskb->mac.raw = skb->mac.raw + offset; + + return nskb; + +err_free: + kfree_skb(nskb); +err: + return NULL; +} + static inline int netbk_queue_full(netif_t *netif) { RING_IDX peek = netif->rx_req_cons_peek; - return netif->rx.sring->req_prod - peek <= 0 || - netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <= 0; + return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS || + netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <+ MAX_SKB_FRAGS; } int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) @@ -163,20 +249,12 @@ int netif_be_start_xmit(struct sk_buff * * We do not copy the packet unless: * 1. The data is shared; or * 2. The data is not allocated from our special cache. - * NB. We also couldn''t cope with fragmented packets, but we won''t get - * any because we not advertise the NETIF_F_SG feature. + * 3. The data is fragmented. */ - if (skb_shared(skb) || skb_cloned(skb) || !is_xen_skb(skb)) { - int hlen = skb->data - skb->head; - int ret; - struct sk_buff *nskb = dev_alloc_skb(hlen + skb->len); + if (skb_cloned(skb) || skb_is_nonlinear(skb) || !is_xen_skb(skb)) { + struct sk_buff *nskb = netbk_copy_skb(skb); if ( unlikely(nskb == NULL) ) goto drop; - skb_reserve(nskb, hlen); - __skb_put(nskb, skb->len); - ret = skb_copy_bits(skb, -hlen, nskb->data - hlen, - skb->len + hlen); - BUG_ON(ret); /* Copy only the header fields we use in this driver. 
*/ nskb->dev = skb->dev; nskb->ip_summed = skb->ip_summed; @@ -185,7 +263,7 @@ int netif_be_start_xmit(struct sk_buff * skb = nskb; } - netif->rx_req_cons_peek++; + netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1; netif_get(netif); if (netbk_can_queue(dev) && netbk_queue_full(netif)) @@ -221,116 +299,80 @@ int xen_network_done(void) } #endif -static void net_rx_action(unsigned long unused) -{ - netif_t *netif = NULL; - s8 status; - u16 size, id, irq, flags; - multicall_entry_t *mcl; - mmu_update_t *mmu; - gnttab_transfer_t *gop; - unsigned long vdata, old_mfn, new_mfn; - struct sk_buff_head rxq; - struct sk_buff *skb; - int notify_nr = 0; - int ret; +static u16 netbk_gop_frag(netif_t *netif, struct page *page, int count, int i) +{ + multicall_entry_t *mcl = rx_mcl + count; + mmu_update_t *mmu = rx_mmu + count; + gnttab_transfer_t *gop = grant_rx_op + count; + netif_rx_request_t *req; + unsigned long old_mfn, new_mfn; + + old_mfn = virt_to_mfn(page_address(page)); + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + new_mfn = alloc_mfn(); + + /* + * Set the new P2M table entry before reassigning + * the old data page. Heed the comment in + * pgtable-2level.h:pte_page(). :-) + */ + set_phys_to_machine(page_to_pfn(page), new_mfn); + + MULTI_update_va_mapping(mcl, (unsigned long)page_address(page), + pfn_pte_ma(new_mfn, PAGE_KERNEL), 0); + + mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) | + MMU_MACHPHYS_UPDATE; + mmu->val = page_to_pfn(page); + } + + req = RING_GET_REQUEST(&netif->rx, netif->rx.req_cons + i); + gop->mfn = old_mfn; + gop->domid = netif->domid; + gop->ref = req->gref; + return req->id; +} + +static void netbk_gop_skb(struct sk_buff *skb, struct netbk_rx_meta *meta, + int count) +{ + netif_t *netif = netdev_priv(skb->dev); + int nr_frags = skb_shinfo(skb)->nr_frags; + int i; + + for (i = 0; i < nr_frags; i++) { + meta[++count].frag = skb_shinfo(skb)->frags[i]; + meta[count].id = netbk_gop_frag(netif, meta[count].frag.page, + count, i + 1); + } + /* - * Putting hundreds of bytes on the stack is considered rude. - * Static works because a tasklet can only be on one CPU at any time. + * This must occur at the end to ensure that we don''t trash + * skb_shinfo until we''re done. */ - static u16 notify_list[NET_RX_RING_SIZE]; - - skb_queue_head_init(&rxq); - - mcl = rx_mcl; - mmu = rx_mmu; - gop = grant_rx_op; - - while ((skb = skb_dequeue(&rx_queue)) != NULL) { - netif = netdev_priv(skb->dev); - vdata = (unsigned long)skb->data; - old_mfn = virt_to_mfn(vdata); - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - /* Memory squeeze? Back off for an arbitrary while. */ - if ((new_mfn = alloc_mfn()) == 0) { - if ( net_ratelimit() ) - WPRINTK("Memory squeeze in netback " - "driver.\n"); - mod_timer(&net_timer, jiffies + HZ); - skb_queue_head(&rx_queue, skb); - break; - } - /* - * Set the new P2M table entry before reassigning - * the old data page. Heed the comment in - * pgtable-2level.h:pte_page(). :-) - */ - set_phys_to_machine( - __pa(skb->data) >> PAGE_SHIFT, - new_mfn); - - MULTI_update_va_mapping(mcl, vdata, - pfn_pte_ma(new_mfn, - PAGE_KERNEL), 0); - mcl++; - - mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) | - MMU_MACHPHYS_UPDATE; - mmu->val = __pa(vdata) >> PAGE_SHIFT; - mmu++; - } - - gop->mfn = old_mfn; - gop->domid = netif->domid; - gop->ref = RING_GET_REQUEST( - &netif->rx, netif->rx.req_cons)->gref; - netif->rx.req_cons++; - gop++; - - __skb_queue_tail(&rxq, skb); - - /* Filled the batch queue? 
*/ - if ((gop - grant_rx_op) == ARRAY_SIZE(grant_rx_op)) - break; - } - - if (!xen_feature(XENFEAT_auto_translated_physmap)) { - if (mcl == rx_mcl) - return; - - mcl[-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL; - - if (mmu - rx_mmu) { - mcl->op = __HYPERVISOR_mmu_update; - mcl->args[0] = (unsigned long)rx_mmu; - mcl->args[1] = mmu - rx_mmu; - mcl->args[2] = 0; - mcl->args[3] = DOMID_SELF; - mcl++; - } - - ret = HYPERVISOR_multicall(rx_mcl, mcl - rx_mcl); - BUG_ON(ret != 0); - } - - ret = HYPERVISOR_grant_table_op(GNTTABOP_transfer, grant_rx_op, - gop - grant_rx_op); - BUG_ON(ret != 0); - - mcl = rx_mcl; - gop = grant_rx_op; - while ((skb = __skb_dequeue(&rxq)) != NULL) { - netif = netdev_priv(skb->dev); - size = skb->tail - skb->data; - - atomic_set(&(skb_shinfo(skb)->dataref), 1); - skb_shinfo(skb)->nr_frags = 0; - skb_shinfo(skb)->frag_list = NULL; - - netif->stats.tx_bytes += size; - netif->stats.tx_packets++; - + meta[count - nr_frags].id = netbk_gop_frag(netif, + virt_to_page(skb->data), + count - nr_frags, 0); + netif->rx.req_cons += nr_frags + 1; +} + +static inline void netbk_free_pages(int nr_frags, struct netbk_rx_meta *meta) +{ + int i; + + for (i = 0; i < nr_frags; i++) + put_page(meta[i].frag.page); +} + +static int netbk_check_gop(int nr_frags, domid_t domid, int count) +{ + multicall_entry_t *mcl = rx_mcl + count; + gnttab_transfer_t *gop = grant_rx_op + count; + int status = NETIF_RSP_OKAY; + int i; + + for (i = 0; i <= nr_frags; i++) { if (!xen_feature(XENFEAT_auto_translated_physmap)) { /* The update_va_mapping() must not fail. */ BUG_ON(mcl->result != 0); @@ -338,10 +380,9 @@ static void net_rx_action(unsigned long } /* Check the reassignment error code. */ - status = NETIF_RSP_OKAY; if (gop->status != 0) { DPRINTK("Bad status %d from grant transfer to DOM%u\n", - gop->status, netif->domid); + gop->status, domid); /* * Page no longer belongs to us unless GNTST_bad_page, * but that should be a fatal error anyway. @@ -349,17 +390,128 @@ static void net_rx_action(unsigned long BUG_ON(gop->status == GNTST_bad_page); status = NETIF_RSP_ERROR; } - irq = netif->irq; - id = RING_GET_REQUEST(&netif->rx, netif->rx.rsp_prod_pvt)->id; - flags = 0; + gop++; + } + + return status; +} + +static void netbk_add_frag_responses(netif_t *netif, int status, + struct netbk_rx_meta *meta, int nr_frags) +{ + int i; + + for (i = 0; i < nr_frags; i++) { + int id = meta[i].id; + int flags = (i == nr_frags - 1) ? 0 : NETRXF_more_data; + + make_rx_response(netif, id, status, meta[i].frag.page_offset, + meta[i].frag.size, flags); + } +} + +static void net_rx_action(unsigned long unused) +{ + netif_t *netif = NULL; + s8 status; + u16 id, irq, flags; + multicall_entry_t *mcl; + struct sk_buff_head rxq; + struct sk_buff *skb; + int notify_nr = 0; + int ret; + int nr_frags; + int count; + + /* + * Putting hundreds of bytes on the stack is considered rude. + * Static works because a tasklet can only be on one CPU at any time. + */ + static u16 notify_list[NET_RX_RING_SIZE]; + static struct netbk_rx_meta meta[NET_RX_RING_SIZE]; + + skb_queue_head_init(&rxq); + + count = 0; + + while ((skb = skb_dequeue(&rx_queue)) != NULL) { + nr_frags = skb_shinfo(skb)->nr_frags; + *(int *)skb->cb = nr_frags; + + if (!xen_feature(XENFEAT_auto_translated_physmap) && + check_mfn(nr_frags + 1)) { + /* Memory squeeze? Back off for an arbitrary while. 
*/ + if ( net_ratelimit() ) + WPRINTK("Memory squeeze in netback " + "driver.\n"); + mod_timer(&net_timer, jiffies + HZ); + skb_queue_head(&rx_queue, skb); + break; + } + + netbk_gop_skb(skb, meta, count); + + count += nr_frags + 1; + + __skb_queue_tail(&rxq, skb); + + /* Filled the batch queue? */ + if (count + MAX_SKB_FRAGS >= NET_RX_RING_SIZE) + break; + } + + if (!count) + return; + + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + mcl = rx_mcl + count; + + mcl[-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL; + + mcl->op = __HYPERVISOR_mmu_update; + mcl->args[0] = (unsigned long)rx_mmu; + mcl->args[1] = count; + mcl->args[2] = 0; + mcl->args[3] = DOMID_SELF; + + ret = HYPERVISOR_multicall(rx_mcl, count + 1); + BUG_ON(ret != 0); + } + + ret = HYPERVISOR_grant_table_op(GNTTABOP_transfer, grant_rx_op, count); + BUG_ON(ret != 0); + + count = 0; + while ((skb = __skb_dequeue(&rxq)) != NULL) { + nr_frags = *(int *)skb->cb; + + atomic_set(&(skb_shinfo(skb)->dataref), 1); + skb_shinfo(skb)->nr_frags = 0; + skb_shinfo(skb)->frag_list = NULL; + + netif = netdev_priv(skb->dev); + netif->stats.tx_bytes += skb->len; + netif->stats.tx_packets++; + + netbk_free_pages(nr_frags, meta + count + 1); + status = netbk_check_gop(nr_frags, netif->domid, count); + + id = meta[count].id; + flags = nr_frags ? NETRXF_more_data : 0; + if (skb->ip_summed == CHECKSUM_HW) /* local packet? */ flags |= NETRXF_csum_blank | NETRXF_data_validated; else if (skb->proto_data_valid) /* remote but checksummed? */ flags |= NETRXF_data_validated; - if (make_rx_response(netif, id, status, - (unsigned long)skb->data & ~PAGE_MASK, - size, flags) && - (rx_notify[irq] == 0)) { + + make_rx_response(netif, id, status, offset_in_page(skb->data), + skb_headlen(skb), flags); + netbk_add_frag_responses(netif, status, meta + count + 1, + nr_frags); + + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->rx, ret); + irq = netif->irq; + if (ret && !rx_notify[irq]) { rx_notify[irq] = 1; notify_list[notify_nr++] = irq; } @@ -370,7 +522,7 @@ static void net_rx_action(unsigned long netif_put(netif); dev_kfree_skb(skb); - gop++; + count += nr_frags + 1; } while (notify_nr != 0) { @@ -1040,7 +1192,6 @@ static int make_rx_response(netif_t *net { RING_IDX i = netif->rx.rsp_prod_pvt; netif_rx_response_t *resp; - int notify; resp = RING_GET_RESPONSE(&netif->rx, i); resp->offset = offset; @@ -1051,9 +1202,8 @@ static int make_rx_response(netif_t *net resp->status = (s16)st; netif->rx.rsp_prod_pvt = ++i; - RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->rx, notify); - - return notify; + + return 0; } #ifdef NETBE_DEBUG_INTERRUPT diff -r 05d6393e9d18 -r 4f008474675a linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:20:45 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:22:22 2006 +1000 @@ -377,6 +377,13 @@ static int connect_rings(struct backend_ /* Must be non-zero for pfifo_fast to work. */ be->netif->dev->tx_queue_len = 1; + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-sg", "%d", &val) < 0) + val = 0; + if (val) { + be->netif->features |= NETIF_F_SG; + be->netif->dev->features |= NETIF_F_SG; + } + /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); if (err) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
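Two details of patch 7/9 are worth illustrating: the MTU ceiling chosen by netbk_change_mtu() (65535 - ETH_HLEN with SG, ETH_DATA_LEN without), and the page-at-a-time carving that netbk_copy_skb() applies to data beyond the copied header. The snippet below is an illustrative userspace model of both; the constants mirror the Linux headers, but the loop is not the driver code.

/* Model of the SG backend's MTU ceiling and page-sized chunking. */
#include <stdio.h>

#define ETH_HLEN     14
#define ETH_DATA_LEN 1500
#define PAGE_SIZE    4096

static int max_mtu(int can_sg)
{
	return can_sg ? 65535 - ETH_HLEN : ETH_DATA_LEN;
}

int main(void)
{
	int len = 60000, headlen = 192, remaining, frags = 0;

	printf("MTU ceiling: SG=%d, no SG=%d\n", max_mtu(1), max_mtu(0));

	/* Everything past the (copied) linear head is carved into
	 * page-sized fragments, as netbk_copy_skb() does. */
	for (remaining = len - headlen; remaining > 0;
	     remaining -= PAGE_SIZE > remaining ? remaining : PAGE_SIZE)
		frags++;

	printf("%d-byte skb, %d-byte head -> %d page fragments\n",
	       len, headlen, frags);
	return 0;
}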
Hi: [NET] front: Add TSO support This patch adds TCP Segmentation Offload (TSO) support to the frontend. It also advertises this fact through xenbus so that the frontend can detect this and send through TSO requests only if it is supported. This is done using an extra request slot which is indicated by a flag in the first slot. In future checksum offload can be done in the same way. Even though only TSO is supported for now the code actually supports GSO so it can be applied to any other protocol. The only missing bit is the detection of host support for a specific GSO protocol. Once that is added we can advertise all supported protocols to the guest. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 4f008474675a -r 3df7a115ddde linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c --- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:22:22 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Fri Jul 28 18:22:38 2006 +1000 @@ -116,6 +116,11 @@ struct netfront_info { struct mmu_update rx_mmu[NET_RX_RING_SIZE]; }; +struct netfront_rx_info { + struct netif_rx_response rx; + struct netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX - 1]; +}; + /* * Access macros for acquiring freeing slots in tx_skbs[]. */ @@ -330,6 +335,12 @@ again: err = xenbus_printf(xbt, dev->nodename, "feature-sg", "%d", 1); if (err) { message = "writing feature-sg"; + goto abort_transaction; + } + + err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv4", "%d", 1); + if (err) { + message = "writing feature-gso-tcpv4"; goto abort_transaction; } @@ -898,18 +909,65 @@ static void xennet_move_rx_slot(struct n np->rx.req_prod_pvt++; } +int xennet_get_extras(struct netfront_info *np, + struct netif_extra_info *extras, RING_IDX rp) + +{ + struct netif_extra_info *extra; + RING_IDX cons = np->rx.rsp_cons; + int err = 0; + + do { + struct sk_buff *skb; + grant_ref_t ref; + + if (unlikely(cons + 1 == rp)) { + if (net_ratelimit()) + WPRINTK("Missing extra info\n"); + err = -EBADR; + break; + } + + extra = (struct netif_extra_info *) + RING_GET_RESPONSE(&np->rx, ++cons); + + if (unlikely(!extra->type || + extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) { + if (net_ratelimit()) + WPRINTK("Invalid extra type: %d\n", + extra->type); + err = -EINVAL; + } else + memcpy(&extras[extra->type - 1], extra, sizeof(*extra)); + + skb = xennet_get_rx_skb(np, cons); + ref = xennet_get_rx_ref(np, cons); + xennet_move_rx_slot(np, skb, ref); + } while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE); + + np->rx.rsp_cons = cons; + return err; +} + static int xennet_get_responses(struct netfront_info *np, - struct netif_rx_response *rx, RING_IDX rp, + struct netfront_rx_info *rinfo, RING_IDX rp, struct sk_buff_head *list, int count) { struct mmu_update *mmu = np->rx_mmu + count; struct multicall_entry *mcl = np->rx_mcl + count; + struct netif_rx_response *rx = &rinfo->rx; + struct netif_extra_info *extras = rinfo->extras; RING_IDX cons = np->rx.rsp_cons; struct sk_buff *skb = xennet_get_rx_skb(np, cons); grant_ref_t ref = xennet_get_rx_ref(np, cons); int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD); int frags = 1; int err = 0; + + if (rx->flags & NETRXF_extra_info) { + err = xennet_get_extras(np, extras, rp); + cons = np->rx.rsp_cons; + } for (;;) { unsigned long mfn; @@ -1022,11 
+1080,38 @@ static RING_IDX xennet_fill_frags(struct return cons; } +static int xennet_set_skb_gso(struct sk_buff *skb, struct netif_extra_info *gso) +{ + if (!gso->u.gso.size) { + if (net_ratelimit()) + WPRINTK("GSO size must not be zero.\n"); + return -EINVAL; + } + + /* Currently only TCPv4 S.O. is supported. */ + if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) { + if (net_ratelimit()) + WPRINTK("Bad GSO type %d.\n", gso->u.gso.type); + return -EINVAL; + } + + skb_shinfo(skb)->gso_size = gso->u.gso.size; + skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; + + /* Header must be checked, and gso_segs computed. */ + skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY; + skb_shinfo(skb)->gso_segs = 0; + + return 0; +} + static int netif_poll(struct net_device *dev, int *pbudget) { struct netfront_info *np = netdev_priv(dev); struct sk_buff *skb; - struct netif_rx_response *rx; + struct netfront_rx_info rinfo; + struct netif_rx_response *rx = &rinfo.rx; + struct netif_extra_info *extras = rinfo.extras; RING_IDX i, rp; struct multicall_entry *mcl; int work_done, budget, more_to_do = 1; @@ -1057,12 +1142,14 @@ static int netif_poll(struct net_device for (i = np->rx.rsp_cons, work_done = 0, pages_done = 0; (i != rp) && (work_done < budget); np->rx.rsp_cons = ++i, work_done++) { - rx = RING_GET_RESPONSE(&np->rx, i); - - err = xennet_get_responses(np, rx, rp, &tmpq, pages_done); + memcpy(rx, RING_GET_RESPONSE(&np->rx, i), sizeof(*rx)); + memset(extras, 0, sizeof(extras)); + + err = xennet_get_responses(np, &rinfo, rp, &tmpq, pages_done); pages_done += skb_queue_len(&tmpq); if (unlikely(err)) { +err: i = np->rx.rsp_cons + skb_queue_len(&tmpq) - 1; work_done--; while ((skb = __skb_dequeue(&tmpq))) @@ -1072,6 +1159,16 @@ static int netif_poll(struct net_device } skb = __skb_dequeue(&tmpq); + + if (extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].type) { + struct netif_extra_info *gso; + gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1]; + + if (unlikely(xennet_set_skb_gso(skb, gso))) { + __skb_queue_head(&tmpq, skb); + goto err; + } + } skb->nh.raw = (void *)skb_shinfo(skb)->frags[0].page; skb->h.raw = skb->nh.raw + rx->offset; diff -r 4f008474675a -r 3df7a115ddde xen/include/public/io/netif.h --- a/xen/include/public/io/netif.h Fri Jul 28 18:22:22 2006 +1000 +++ b/xen/include/public/io/netif.h Fri Jul 28 18:22:38 2006 +1000 @@ -127,6 +127,10 @@ typedef struct netif_rx_request netif_rx #define _NETRXF_more_data (2) #define NETRXF_more_data (1U<<_NETRXF_more_data) +/* Packet to be followed by extra descriptor(s). */ +#define _NETRXF_extra_info (3) +#define NETRXF_extra_info (1U<<_NETRXF_extra_info) + struct netif_rx_response { uint16_t id; uint16_t offset; /* Offset in page of start of received packet */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
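The extra-info convention this patch introduces is: the first response slot for a packet sets NETRXF_extra_info, and the following slot(s) are reinterpreted as netif_extra_info, chained with XEN_NETIF_EXTRA_FLAG_MORE and filed into extras[type - 1]. Here is a simplified, self-contained sketch of that consumption step; the struct and constants are trimmed stand-ins for the real definitions in xen/include/public/io/netif.h, not the actual layouts.

/*
 * Sketch of extra-slot consumption on the frontend side: each extra
 * descriptor is copied into extras[type - 1]; the chain ends when the
 * MORE flag is clear.
 */
#include <stdio.h>
#include <string.h>

#define NETRXF_extra_info         (1U << 3)
#define XEN_NETIF_EXTRA_TYPE_GSO  1
#define XEN_NETIF_EXTRA_TYPE_MAX  2
#define XEN_NETIF_EXTRA_FLAG_MORE (1U << 0)

struct extra_sketch {
	unsigned char type;      /* XEN_NETIF_EXTRA_TYPE_* */
	unsigned char flags;     /* XEN_NETIF_EXTRA_FLAG_MORE if more follow */
	unsigned short gso_size;
};

/* One per-packet array, indexed by type - 1, as in struct netfront_rx_info. */
static struct extra_sketch extras[XEN_NETIF_EXTRA_TYPE_MAX - 1];

static void consume_extras(const struct extra_sketch *slots, int nslots)
{
	int i;

	memset(extras, 0, sizeof(extras));
	for (i = 0; i < nslots; i++) {
		const struct extra_sketch *e = &slots[i];

		if (e->type && e->type < XEN_NETIF_EXTRA_TYPE_MAX)
			extras[e->type - 1] = *e;
		if (!(e->flags & XEN_NETIF_EXTRA_FLAG_MORE))
			break;           /* last extra for this packet */
	}
}

int main(void)
{
	struct extra_sketch ring[1] = {
		{ XEN_NETIF_EXTRA_TYPE_GSO, 0 /* last extra */, 1448 },
	};

	consume_extras(ring, 1);
	printf("gso_size = %d\n", extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].gso_size);
	return 0;
}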
Herbert Xu
2006-Jul-28 08:46 UTC
[Xen-devel] [9/9] [NET] back: Transmit TSO packets if supported
Hi: [NET] back: Transmit TSO packets if supported This patch adds TSO transmission support to the backend. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff -r 3df7a115ddde -r 04cbf628ed5a linux-2.6-xen-sparse/drivers/xen/netback/interface.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:22:38 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/interface.c Fri Jul 28 18:24:10 2006 +1000 @@ -84,12 +84,26 @@ static int netbk_set_sg(struct net_devic return ethtool_op_set_sg(dev, data); } +static int netbk_set_tso(struct net_device *dev, u32 data) +{ + if (data) { + netif_t *netif = netdev_priv(dev); + + if (!(netif->features & NETIF_F_TSO)) + return -ENOSYS; + } + + return ethtool_op_set_tso(dev, data); +} + static struct ethtool_ops network_ethtool_ops { .get_tx_csum = ethtool_op_get_tx_csum, .set_tx_csum = ethtool_op_set_tx_csum, .get_sg = ethtool_op_get_sg, .set_sg = netbk_set_sg, + .get_tso = ethtool_op_get_tso, + .set_tso = netbk_set_tso, .get_link = ethtool_op_get_link, }; diff -r 3df7a115ddde -r 04cbf628ed5a linux-2.6-xen-sparse/drivers/xen/netback/netback.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:22:38 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Fri Jul 28 18:24:10 2006 +1000 @@ -50,12 +50,12 @@ static void make_tx_response(netif_t *ne static void make_tx_response(netif_t *netif, netif_tx_request_t *txp, s8 st); -static int make_rx_response(netif_t *netif, - u16 id, - s8 st, - u16 offset, - u16 size, - u16 flags); +static netif_rx_response_t *make_rx_response(netif_t *netif, + u16 id, + s8 st, + u16 offset, + u16 size, + u16 flags); static void net_tx_action(unsigned long unused); static DECLARE_TASKLET(net_tx_tasklet, net_tx_action, 0); @@ -225,9 +225,9 @@ static inline int netbk_queue_full(netif { RING_IDX peek = netif->rx_req_cons_peek; - return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS || + return netif->rx.sring->req_prod - peek <= MAX_SKB_FRAGS + 1 || netif->rx.rsp_prod_pvt + NET_RX_RING_SIZE - peek <- MAX_SKB_FRAGS; + MAX_SKB_FRAGS + 1; } int netif_be_start_xmit(struct sk_buff *skb, struct net_device *dev) @@ -263,7 +263,8 @@ int netif_be_start_xmit(struct sk_buff * skb = nskb; } - netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1; + netif->rx_req_cons_peek += skb_shinfo(skb)->nr_frags + 1 + + !!skb_shinfo(skb)->gso_size; netif_get(netif); if (netbk_can_queue(dev) && netbk_queue_full(netif)) @@ -340,11 +341,16 @@ static void netbk_gop_skb(struct sk_buff netif_t *netif = netdev_priv(skb->dev); int nr_frags = skb_shinfo(skb)->nr_frags; int i; + int extra; + + meta[count].frag.page_offset = skb_shinfo(skb)->gso_type; + meta[count].frag.size = skb_shinfo(skb)->gso_size; + extra = !!meta[count].frag.size + 1; for (i = 0; i < nr_frags; i++) { meta[++count].frag = skb_shinfo(skb)->frags[i]; meta[count].id = netbk_gop_frag(netif, meta[count].frag.page, - count, i + 1); + count, i + extra); } /* @@ -354,7 +360,7 @@ static void netbk_gop_skb(struct sk_buff meta[count - nr_frags].id = netbk_gop_frag(netif, virt_to_page(skb->data), count - nr_frags, 0); - netif->rx.req_cons += nr_frags + 1; + netif->rx.req_cons += nr_frags + extra; } static inline void netbk_free_pages(int nr_frags, struct netbk_rx_meta *meta) @@ -415,6 +421,8 @@ static void 
net_rx_action(unsigned long netif_t *netif = NULL; s8 status; u16 id, irq, flags; + netif_rx_response_t *resp; + struct netif_extra_info *extra; multicall_entry_t *mcl; struct sk_buff_head rxq; struct sk_buff *skb; @@ -504,8 +512,33 @@ static void net_rx_action(unsigned long else if (skb->proto_data_valid) /* remote but checksummed? */ flags |= NETRXF_data_validated; - make_rx_response(netif, id, status, offset_in_page(skb->data), - skb_headlen(skb), flags); + resp = make_rx_response(netif, id, status, + offset_in_page(skb->data), + skb_headlen(skb), flags); + + extra = NULL; + + if (meta[count].frag.size) { + struct netif_extra_info *gso + (struct netif_extra_info *) + RING_GET_RESPONSE(&netif->rx, + netif->rx.rsp_prod_pvt++); + + if (extra) + extra->flags |= XEN_NETIF_EXTRA_FLAG_MORE; + else + resp->flags |= NETRXF_extra_info; + + gso->u.gso.size = meta[count].frag.size; + gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4; + gso->u.gso.pad = 0; + gso->u.gso.features = 0; + + gso->type = XEN_NETIF_EXTRA_TYPE_GSO; + gso->flags = 0; + extra = gso; + } + netbk_add_frag_responses(netif, status, meta + count + 1, nr_frags); @@ -1183,12 +1216,12 @@ static void make_tx_response(netif_t *ne #endif } -static int make_rx_response(netif_t *netif, - u16 id, - s8 st, - u16 offset, - u16 size, - u16 flags) +static netif_rx_response_t *make_rx_response(netif_t *netif, + u16 id, + s8 st, + u16 offset, + u16 size, + u16 flags) { RING_IDX i = netif->rx.rsp_prod_pvt; netif_rx_response_t *resp; @@ -1203,7 +1236,7 @@ static int make_rx_response(netif_t *net netif->rx.rsp_prod_pvt = ++i; - return 0; + return resp; } #ifdef NETBE_DEBUG_INTERRUPT diff -r 3df7a115ddde -r 04cbf628ed5a linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c --- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:22:38 2006 +1000 +++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Fri Jul 28 18:24:10 2006 +1000 @@ -384,6 +384,14 @@ static int connect_rings(struct backend_ be->netif->dev->features |= NETIF_F_SG; } + if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4", "%d", + &val) < 0) + val = 0; + if (val) { + be->netif->features |= NETIF_F_TSO; + be->netif->dev->features |= NETIF_F_TSO; + } + /* Map the shared frame, irq etc. */ err = netif_map(be->netif, tx_ring_ref, rx_ring_ref, evtchn); if (err) { _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
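The subtle part of the backend change is the slot accounting: a TSO skb now consumes one extra request slot for the GSO descriptor, which is why rx_req_cons_peek grows by !!gso_size and the queue-full test allows for MAX_SKB_FRAGS + 1 rather than MAX_SKB_FRAGS. A tiny standalone sketch of that arithmetic follows; MAX_SKB_FRAGS_SKETCH = 18 is only an example value.

/*
 * Slots needed per packet: one for the head, one per fragment, and one
 * more for the GSO extra descriptor when gso_size is non-zero.
 */
#include <stdio.h>

#define MAX_SKB_FRAGS_SKETCH 18

static int slots_needed(int nr_frags, int gso_size)
{
	return 1 + nr_frags + !!gso_size;
}

/* Worst case the queue-full test must allow for: a maximally fragmented
 * TSO packet, i.e. MAX_SKB_FRAGS + 1 slots beyond the head -- hence the
 * change from "<= MAX_SKB_FRAGS" to "<= MAX_SKB_FRAGS + 1". */
static int worst_case(void)
{
	return slots_needed(MAX_SKB_FRAGS_SKETCH, 1448);
}

int main(void)
{
	printf("plain 3-frag skb: %d slots\n", slots_needed(3, 0));
	printf("3-frag TSO skb:   %d slots\n", slots_needed(3, 1448));
	printf("worst case:       %d slots\n", worst_case());
	return 0;
}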
Herbert Xu
2006-Aug-04 14:44 UTC
Re: [Xen-devel] [0/9] [NET]: Add TSO support for dom0=>domU
On Fri, Aug 04, 2006 at 12:06:51PM +0100, Emmanuel Ackaouy wrote:
>
> vif = [ '', 'bridge=xenbr0' ]

Interesting. I believe this is due to a missing netif_rx_schedule
when netfront's interface is brought up. Please let me know if this
patch fixes it or not.

[NET] front: Check for received packets in network_open

Because the backend brings up the interface long before the frontend
has booted up, it is possible that by the time we get here we already
have packets queued up for processing. If we don't process them here,
we may delay them more than necessary. Worse yet, it is possible to
miss the notification interrupt from the backend in such a way that we
never get another one until we bring the interface down and up.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--

diff -r 3efef49b7b49 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c	Fri Aug 04 17:03:43 2006 +1000
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c	Sat Aug 05 00:21:48 2006 +1000
@@ -497,6 +497,9 @@ static int network_open(struct net_devic
 	network_alloc_rx_buffers(dev);
 	np->rx.sring->rsp_event = np->rx.rsp_cons + 1;
+	if (RING_HAS_UNCONSUMED_RESPONSES(&np->rx))
+		netif_rx_schedule(dev);
+
 	netif_start_queue(dev);
 	return 0;

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
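The fix works because netfront can simply ask the ring whether the backend has already produced responses before the first poll was ever scheduled. The userspace sketch below models that check; RING_HAS_UNCONSUMED_RESPONSES is represented here merely as "producer ahead of consumer", which is a simplification of the real macro, and all names are illustrative.

/*
 * Sketch of the race: the backend may have produced rx responses before
 * open ran. If open only re-arms the event and the notification already
 * fired, those responses sit unprocessed unless we poll right away.
 */
#include <stdio.h>

struct ring_sketch {
	unsigned int rsp_prod;  /* written by the backend */
	unsigned int rsp_cons;  /* advanced by the frontend poll */
};

static int has_unconsumed_responses(const struct ring_sketch *r)
{
	return r->rsp_prod != r->rsp_cons;
}

static void network_open_sketch(struct ring_sketch *r)
{
	/* ... allocate rx buffers and set rsp_event, as in the patch ... */
	if (has_unconsumed_responses(r))
		printf("scheduling rx poll for %u backlogged responses\n",
		       r->rsp_prod - r->rsp_cons);
	else
		printf("no backlog; wait for the next event\n");
}

int main(void)
{
	struct ring_sketch r = { .rsp_prod = 5, .rsp_cons = 0 };

	network_open_sketch(&r);  /* backend raced ahead: poll immediately */
	return 0;
}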
Emmanuel Ackaouy
2006-Aug-04 15:10 UTC
Re: [Xen-devel] [0/9] [NET]: Add TSO support for dom0=>domU
On Sat, Aug 05, 2006 at 12:44:09AM +1000, Herbert Xu wrote:
> On Fri, Aug 04, 2006 at 12:06:51PM +0100, Emmanuel Ackaouy wrote:
> >
> > vif = [ '', 'bridge=xenbr0' ]
>
> Interesting. I believe this is due to a missing netif_rx_schedule
> when netfront's interface is brought up. Please let me know if this
> patch fixes it or not.
>
> [NET] front: Check for received packets in network_open

That seems to have fixed the problem. I pushed the patch to the
staging tree.

Thanks,
Emmanuel.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Herbert,

Does the RX_COPY_THRESHOLD you added to netfront need to be as large as 256?
It seems rather large: for example, the copy length in netback for packets
being transmitted is just 64.

Also, your patch to netfront allocates a full page for each receive slot.
The skbuff 'header' area is really not much used, except for the first
RX_COPY_THRESHOLD bytes of each packet. Rather than allocating lots of
skbuffs up front and then freeing most of them when you receive jumbo
packets, why not just allocate the skbuff part in netif_poll, for each
complete packet that you detect you have received, rather than doing it in
network_alloc_rx_buffers? (See the sketch after this mail.)

Alternatively we could go back to the old method of rx buffer allocation
which allocates page-sized skbuff header areas, then unmaps the page
containing skb->head. This would avoid the need for RX_COPY_THRESHOLD, but
would constrain the location of the first packet fragment (since the skb
header area needs 16 bytes headroom and tailroom for skb_shared_info) or
we'd need a slow path to relocate the header fragment. What do you think?

Thanks,
Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
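For concreteness, here is a rough userspace sketch of the deferred-allocation idea suggested above: nothing is committed per ring slot up front, and a small head structure is allocated only once a complete packet has been identified, with the copy threshold deciding how much lands in the linear area. The type and helper names are illustrative, not the driver's API, and this is only one reading of the proposal.

/*
 * Sketch of "allocate the skb part only in netif_poll": the head buffer
 * is allocated per completed packet; the remainder stays in page frags.
 */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define COPY_THRESHOLD 256

struct pkt_sketch {
	unsigned char head[COPY_THRESHOLD]; /* linear ("header") area */
	size_t head_len;
	size_t frag_len;                    /* bytes left in page fragments */
};

static struct pkt_sketch *deliver_packet(const unsigned char *data, size_t len)
{
	struct pkt_sketch *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;              /* nothing was committed up front */

	p->head_len = len < COPY_THRESHOLD ? len : COPY_THRESHOLD;
	memcpy(p->head, data, p->head_len);
	p->frag_len = len - p->head_len;  /* stays in the granted pages */
	return p;
}

int main(void)
{
	unsigned char jumbo[9000] = { 0 };
	struct pkt_sketch *p = deliver_packet(jumbo, sizeof(jumbo));

	if (p) {
		printf("head %zu bytes, frags %zu bytes\n",
		       p->head_len, p->frag_len);
		free(p);
	}
	return 0;
}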
Hi Keir:

On Fri, Aug 11, 2006 at 05:23:20PM +0100, Keir Fraser wrote:
>
> Does the RX_COPY_THRESHOLD you added to netfront need to be as large as 256?
> It seems rather large: for example, the copy length in netback for packets
> being transmitted is just 64.

That was fairly arbitrary, but 64 bytes is probably too small since it'll
keep IPv6 TCP headers out of the head area (2 + 14 + 40 + 28 > 64). So we
want at least 128 bytes for both front and back.

> Also, your patch to netfront allocates a full page for each receive slot.
> The skbuff 'header' area is really not much used, except for the first
> RX_COPY_THRESHOLD bytes of each packet. Rather than allocating lots of
> skbuffs up front and then freeing most of them when you receive jumbo
> packets, why not just allocate the skbuff part in netif_poll, for each
> complete packet that you detect you have received, rather than doing it in
> network_alloc_rx_buffers?

The main reason for preallocation is to avoid unnecessary work. When the
system is under memory pressure, if we postpone the allocation then all
the work done to move the packet from dom0 to this point would be wasted
should the allocation fail. On the other hand, if the failure occurs
during pre-allocation, then dom0 would not even get to place the skb into
the ring since there would be no room there.

I do concede that we're wasting effort in repeatedly initialising skb's
that we throw away in the case of jumbo packets. We can remove that
waste by maintaining our own list of unused skb's that we can simply
put back on the ring in network_alloc_rx_buffers without going through
alloc_skb again.

> Alternatively we could go back to the old method of rx buffer allocation
> which allocates page-sized skbuff header areas, then unmaps the page
> containing skb->head. This would avoid the need for RX_COPY_THRESHOLD, but
> would constrain the location of the first packet fragment (since the skb
> header area needs 16 bytes headroom and tailroom for skb_shared_info) or
> we'd need a slow path to relocate the header fragment. What do you think?

I'd like to avoid that because this relies on the internal structure of
sk_buff, which may not stay the same as Linux evolves.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
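Spelling out the arithmetic behind the 128-byte figure: my reading of the four terms is an assumption on my part (a 2-byte alignment pad, a 14-byte Ethernet header, a 40-byte IPv6 header, and a TCP header with options), since the mail only gives the numbers.

/*
 * The sum Herbert quotes, spelled out. The total exceeds 64, so a
 * 64-byte copy threshold would leave part of an IPv6 TCP header outside
 * the linear area, while 128 comfortably covers it.
 */
#include <stdio.h>

int main(void)
{
	int align_pad = 2;    /* e.g. NET_IP_ALIGN-style padding (assumed) */
	int eth_hdr   = 14;
	int ipv6_hdr  = 40;
	int tcp_hdr   = 28;   /* TCP header including some options (assumed) */
	int total     = align_pad + eth_hdr + ipv6_hdr + tcp_hdr;

	printf("worst-case headers: %d bytes (64 too small, 128 enough)\n",
	       total);       /* prints 84 */
	return 0;
}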
On 12/8/06 1:29 am, "Herbert Xu" <herbert@gondor.apana.org.au> wrote:

> I do concede that we're wasting effort in repeatedly initialising skb's
> that we throw away in the case of jumbo packets. We can remove that
> waste by maintaining our own list of unused skb's that we can simply
> put back on the ring in network_alloc_rx_buffers without going through
> alloc_skb again.

If we're successfully allocating *whole pages* in
network_alloc_rx_buffers(), I'd be surprised if we couldn't allocate a
256-byte skbuff in netif_poll(). On the other hand, just putting the
skbuffs on the rx_batch queue is an easier change to make.

 -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel