Jason Wang
2014-Oct-15  07:25 UTC
[RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
According to David, proper accounting and queueing (at all levels, not just TCP sockets) is more important than trying to skim a bunch of cycles by avoiding TX interrupts. Having an event to free the SKB is absolutely essential for the stack to operate correctly. This series tries to enable tx interrupt for virtio-net. The idea is simple: enable tx interrupt and schedule a tx napi to free old xmit skbs. Several notes: - Tx interrupt storm avoidance when queue is about to be full is kept. Since we may enable callbacks in both ndo_start_xmit() and tx napi, patch 1 adds a check to make sure used event never go back. This will let the napi not enable the callbacks wrongly after delayed callbacks was used. - For bulk dequeuing, there's no need to enable tx interrupt for each packet. The last patch only enable tx interrupt for the final skb in the chain through xmit_more and a new helper to publish current avail idx as used event. This series fixes several issues of original rfc pointed out by Michael. Smoking test is done, and will do complete netperf test. Send the seires for early review. Thanks Jason Wang (6): virtio: make sure used event never go backwards virtio: introduce virtio_enable_cb_avail() virtio-net: small optimization on free_old_xmit_skbs() virtio-net: return the number of packets sent in free_old_xmit_skbs() virtio-net: enable tx interrupt virtio-net: enable tx interrupt only for the final skb in the chain drivers/net/virtio_net.c | 125 ++++++++++++++++++++++++++++++------------ drivers/virtio/virtio_ring.c | 25 ++++++++- include/linux/virtio.h | 2 + 3 files changed, 115 insertions(+), 37 deletions(-)
Jason Wang
2014-Oct-15  07:25 UTC
[RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
This patch checks the new event idx to make sure used event idx never goes back. This is used to synchronize the calls between virtqueue_enable_cb_delayed() and virtqueue_enable_cb(). Cc: Rusty Russell <rusty at rustcorp.com.au> Cc: Michael S. Tsirkin <mst at redhat.com> Signed-off-by: Jason Wang <jasowang at redhat.com> --- drivers/virtio/virtio_ring.c | 7 +++++-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 3b1f89b..1b3929f 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) u16 last_used_idx; START_USE(vq); - + last_used_idx = vq->last_used_idx; /* We optimistically turn back on interrupts, then check if there was * more to do. */ /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to * either clear the flags bit or point the event index at the next * entry. Always do both to keep code simple. */ vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT; - vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx; + /* Make sure used event never go backwards */ + if (!vring_need_event(vring_used_event(&vq->vring), + vq->vring.avail->idx, last_used_idx)) + vring_used_event(&vq->vring) = last_used_idx; END_USE(vq); return last_used_idx; } -- 1.7.1
Jason Wang
2014-Oct-15  07:25 UTC
[RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
This patch introduces virtio_enable_cb_avail() to publish avail idx
and used event. This could be used by batched buffer submitting to
reduce the number of tx interrupts.
Cc: Rusty Russell <rusty at rustcorp.com.au>
Cc: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
 include/linux/virtio.h       |    2 ++
 2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1b3929f..d67fbf8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue
*_vq)
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
 	/* Make sure used event never go backwards */
-	if (!vring_need_event(vring_used_event(&vq->vring),
-			      vq->vring.avail->idx, last_used_idx))
+	if (vq->vring.avail->idx != vring_used_event(&vq->vring)
&&
+	    !vring_need_event(vring_used_event(&vq->vring),
+			      vq->vring.avail->idx, last_used_idx)) {
 		vring_used_event(&vq->vring) = last_used_idx;
+	}
 	END_USE(vq);
 	return last_used_idx;
 }
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
 
+bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	bool ret;
+
+	START_USE(vq);
+	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+	vring_used_event(&vq->vring) = vq->vring.avail->idx;
+	ret = vring_need_event(vq->vring.avail->idx,
+			       vq->last_used_idx, vq->vring.used->idx);
+	END_USE(vq);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
+
 /**
  * virtqueue_poll - query pending used buffers
  * @vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b46671e..bfaf058 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
+bool virtqueue_enable_cb_avail(struct virtqueue *vq);
+
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
-- 
1.7.1
Jason Wang
2014-Oct-15  07:25 UTC
[RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
Accumulate the sent packets and sent bytes in local variables and perform a
single u64_stats_update_begin/end() after.
Cc: Rusty Russell <rusty at rustcorp.com.au>
Cc: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/net/virtio_net.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..a4d56b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	u64 tx_bytes = 0, tx_packets = 0;
 
 	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		tx_bytes += skb->len;
+		tx_packets++;
 
 		dev_kfree_skb_any(skb);
 	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets =+ tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
-- 
1.7.1
Jason Wang
2014-Oct-15  07:25 UTC
[RFC PATCH net-next 4/6] virtio-net: return the number of packets sent in free_old_xmit_skbs()
This patch returns the number of packets sent in free_old_xmit_skbs(), this is
necessary for calling this function in napi.
Cc: Rusty Russell <rusty at rustcorp.com.au>
Cc: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/net/virtio_net.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a4d56b8..ccf98f9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -827,7 +827,7 @@ static int virtnet_open(struct net_device *dev)
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
+static int free_old_xmit_skbs(struct send_queue *sq)
 {
 	struct sk_buff *skb;
 	unsigned int len;
@@ -848,6 +848,8 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	stats->tx_bytes += tx_bytes;
 	stats->tx_packets =+ tx_packets;
 	u64_stats_update_end(&stats->tx_syncp);
+
+	return tx_packets;
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
-- 
1.7.1
Orphan skb in ndo_start_xmit() breaks socket accounting and packet
queuing. This in fact breaks lots of things such as pktgen and several
TCP optimizations. And also make BQL can't be implemented for
virtio-net.
This patch tries to solve this issue by enabling tx interrupt. To
avoid introducing extra spinlocks, a tx napi was scheduled to free
those packets.
More tx interrupt mitigation method could be used on top.
Cc: Rusty Russell <rusty at rustcorp.com.au>
Cc: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/net/virtio_net.c |  125 +++++++++++++++++++++++++++++++---------------
 1 files changed, 85 insertions(+), 40 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ccf98f9..2afc2e2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq,
gfp_t gfp_mask)
 	return p;
 }
 
+static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	u64 tx_bytes = 0, tx_packets = 0;
+
+	while (tx_packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		tx_bytes += skb->len;
+		tx_packets++;
+
+		dev_kfree_skb_any(skb);
+	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets =+ tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
+
+	return tx_packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
-
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi_schedule_prep(&sq->napi)) {
+		__napi_schedule(&sq->napi);
+	}
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -774,7 +801,39 @@ again:
 	return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq +		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	unsigned int r, sent = 0;
+
+again:
+	__netif_tx_lock(txq, smp_processor_id());
+	virtqueue_disable_cb(sq->vq);
+	sent += free_old_xmit_skbs(sq, budget - sent);
+
+	if (sent < budget) {
+		r = virtqueue_enable_cb_prepare(sq->vq);
+		napi_complete(napi);
+		__netif_tx_unlock(txq);
+		if (unlikely(virtqueue_poll(sq->vq, r)) &&
+		    napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+			goto again;
+		}
+	} else {
+		__netif_tx_unlock(txq);
+	}
+
+	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+	return sent;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
+
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
 {
@@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static int free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-	u64 tx_bytes = 0, tx_packets = 0;
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		tx_bytes += skb->len;
-		tx_packets++;
-
-		dev_kfree_skb_any(skb);
-	}
-
-	u64_stats_update_begin(&stats->tx_syncp);
-	stats->tx_bytes += tx_bytes;
-	stats->tx_packets =+ tx_packets;
-	u64_stats_update_end(&stats->tx_syncp);
-
-	return tx_packets;
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct skb_vnet_hdr *hdr;
@@ -917,6 +952,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff
*skb)
 		sg_set_buf(sq->sg, hdr, hdr_len);
 		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
 	}
+
 	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
 }
 
@@ -925,10 +961,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct
net_device *dev)
 	struct virtnet_info *vi = netdev_priv(dev);
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
-	int err;
+	int err, qsize = virtqueue_get_vring_size(sq->vq);
 
+	virtqueue_disable_cb(sq->vq);
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	free_old_xmit_skbs(sq, qsize);
 
 	/* Try to transmit */
 	err = xmit_skb(sq, skb);
@@ -944,22 +981,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct
net_device *dev)
 		return NETDEV_TX_OK;
 	}
 
-	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-	nf_reset(skb);
-
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand.  Naturally, this wastes entries. */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
 		netif_stop_subqueue(dev, qnum);
 		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
+			free_old_xmit_skbs(sq, qsize);
 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);
 			}
 		}
+	} else if (virtqueue_enable_cb(sq->vq)) {
+		free_old_xmit_skbs(sq, qsize);
 	}
 
 	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
@@ -1141,8 +1176,10 @@ static int virtnet_close(struct net_device *dev)
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_disable(&vi->rq[i].napi);
+		napi_disable(&vi->sq[i].napi);
+	}
 
 	return 0;
 }
@@ -1461,8 +1498,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 {
 	int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		netif_napi_del(&vi->rq[i].napi);
+		netif_napi_del(&vi->sq[i].napi);
+	}
 
 	kfree(vi->rq);
 	kfree(vi->sq);
@@ -1616,6 +1655,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
 			       napi_weight);
 		napi_hash_add(&vi->rq[i].napi);
+		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+			       napi_weight);
 
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
@@ -1920,8 +1961,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
+			napi_disable(&vi->sq[i].napi);
 			napi_hash_del(&vi->rq[i].napi);
 			netif_napi_del(&vi->rq[i].napi);
+			netif_napi_del(&vi->sq[i].napi);
 		}
 	}
 
@@ -1946,8 +1989,10 @@ static int virtnet_restore(struct virtio_device *vdev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(&vi->rq[i]);
+			napi_enable(&vi->sq[i].napi);
+		}
 	}
 
 	netif_device_attach(vi->dev);
-- 
1.7.1
Jason Wang
2014-Oct-15  07:25 UTC
[RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
enable tx interrupt only for the final skb in the chain if
needed. This will help to mitigate tx interrupts.
Cc: Rusty Russell <rusty at rustcorp.com.au>
Cc: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/net/virtio_net.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2afc2e2..5943f3f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct
net_device *dev)
 				virtqueue_disable_cb(sq->vq);
 			}
 		}
-	} else if (virtqueue_enable_cb(sq->vq)) {
-		free_old_xmit_skbs(sq, qsize);
 	}
 
-	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) {
 		virtqueue_kick(sq->vq);
+		if (sq->vq->num_free >= 2 +MAX_SKB_FRAGS &&
+		    unlikely(virtqueue_enable_cb_avail(sq->vq))) {
+			free_old_xmit_skbs(sq, qsize);
+			virtqueue_disable_cb(sq->vq);
+		}
+	}
 
 	return NETDEV_TX_OK;
 }
-- 
1.7.1
Michael S. Tsirkin
2014-Oct-15  09:28 UTC
[RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:> This patch introduces virtio_enable_cb_avail() to publish avail idx > and used event. This could be used by batched buffer submitting to > reduce the number of tx interrupts. > > Cc: Rusty Russell <rusty at rustcorp.com.au> > Cc: Michael S. Tsirkin <mst at redhat.com> > Signed-off-by: Jason Wang <jasowang at redhat.com> > --- > drivers/virtio/virtio_ring.c | 22 ++++++++++++++++++++-- > include/linux/virtio.h | 2 ++ > 2 files changed, 22 insertions(+), 2 deletions(-) > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > index 1b3929f..d67fbf8 100644 > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) > * entry. Always do both to keep code simple. */ > vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT; > /* Make sure used event never go backwards */ > - if (!vring_need_event(vring_used_event(&vq->vring), > - vq->vring.avail->idx, last_used_idx)) > + if (vq->vring.avail->idx != vring_used_event(&vq->vring) && > + !vring_need_event(vring_used_event(&vq->vring), > + vq->vring.avail->idx, last_used_idx)) { > vring_used_event(&vq->vring) = last_used_idx; > + } > END_USE(vq); > return last_used_idx; > } > EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare); >I see you are also changing virtqueue_enable_cb_prepare, why?> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq) > +{ > + struct vring_virtqueue *vq = to_vvq(_vq); > + bool ret; > + > + START_USE(vq); > + vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT; > + vring_used_event(&vq->vring) = vq->vring.avail->idx; > + ret = vring_need_event(vq->vring.avail->idx, > + vq->last_used_idx, vq->vring.used->idx); > + END_USE(vq); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail); > + > /** > * virtqueue_poll - query pending used buffers > * @vq: the struct virtqueue we're talking about.Could not figure out what this does. Please add documentation.> diff --git a/include/linux/virtio.h b/include/linux/virtio.h > index b46671e..bfaf058 100644 > --- a/include/linux/virtio.h > +++ b/include/linux/virtio.h > @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq); > > unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq); > > +bool virtqueue_enable_cb_avail(struct virtqueue *vq); > + > bool virtqueue_poll(struct virtqueue *vq, unsigned); > > bool virtqueue_enable_cb_delayed(struct virtqueue *vq); > -- > 1.7.1
Michael S. Tsirkin
2014-Oct-15  09:34 UTC
[RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:> This patch checks the new event idx to make sure used event idx never > goes back. This is used to synchronize the calls between > virtqueue_enable_cb_delayed() and virtqueue_enable_cb(). > > Cc: Rusty Russell <rusty at rustcorp.com.au> > Cc: Michael S. Tsirkin <mst at redhat.com> > Signed-off-by: Jason Wang <jasowang at redhat.com>the implication being that moving event idx back might cause some race condition? If yes but please describe the race explicitly. Is there a bug we need to fix on stable? Please also explicitly describe a configuration that causes event idx to go back. All this info should go in the commit log.> --- > drivers/virtio/virtio_ring.c | 7 +++++-- > 1 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > index 3b1f89b..1b3929f 100644 > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq) > u16 last_used_idx; > > START_USE(vq); > - > + last_used_idx = vq->last_used_idx; > /* We optimistically turn back on interrupts, then check if there was > * more to do. */ > /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to > * either clear the flags bit or point the event index at the next > * entry. Always do both to keep code simple. */ > vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT; > - vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx; > + /* Make sure used event never go backwards */s/go/goes/> + if (!vring_need_event(vring_used_event(&vq->vring), > + vq->vring.avail->idx, last_used_idx)) > + vring_used_event(&vq->vring) = last_used_idx;The result will be that driver will *not* get an interrupt on the next consumed buffer, which is likely not what driver intended when it called virtqueue_enable_cb. Instead, how about we simply document the requirement that drivers either always call virtqueue_enable_cb_delayed or virtqueue_enable_cb but not both?> END_USE(vq); > return last_used_idx; > } > -- > 1.7.1
Eric Dumazet
2014-Oct-15  09:36 UTC
[RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:> Accumulate the sent packets and sent bytes in local variables and perform a > single u64_stats_update_begin/end() after. > > Cc: Rusty Russell <rusty at rustcorp.com.au> > Cc: Michael S. Tsirkin <mst at redhat.com> > Signed-off-by: Jason Wang <jasowang at redhat.com> > --- > drivers/net/virtio_net.c | 12 ++++++++---- > 1 files changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 3d0ce44..a4d56b8 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq) > unsigned int len; > struct virtnet_info *vi = sq->vq->vdev->priv; > struct virtnet_stats *stats = this_cpu_ptr(vi->stats); > + u64 tx_bytes = 0, tx_packets = 0; > > while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) { > pr_debug("Sent skb %p\n", skb); > > - u64_stats_update_begin(&stats->tx_syncp); > - stats->tx_bytes += skb->len; > - stats->tx_packets++; > - u64_stats_update_end(&stats->tx_syncp); > + tx_bytes += skb->len; > + tx_packets++; > > dev_kfree_skb_any(skb); > } > + > + u64_stats_update_begin(&stats->tx_syncp); > + stats->tx_bytes += tx_bytes; > + stats->tx_packets =+ tx_packets;stats->tx_packets += tx_packets;> + u64_stats_update_end(&stats->tx_syncp); > } > > static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
Michael S. Tsirkin
2014-Oct-15  09:37 UTC
[RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:> Accumulate the sent packets and sent bytes in local variables and perform a > single u64_stats_update_begin/end() after. > > Cc: Rusty Russell <rusty at rustcorp.com.au> > Cc: Michael S. Tsirkin <mst at redhat.com> > Signed-off-by: Jason Wang <jasowang at redhat.com>Not sure how much it's worth but since Eric suggested it ... Acked-by: Michael S. Tsirkin <mst at redhat.com>> --- > drivers/net/virtio_net.c | 12 ++++++++---- > 1 files changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 3d0ce44..a4d56b8 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq) > unsigned int len; > struct virtnet_info *vi = sq->vq->vdev->priv; > struct virtnet_stats *stats = this_cpu_ptr(vi->stats); > + u64 tx_bytes = 0, tx_packets = 0; > > while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) { > pr_debug("Sent skb %p\n", skb); > > - u64_stats_update_begin(&stats->tx_syncp); > - stats->tx_bytes += skb->len; > - stats->tx_packets++; > - u64_stats_update_end(&stats->tx_syncp); > + tx_bytes += skb->len; > + tx_packets++; > > dev_kfree_skb_any(skb); > } > + > + u64_stats_update_begin(&stats->tx_syncp); > + stats->tx_bytes += tx_bytes; > + stats->tx_packets =+ tx_packets; > + u64_stats_update_end(&stats->tx_syncp); > } > > static int xmit_skb(struct send_queue *sq, struct sk_buff *skb) > -- > 1.7.1
Eric Dumazet
2014-Oct-15  09:37 UTC
[RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote: ...> +static int free_old_xmit_skbs(struct send_queue *sq, int budget) > +{ > + struct sk_buff *skb; > + unsigned int len; > + struct virtnet_info *vi = sq->vq->vdev->priv; > + struct virtnet_stats *stats = this_cpu_ptr(vi->stats); > + u64 tx_bytes = 0, tx_packets = 0; > + > + while (tx_packets < budget && > + (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) { > + pr_debug("Sent skb %p\n", skb); > + > + tx_bytes += skb->len; > + tx_packets++; > + > + dev_kfree_skb_any(skb); > + } > + > + u64_stats_update_begin(&stats->tx_syncp); > + stats->tx_bytes += tx_bytes; > + stats->tx_packets =+ tx_packets;stats->tx_packets += tx_packets;> + u64_stats_update_end(&stats->tx_syncp); > + > + return tx_packets; > +}
Michael S. Tsirkin
2014-Oct-15  10:18 UTC
[RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:> Orphan skb in ndo_start_xmit() breaks socket accounting and packet > queuing. This in fact breaks lots of things such as pktgen and several > TCP optimizations. And also make BQL can't be implemented for > virtio-net. > > This patch tries to solve this issue by enabling tx interrupt. To > avoid introducing extra spinlocks, a tx napi was scheduled to free > those packets. > > More tx interrupt mitigation method could be used on top. > > Cc: Rusty Russell <rusty at rustcorp.com.au> > Cc: Michael S. Tsirkin <mst at redhat.com> > Signed-off-by: Jason Wang <jasowang at redhat.com> > --- > drivers/net/virtio_net.c | 125 +++++++++++++++++++++++++++++++--------------- > 1 files changed, 85 insertions(+), 40 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index ccf98f9..2afc2e2 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -72,6 +72,8 @@ struct send_queue { > > /* Name of the send queue: output.$index */ > char name[40]; > + > + struct napi_struct napi; > }; > > /* Internal representation of a receive virtqueue */ > @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask) > return p; > } > > +static int free_old_xmit_skbs(struct send_queue *sq, int budget) > +{ > + struct sk_buff *skb; > + unsigned int len; > + struct virtnet_info *vi = sq->vq->vdev->priv; > + struct virtnet_stats *stats = this_cpu_ptr(vi->stats); > + u64 tx_bytes = 0, tx_packets = 0; > + > + while (tx_packets < budget && > + (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) { > + pr_debug("Sent skb %p\n", skb); > + > + tx_bytes += skb->len; > + tx_packets++; > + > + dev_kfree_skb_any(skb); > + } > + > + u64_stats_update_begin(&stats->tx_syncp); > + stats->tx_bytes += tx_bytes; > + stats->tx_packets =+ tx_packets; > + u64_stats_update_end(&stats->tx_syncp); > + > + return tx_packets; > +} > + > static void skb_xmit_done(struct virtqueue *vq) > { > struct virtnet_info *vi = vq->vdev->priv; > + struct send_queue *sq = &vi->sq[vq2txq(vq)]; > > - /* Suppress further interrupts. */ > - virtqueue_disable_cb(vq); > - > - /* We were probably waiting for more output buffers. */ > - netif_wake_subqueue(vi->dev, vq2txq(vq)); > + if (napi_schedule_prep(&sq->napi)) { > + __napi_schedule(&sq->napi); > + } > } > > static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx) > @@ -774,7 +801,39 @@ again: > return received; > } > > +static int virtnet_poll_tx(struct napi_struct *napi, int budget) > +{ > + struct send_queue *sq > + container_of(napi, struct send_queue, napi); > + struct virtnet_info *vi = sq->vq->vdev->priv; > + struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq)); > + unsigned int r, sent = 0; > + > +again: > + __netif_tx_lock(txq, smp_processor_id()); > + virtqueue_disable_cb(sq->vq); > + sent += free_old_xmit_skbs(sq, budget - sent); > + > + if (sent < budget) { > + r = virtqueue_enable_cb_prepare(sq->vq); > + napi_complete(napi); > + __netif_tx_unlock(txq); > + if (unlikely(virtqueue_poll(sq->vq, r)) &&So you are enabling callback on the next packet, which is almost sure to cause an interrupt storm on the guest. I think it's a bad idea, this is why I used enable_cb_delayed in my patch.> + napi_schedule_prep(napi)) { > + virtqueue_disable_cb(sq->vq); > + __napi_schedule(napi); > + goto again; > + } > + } else { > + __netif_tx_unlock(txq); > + } > + > + netif_wake_subqueue(vi->dev, vq2txq(sq->vq)); > + return sent; > +} > + > #ifdef CONFIG_NET_RX_BUSY_POLL > + > /* must be called with local_bh_disable()d */ > static int virtnet_busy_poll(struct napi_struct *napi) > { > @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev) > if (!try_fill_recv(&vi->rq[i], GFP_KERNEL)) > schedule_delayed_work(&vi->refill, 0); > virtnet_napi_enable(&vi->rq[i]); > + napi_enable(&vi->sq[i].napi); > } > > return 0; > } > > -static int free_old_xmit_skbs(struct send_queue *sq) > -{ > - struct sk_buff *skb; > - unsigned int len; > - struct virtnet_info *vi = sq->vq->vdev->priv; > - struct virtnet_stats *stats = this_cpu_ptr(vi->stats); > - u64 tx_bytes = 0, tx_packets = 0; > - > - while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) { > - pr_debug("Sent skb %p\n", skb); > - > - tx_bytes += skb->len; > - tx_packets++; > - > - dev_kfree_skb_any(skb); > - } > - > - u64_stats_update_begin(&stats->tx_syncp); > - stats->tx_bytes += tx_bytes; > - stats->tx_packets =+ tx_packets; > - u64_stats_update_end(&stats->tx_syncp); > - > - return tx_packets; > -} > -So you end up moving it all anyway, why bother splitting out minor changes in previous patches?> static int xmit_skb(struct send_queue *sq, struct sk_buff *skb) > { > struct skb_vnet_hdr *hdr; > @@ -917,6 +952,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb) > sg_set_buf(sq->sg, hdr, hdr_len); > num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1; > } > + > return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC); > } > > @@ -925,10 +961,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) > struct virtnet_info *vi = netdev_priv(dev); > int qnum = skb_get_queue_mapping(skb); > struct send_queue *sq = &vi->sq[qnum]; > - int err; > + int err, qsize = virtqueue_get_vring_size(sq->vq); > > + virtqueue_disable_cb(sq->vq); > /* Free up any pending old buffers before queueing new ones. */ > - free_old_xmit_skbs(sq); > + free_old_xmit_skbs(sq, qsize); > > /* Try to transmit */ > err = xmit_skb(sq, skb); > @@ -944,22 +981,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) > return NETDEV_TX_OK; > } > > - /* Don't wait up for transmitted skbs to be freed. */ > - skb_orphan(skb); > - nf_reset(skb); > - > /* Apparently nice girls don't return TX_BUSY; stop the queue > * before it gets out of hand. Naturally, this wastes entries. */ > if (sq->vq->num_free < 2+MAX_SKB_FRAGS) { > netif_stop_subqueue(dev, qnum); > if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) { > /* More just got used, free them then recheck. */ > - free_old_xmit_skbs(sq); > + free_old_xmit_skbs(sq, qsize); > if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) { > netif_start_subqueue(dev, qnum); > virtqueue_disable_cb(sq->vq); > } > } > + } else if (virtqueue_enable_cb(sq->vq)) { > + free_old_xmit_skbs(sq, qsize); > } > > if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) > @@ -1141,8 +1176,10 @@ static int virtnet_close(struct net_device *dev) > /* Make sure refill_work doesn't re-enable napi! */ > cancel_delayed_work_sync(&vi->refill); > > - for (i = 0; i < vi->max_queue_pairs; i++) > + for (i = 0; i < vi->max_queue_pairs; i++) { > napi_disable(&vi->rq[i].napi); > + napi_disable(&vi->sq[i].napi); > + } > > return 0; > } > @@ -1461,8 +1498,10 @@ static void virtnet_free_queues(struct virtnet_info *vi) > { > int i; > > - for (i = 0; i < vi->max_queue_pairs; i++) > + for (i = 0; i < vi->max_queue_pairs; i++) { > netif_napi_del(&vi->rq[i].napi); > + netif_napi_del(&vi->sq[i].napi); > + } > > kfree(vi->rq); > kfree(vi->sq); > @@ -1616,6 +1655,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi) > netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll, > napi_weight); > napi_hash_add(&vi->rq[i].napi); > + netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx, > + napi_weight); > > sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg)); > ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT); > @@ -1920,8 +1961,10 @@ static int virtnet_freeze(struct virtio_device *vdev) > if (netif_running(vi->dev)) { > for (i = 0; i < vi->max_queue_pairs; i++) { > napi_disable(&vi->rq[i].napi); > + napi_disable(&vi->sq[i].napi); > napi_hash_del(&vi->rq[i].napi); > netif_napi_del(&vi->rq[i].napi); > + netif_napi_del(&vi->sq[i].napi); > } > } > > @@ -1946,8 +1989,10 @@ static int virtnet_restore(struct virtio_device *vdev) > if (!try_fill_recv(&vi->rq[i], GFP_KERNEL)) > schedule_delayed_work(&vi->refill, 0); > > - for (i = 0; i < vi->max_queue_pairs; i++) > + for (i = 0; i < vi->max_queue_pairs; i++) { > virtnet_napi_enable(&vi->rq[i]); > + napi_enable(&vi->sq[i].napi); > + } > } > > netif_device_attach(vi->dev); > -- > 1.7.1
Michael S. Tsirkin
2014-Oct-15  10:22 UTC
[RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote:> With the help of xmit_more and virtqueue_enable_cb_avail(), this patch > enable tx interrupt only for the final skb in the chain if > needed. This will help to mitigate tx interrupts. > > Cc: Rusty Russell <rusty at rustcorp.com.au> > Cc: Michael S. Tsirkin <mst at redhat.com> > Signed-off-by: Jason Wang <jasowang at redhat.com>I think you should split xmit_more stuff out.> --- > drivers/net/virtio_net.c | 10 +++++++--- > 1 files changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 2afc2e2..5943f3f 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) > virtqueue_disable_cb(sq->vq); > } > } > - } else if (virtqueue_enable_cb(sq->vq)) { > - free_old_xmit_skbs(sq, qsize); > } > > - if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) > + if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) { > virtqueue_kick(sq->vq); > + if (sq->vq->num_free >= 2 +MAX_SKB_FRAGS && > + unlikely(virtqueue_enable_cb_avail(sq->vq))) { > + free_old_xmit_skbs(sq, qsize); > + virtqueue_disable_cb(sq->vq); > + } > + } > > return NETDEV_TX_OK; > } > -- > 1.7.1
Michael S. Tsirkin
2014-Oct-15  10:25 UTC
[RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:> According to David, proper accounting and queueing (at all levels, not > just TCP sockets) is more important than trying to skim a bunch of > cycles by avoiding TX interrupts.He also mentioned we should find other ways to batch> Having an event to free the SKB is > absolutely essential for the stack to operate correctly. > > This series tries to enable tx interrupt for virtio-net. The idea is > simple: enable tx interrupt and schedule a tx napi to free old xmit > skbs. > > Several notes: > - Tx interrupt storm avoidance when queue is about to be full is > kept.But queue is typically *not* full. More important to avoid interrupt storms in that case IMO.> Since we may enable callbacks in both ndo_start_xmit() and tx > napi, patch 1 adds a check to make sure used event never go > back. This will let the napi not enable the callbacks wrongly after > delayed callbacks was used.So why not just use delayed callbacks?> - For bulk dequeuing, there's no need to enable tx interrupt for each > packet. The last patch only enable tx interrupt for the final skb in > the chain through xmit_more and a new helper to publish current avail > idx as used event. > > This series fixes several issues of original rfc pointed out by Michael.Could you list the issues, for ease of review?> > Smoking test is done, and will do complete netperf test. Send the > seires for early review. > > Thanks > > Jason Wang (6): > virtio: make sure used event never go backwards > virtio: introduce virtio_enable_cb_avail() > virtio-net: small optimization on free_old_xmit_skbs() > virtio-net: return the number of packets sent in free_old_xmit_skbs() > virtio-net: enable tx interrupt > virtio-net: enable tx interrupt only for the final skb in the chain > > drivers/net/virtio_net.c | 125 ++++++++++++++++++++++++++++++------------ > drivers/virtio/virtio_ring.c | 25 ++++++++- > include/linux/virtio.h | 2 + > 3 files changed, 115 insertions(+), 37 deletions(-)
Jason Wang
2014-Oct-15  11:14 UTC
[RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote:> On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote: >> According to David, proper accounting and queueing (at all levels, not >> just TCP sockets) is more important than trying to skim a bunch of >> cycles by avoiding TX interrupts. > He also mentioned we should find other ways to batch >Right.>> Having an event to free the SKB is >> absolutely essential for the stack to operate correctly. >> >> This series tries to enable tx interrupt for virtio-net. The idea is >> simple: enable tx interrupt and schedule a tx napi to free old xmit >> skbs. >> >> Several notes: >> - Tx interrupt storm avoidance when queue is about to be full is >> kept. > But queue is typically *not* full. More important to avoid interrupt > storms in that case IMO.Yes.>> Since we may enable callbacks in both ndo_start_xmit() and tx >> napi, patch 1 adds a check to make sure used event never go >> back. This will let the napi not enable the callbacks wrongly after >> delayed callbacks was used. > So why not just use delayed callbacks?This means the tx interrupt are coalesced in a somewhat adaptive way. Need benchmark to see its effect.> >> - For bulk dequeuing, there's no need to enable tx interrupt for each >> packet. The last patch only enable tx interrupt for the final skb in >> the chain through xmit_more and a new helper to publish current avail >> idx as used event. >> >> This series fixes several issues of original rfc pointed out by Michael. > Could you list the issues, for ease of review?Probably just one: - Move the virtqueue_disable_cb() from skb_xmit_done() into virtnet_poll_tx() under tx lock.
Possibly Parallel Threads
- [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
- [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
- [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
- [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
- [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt