thr3ads.net - Linux Virtualization - [PATCH vhost] vhost_net: Fix too many vring kick on busypoll [Jun 2018]

If this information is useful, please help other people find it:
Share via:

Toshiaki Makita

2018-Jun-29 08:09 UTC

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

Under heavy load vhost busypoll may run without suppressing
notification. For example tx zerocopy callback can push tx work while
handle_tx() is running, then busyloop exits due to vhost_has_work()
condition and enables notification but immediately reenters handle_tx()
because the pushed work was tx. In this case handle_tx() tries to
disable notification again, but when using event_idx it by design
cannot. Then busyloop will run without suppressing notification.
Another example is the case where handle_tx() tries to enable
notification but avail idx is advanced so disables it again. This case
also lead to the same situation with event_idx.

The problem is that once we enter this situation busyloop does not work
under heavy load for considerable amount of time, because notification
is likely to happen during busyloop and handle_tx() immediately enables
notification after notification happens. Specifically busyloop detects
notification by vhost_has_work() and then handle_tx() calls
vhost_enable_notify(). Because the detected work was the tx work, it
enters handle_tx(), and enters busyloop without suppression again.
This is likely to be repeated, so with event_idx we are almost not able
to suppress notification in this case.

To fix this, poll the work instead of enabling notification when
busypoll is interrupted by something. IMHO signal_pending() and
vhost_has_work() are kind of interruptions rather than signals to
completely cancel the busypoll, so let's run busypoll after the
necessary work is done. In order to avoid too long busyloop due to
interruption, save the endtime in vq field and use it when reentering
the work function.

I observed this problem makes tx performance poor but does not rx. I
guess it is because rx notification from socket is not that heavy so
did not impact the performance even when we failed to mask the
notification. Anyway for consistency I fixed rx routine as well as tx.

Performance numbers:

- Bulk transfer from guest to external physical server.
    [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)-->
[Server]
- Set 10us busypoll.
- Guest disables checksum and TSO because of host XDP.
- Measured single flow Mbps by netperf, and kicks by perf kvm stat
  (EPT_MISCONFIG event).

                            Before              After
                          Mbps  kicks/s      Mbps  kicks/s
UDP_STREAM 1472byte              247758                 27
                Send   3645.37            6958.10
                Recv   3588.56            6958.10
              1byte                9865                 37
                Send      4.34               5.43
                Recv      4.17               5.26
TCP_STREAM             8801.03    45794   9592.77     2884

Signed-off-by: Toshiaki Makita <makita.toshiaki at lab.ntt.co.jp>
---
 drivers/vhost/net.c   | 94 +++++++++++++++++++++++++++++++++++----------------
 drivers/vhost/vhost.c |  1 +
 drivers/vhost/vhost.h |  1 +
 3 files changed, 66 insertions(+), 30 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index eeaf6739215f..0e85f628b965 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -391,13 +391,14 @@ static inline unsigned long busy_clock(void)
 	return local_clock() >> 10;
 }
 
-static bool vhost_can_busy_poll(struct vhost_dev *dev,
-				unsigned long endtime)
+static bool vhost_can_busy_poll(unsigned long endtime)
 {
-	return likely(!need_resched()) &&
-	       likely(!time_after(busy_clock(), endtime)) &&
-	       likely(!signal_pending(current)) &&
-	       !vhost_has_work(dev);
+	return likely(!need_resched() && !time_after(busy_clock(), endtime));
+}
+
+static bool vhost_busy_poll_interrupted(struct vhost_dev *dev)
+{
+	return unlikely(signal_pending(current)) || vhost_has_work(dev);
 }
 
 static void vhost_net_disable_vq(struct vhost_net *n,
@@ -437,10 +438,21 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 
 	if (r == vq->num && vq->busyloop_timeout) {
 		preempt_disable();
-		endtime = busy_clock() + vq->busyloop_timeout;
-		while (vhost_can_busy_poll(vq->dev, endtime) &&
-		       vhost_vq_avail_empty(vq->dev, vq))
+		if (vq->busyloop_endtime) {
+			endtime = vq->busyloop_endtime;
+			vq->busyloop_endtime = 0;
+		} else {
+			endtime = busy_clock() + vq->busyloop_timeout;
+		}
+		while (vhost_can_busy_poll(endtime)) {
+			if (vhost_busy_poll_interrupted(vq->dev)) {
+				vq->busyloop_endtime = endtime;
+				break;
+			}
+			if (!vhost_vq_avail_empty(vq->dev, vq))
+				break;
 			cpu_relax();
+		}
 		preempt_enable();
 		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
 				      out_num, in_num, NULL, NULL);
@@ -509,12 +521,16 @@ static void handle_tx(struct vhost_net *net)
 			break;
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
 		if (head == vq->num) {
-			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
+			if (unlikely(vq->busyloop_endtime)) {
+				/* Busy poll is interrupted. */
+				vhost_poll_queue(&vq->poll);
+			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
 				vhost_disable_notify(&net->dev, vq);
 				continue;
 			}
 			break;
 		}
+		vq->busyloop_endtime = 0;
 		if (in) {
 			vq_err(vq, "Unexpected descriptor format for TX: "
 			       "out %d, int %d\n", out, in);
@@ -642,39 +658,51 @@ static void vhost_rx_signal_used(struct
vhost_net_virtqueue *nvq)
 
 static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
 {
-	struct vhost_net_virtqueue *rvq = &net->vqs[VHOST_NET_VQ_RX];
-	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
-	struct vhost_virtqueue *vq = &nvq->vq;
+	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
+	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
+	struct vhost_virtqueue *rvq = &rnvq->vq;
+	struct vhost_virtqueue *tvq = &tnvq->vq;
 	unsigned long uninitialized_var(endtime);
-	int len = peek_head_len(rvq, sk);
+	int len = peek_head_len(rnvq, sk);
 
-	if (!len && vq->busyloop_timeout) {
+	if (!len && tvq->busyloop_timeout) {
 		/* Flush batched heads first */
-		vhost_rx_signal_used(rvq);
+		vhost_rx_signal_used(rnvq);
 		/* Both tx vq and rx socket were polled here */
-		mutex_lock_nested(&vq->mutex, 1);
-		vhost_disable_notify(&net->dev, vq);
+		mutex_lock_nested(&tvq->mutex, 1);
+		vhost_disable_notify(&net->dev, tvq);
 
 		preempt_disable();
-		endtime = busy_clock() + vq->busyloop_timeout;
+		if (rvq->busyloop_endtime) {
+			endtime = rvq->busyloop_endtime;
+			rvq->busyloop_endtime = 0;
+		} else {
+			endtime = busy_clock() + tvq->busyloop_timeout;
+		}
 
-		while (vhost_can_busy_poll(&net->dev, endtime) &&
-		       !sk_has_rx_data(sk) &&
-		       vhost_vq_avail_empty(&net->dev, vq))
+		while (vhost_can_busy_poll(endtime)) {
+			if (vhost_busy_poll_interrupted(&net->dev)) {
+				rvq->busyloop_endtime = endtime;
+				break;
+			}
+			if (sk_has_rx_data(sk) ||
+			    !vhost_vq_avail_empty(&net->dev, tvq))
+				break;
 			cpu_relax();
+		}
 
 		preempt_enable();
 
-		if (!vhost_vq_avail_empty(&net->dev, vq))
-			vhost_poll_queue(&vq->poll);
-		else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
-			vhost_disable_notify(&net->dev, vq);
-			vhost_poll_queue(&vq->poll);
+		if (!vhost_vq_avail_empty(&net->dev, tvq)) {
+			vhost_poll_queue(&tvq->poll);
+		} else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
+			vhost_disable_notify(&net->dev, tvq);
+			vhost_poll_queue(&tvq->poll);
 		}
 
-		mutex_unlock(&vq->mutex);
+		mutex_unlock(&tvq->mutex);
 
-		len = peek_head_len(rvq, sk);
+		len = peek_head_len(rnvq, sk);
 	}
 
 	return len;
@@ -804,6 +832,7 @@ static void handle_rx(struct vhost_net *net)
 	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
 
 	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
+		vq->busyloop_endtime = 0;
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
 		headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
@@ -889,7 +918,12 @@ static void handle_rx(struct vhost_net *net)
 			goto out;
 		}
 	}
-	vhost_net_enable_vq(net, vq);
+	if (unlikely(vq->busyloop_endtime)) {
+		/* Busy poll is interrupted. */
+		vhost_poll_queue(&vq->poll);
+	} else {
+		vhost_net_enable_vq(net, vq);
+	}
 out:
 	vhost_rx_signal_used(nvq);
 	mutex_unlock(&vq->mutex);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9beefa6ed1ce..fe83578fe336 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -323,6 +323,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vhost_reset_is_le(vq);
 	vhost_disable_cross_endian(vq);
 	vq->busyloop_timeout = 0;
+	vq->busyloop_endtime = 0;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
 	__vhost_vq_meta_reset(vq);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 6c844b90a168..7e9cf80ccae9 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -144,6 +144,7 @@ struct vhost_virtqueue {
 	bool user_be;
 #endif
 	u32 busyloop_timeout;
+	unsigned long busyloop_endtime;
 };
 
 struct vhost_msg_node {
-- 
2.14.2

Jason Wang

2018-Jun-29 09:30 UTC

head link

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

On 2018?06?29? 16:09, Toshiaki Makita wrote:> Under heavy load vhost busypoll may run without suppressing
> notification. For example tx zerocopy callback can push tx work while
> handle_tx() is running, then busyloop exits due to vhost_has_work()
> condition and enables notification but immediately reenters handle_tx()
> because the pushed work was tx. In this case handle_tx() tries to
> disable notification again, but when using event_idx it by design
> cannot. Then busyloop will run without suppressing notification.
> Another example is the case where handle_tx() tries to enable
> notification but avail idx is advanced so disables it again. This case
> also lead to the same situation with event_idx.
Good catch.
>
> The problem is that once we enter this situation busyloop does not work
> under heavy load for considerable amount of time, because notification
> is likely to happen during busyloop and handle_tx() immediately enables
> notification after notification happens. Specifically busyloop detects
> notification by vhost_has_work() and then handle_tx() calls
> vhost_enable_notify(). Because the detected work was the tx work, it
> enters handle_tx(), and enters busyloop without suppression again.
> This is likely to be repeated, so with event_idx we are almost not able
> to suppress notification in this case.
>
> To fix this, poll the work instead of enabling notification when
> busypoll is interrupted by something. IMHO signal_pending() and
> vhost_has_work() are kind of interruptions rather than signals to
> completely cancel the busypoll, so let's run busypoll after the
> necessary work is done. In order to avoid too long busyloop due to
> interruption, save the endtime in vq field and use it when reentering
> the work function.
I think we don't care long busyloop unless e.g tx can starve rx?
>
> I observed this problem makes tx performance poor but does not rx. I
> guess it is because rx notification from socket is not that heavy so
> did not impact the performance even when we failed to mask the
> notification.
I think the reason is:

For tx, we may only process used ring under handle_tx(). Busy polling 
code can not recognize this case.
For rx, we call peek_head_len() after exiting busy loop, so if the work 
is for rx, it can be processed in handle_rx() immediately.
> Anyway for consistency I fixed rx routine as well as tx.
>
> Performance numbers:
>
> - Bulk transfer from guest to external physical server.
>      [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)-->
[Server]
Just to confirm in this case since zerocopy is enabled, we are in fact 
use the generic XDP datapath?
> - Set 10us busypoll.
> - Guest disables checksum and TSO because of host XDP.
> - Measured single flow Mbps by netperf, and kicks by perf kvm stat
>    (EPT_MISCONFIG event).
>
>                              Before              After
>                            Mbps  kicks/s      Mbps  kicks/s
> UDP_STREAM 1472byte              247758                 27
>                  Send   3645.37            6958.10
>                  Recv   3588.56            6958.10
>                1byte                9865                 37
>                  Send      4.34               5.43
>                  Recv      4.17               5.26
> TCP_STREAM             8801.03    45794   9592.77     2884
Impressive numbers.
>
> Signed-off-by: Toshiaki Makita <makita.toshiaki at lab.ntt.co.jp>
> ---
>   drivers/vhost/net.c   | 94
+++++++++++++++++++++++++++++++++++----------------
>   drivers/vhost/vhost.c |  1 +
>   drivers/vhost/vhost.h |  1 +
>   3 files changed, 66 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index eeaf6739215f..0e85f628b965 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -391,13 +391,14 @@ static inline unsigned long busy_clock(void)
>   	return local_clock() >> 10;
>   }
>   
> -static bool vhost_can_busy_poll(struct vhost_dev *dev,
> -				unsigned long endtime)
> +static bool vhost_can_busy_poll(unsigned long endtime)
>   {
> -	return likely(!need_resched()) &&
> -	       likely(!time_after(busy_clock(), endtime)) &&
> -	       likely(!signal_pending(current)) &&
> -	       !vhost_has_work(dev);
> +	return likely(!need_resched() && !time_after(busy_clock(),
endtime));
> +}
> +
> +static bool vhost_busy_poll_interrupted(struct vhost_dev *dev)
> +{
> +	return unlikely(signal_pending(current)) || vhost_has_work(dev);
>   }
>   
>   static void vhost_net_disable_vq(struct vhost_net *n,
> @@ -437,10 +438,21 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net
*net,
>   
>   	if (r == vq->num && vq->busyloop_timeout) {
>   		preempt_disable();
> -		endtime = busy_clock() + vq->busyloop_timeout;
> -		while (vhost_can_busy_poll(vq->dev, endtime) &&
> -		       vhost_vq_avail_empty(vq->dev, vq))
> +		if (vq->busyloop_endtime) {
> +			endtime = vq->busyloop_endtime;
> +			vq->busyloop_endtime = 0;
Looks like endtime may be before current time. So we skip busy loop in 
this case.
> +		} else {
> +			endtime = busy_clock() + vq->busyloop_timeout;
> +		}
> +		while (vhost_can_busy_poll(endtime)) {
> +			if (vhost_busy_poll_interrupted(vq->dev)) {
> +				vq->busyloop_endtime = endtime;
> +				break;
> +			}
> +			if (!vhost_vq_avail_empty(vq->dev, vq))
> +				break;
>   			cpu_relax();
> +		}
>   		preempt_enable();
>   		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>   				      out_num, in_num, NULL, NULL);
> @@ -509,12 +521,16 @@ static void handle_tx(struct vhost_net *net)
>   			break;
>   		/* Nothing new?  Wait for eventfd to tell us they refilled. */
>   		if (head == vq->num) {
> -			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> +			if (unlikely(vq->busyloop_endtime)) {
> +				/* Busy poll is interrupted. */
> +				vhost_poll_queue(&vq->poll);
> +			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>   				vhost_disable_notify(&net->dev, vq);
>   				continue;
>   			}
>   			break;
>   		}
> +		vq->busyloop_endtime = 0;
>   		if (in) {
>   			vq_err(vq, "Unexpected descriptor format for TX: "
>   			       "out %d, int %d\n", out, in);
> @@ -642,39 +658,51 @@ static void vhost_rx_signal_used(struct
vhost_net_virtqueue *nvq)
>   
>   static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock
*sk)
>   {
> -	struct vhost_net_virtqueue *rvq = &net->vqs[VHOST_NET_VQ_RX];
> -	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> -	struct vhost_virtqueue *vq = &nvq->vq;
> +	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
> +	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
> +	struct vhost_virtqueue *rvq = &rnvq->vq;
> +	struct vhost_virtqueue *tvq = &tnvq->vq;
NIt: I admit the variable name is confusing. It would be better to do 
the renaming in a separate patch to ease review.
>   	unsigned long uninitialized_var(endtime);
> -	int len = peek_head_len(rvq, sk);
> +	int len = peek_head_len(rnvq, sk);
>   
> -	if (!len && vq->busyloop_timeout) {
> +	if (!len && tvq->busyloop_timeout) {
>   		/* Flush batched heads first */
> -		vhost_rx_signal_used(rvq);
> +		vhost_rx_signal_used(rnvq);
>   		/* Both tx vq and rx socket were polled here */
> -		mutex_lock_nested(&vq->mutex, 1);
> -		vhost_disable_notify(&net->dev, vq);
> +		mutex_lock_nested(&tvq->mutex, 1);
> +		vhost_disable_notify(&net->dev, tvq);
>   
>   		preempt_disable();
> -		endtime = busy_clock() + vq->busyloop_timeout;
> +		if (rvq->busyloop_endtime) {
> +			endtime = rvq->busyloop_endtime;
> +			rvq->busyloop_endtime = 0;
> +		} else {
> +			endtime = busy_clock() + tvq->busyloop_timeout;
> +		}
>   
> -		while (vhost_can_busy_poll(&net->dev, endtime) &&
> -		       !sk_has_rx_data(sk) &&
> -		       vhost_vq_avail_empty(&net->dev, vq))
> +		while (vhost_can_busy_poll(endtime)) {
> +			if (vhost_busy_poll_interrupted(&net->dev)) {
> +				rvq->busyloop_endtime = endtime;
> +				break;
> +			}
> +			if (sk_has_rx_data(sk) ||
> +			    !vhost_vq_avail_empty(&net->dev, tvq))
> +				break;
>   			cpu_relax();
Actually, I plan to poll guest rx refilling as well here to avoid rx 
kicks. Want to draft another patch to do it as well? It's as simple as 
add a vhost_vq_avail_empty for rvq above.
> +		}
>   
>   		preempt_enable();
>   
> -		if (!vhost_vq_avail_empty(&net->dev, vq))
> -			vhost_poll_queue(&vq->poll);
> -		else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> -			vhost_disable_notify(&net->dev, vq);
> -			vhost_poll_queue(&vq->poll);
> +		if (!vhost_vq_avail_empty(&net->dev, tvq)) {
> +			vhost_poll_queue(&tvq->poll);
> +		} else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
> +			vhost_disable_notify(&net->dev, tvq);
> +			vhost_poll_queue(&tvq->poll);
>   		}
>   
> -		mutex_unlock(&vq->mutex);
> +		mutex_unlock(&tvq->mutex);
>   
> -		len = peek_head_len(rvq, sk);
> +		len = peek_head_len(rnvq, sk);
>   	}
>   
>   	return len;
> @@ -804,6 +832,7 @@ static void handle_rx(struct vhost_net *net)
>   	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
>   
>   	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
> +		vq->busyloop_endtime = 0;
>   		sock_len += sock_hlen;
>   		vhost_len = sock_len + vhost_hlen;
>   		headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
> @@ -889,7 +918,12 @@ static void handle_rx(struct vhost_net *net)
>   			goto out;
>   		}
>   	}
> -	vhost_net_enable_vq(net, vq);
> +	if (unlikely(vq->busyloop_endtime)) {
> +		/* Busy poll is interrupted. */
> +		vhost_poll_queue(&vq->poll);
> +	} else {
> +		vhost_net_enable_vq(net, vq);
> +	}
This is to reduce the rx wake ups. Better with another patch.

So I suggest to split the patches into:

1 better naming of variable of vhost_net_rx_peek_head_len().
2 avoid tx kicks (most of what this patch did)
3 avoid rx wakeups (exactly the above 6 lines did)
4 avoid rx kicks

Btw, tonghao is doing some refactoring of busy polling as well. Depends 
on the order of being merged, one of you may need rebasing.

Thanks
>   out:
>   	vhost_rx_signal_used(nvq);
>   	mutex_unlock(&vq->mutex);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9beefa6ed1ce..fe83578fe336 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -323,6 +323,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>   	vhost_reset_is_le(vq);
>   	vhost_disable_cross_endian(vq);
>   	vq->busyloop_timeout = 0;
> +	vq->busyloop_endtime = 0;
>   	vq->umem = NULL;
>   	vq->iotlb = NULL;
>   	__vhost_vq_meta_reset(vq);
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 6c844b90a168..7e9cf80ccae9 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -144,6 +144,7 @@ struct vhost_virtqueue {
>   	bool user_be;
>   #endif
>   	u32 busyloop_timeout;
> +	unsigned long busyloop_endtime;
>   };
>   
>   struct vhost_msg_node {

Michael S. Tsirkin

2018-Jun-29 16:38 UTC

head link

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

On Fri, Jun 29, 2018 at 05:09:50PM +0900, Toshiaki Makita
wrote:> Under heavy load vhost busypoll may run without suppressing
> notification. For example tx zerocopy callback can push tx work while
> handle_tx() is running, then busyloop exits due to vhost_has_work()
> condition and enables notification but immediately reenters handle_tx()
> because the pushed work was tx. In this case handle_tx() tries to
> disable notification again, but when using event_idx it by design
> cannot. Then busyloop will run without suppressing notification.
> Another example is the case where handle_tx() tries to enable
> notification but avail idx is advanced so disables it again. This case
> also lead to the same situation with event_idx.
> 
> The problem is that once we enter this situation busyloop does not work
> under heavy load for considerable amount of time, because notification
> is likely to happen during busyloop and handle_tx() immediately enables
> notification after notification happens. Specifically busyloop detects
> notification by vhost_has_work() and then handle_tx() calls
> vhost_enable_notify().
I'd like to understand the problem a bit better.
Why does this happen?
Doesn't this only happen if ring is empty?
> Because the detected work was the tx work, it
> enters handle_tx(), and enters busyloop without suppression again.
> This is likely to be repeated, so with event_idx we are almost not able
> to suppress notification in this case.
> 
> To fix this, poll the work instead of enabling notification when
> busypoll is interrupted by something. IMHO signal_pending() and
> vhost_has_work() are kind of interruptions rather than signals to
> completely cancel the busypoll, so let's run busypoll after the
> necessary work is done. In order to avoid too long busyloop due to
> interruption, save the endtime in vq field and use it when reentering
> the work function.
> 
> I observed this problem makes tx performance poor but does not rx. I
> guess it is because rx notification from socket is not that heavy so
> did not impact the performance even when we failed to mask the
> notification. Anyway for consistency I fixed rx routine as well as tx.
> 
> Performance numbers:
> 
> - Bulk transfer from guest to external physical server.
>     [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)-->
[Server]
> - Set 10us busypoll.
> - Guest disables checksum and TSO because of host XDP.
> - Measured single flow Mbps by netperf, and kicks by perf kvm stat
>   (EPT_MISCONFIG event).
> 
>                             Before              After
>                           Mbps  kicks/s      Mbps  kicks/s
> UDP_STREAM 1472byte              247758                 27
>                 Send   3645.37            6958.10
>                 Recv   3588.56            6958.10
>               1byte                9865                 37
>                 Send      4.34               5.43
>                 Recv      4.17               5.26
> TCP_STREAM             8801.03    45794   9592.77     2884
> 
> Signed-off-by: Toshiaki Makita <makita.toshiaki at lab.ntt.co.jp>
Is this with busy poll enabled?
Are there CPU utilization #s?
> ---
>  drivers/vhost/net.c   | 94
+++++++++++++++++++++++++++++++++++----------------
>  drivers/vhost/vhost.c |  1 +
>  drivers/vhost/vhost.h |  1 +
>  3 files changed, 66 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index eeaf6739215f..0e85f628b965 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -391,13 +391,14 @@ static inline unsigned long busy_clock(void)
>  	return local_clock() >> 10;
>  }
>  
> -static bool vhost_can_busy_poll(struct vhost_dev *dev,
> -				unsigned long endtime)
> +static bool vhost_can_busy_poll(unsigned long endtime)
>  {
> -	return likely(!need_resched()) &&
> -	       likely(!time_after(busy_clock(), endtime)) &&
> -	       likely(!signal_pending(current)) &&
> -	       !vhost_has_work(dev);
> +	return likely(!need_resched() && !time_after(busy_clock(),
endtime));
> +}
> +
> +static bool vhost_busy_poll_interrupted(struct vhost_dev *dev)
> +{
> +	return unlikely(signal_pending(current)) || vhost_has_work(dev);
>  }
>  
>  static void vhost_net_disable_vq(struct vhost_net *n,
> @@ -437,10 +438,21 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net
*net,
>  
>  	if (r == vq->num && vq->busyloop_timeout) {
>  		preempt_disable();
> -		endtime = busy_clock() + vq->busyloop_timeout;
> -		while (vhost_can_busy_poll(vq->dev, endtime) &&
> -		       vhost_vq_avail_empty(vq->dev, vq))
> +		if (vq->busyloop_endtime) {
> +			endtime = vq->busyloop_endtime;
> +			vq->busyloop_endtime = 0;
> +		} else {
> +			endtime = busy_clock() + vq->busyloop_timeout;
> +		}
> +		while (vhost_can_busy_poll(endtime)) {
> +			if (vhost_busy_poll_interrupted(vq->dev)) {
> +				vq->busyloop_endtime = endtime;
> +				break;
> +			}
> +			if (!vhost_vq_avail_empty(vq->dev, vq))
> +				break;
>  			cpu_relax();
> +		}
>  		preempt_enable();
>  		r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>  				      out_num, in_num, NULL, NULL);
> @@ -509,12 +521,16 @@ static void handle_tx(struct vhost_net *net)
>  			break;
>  		/* Nothing new?  Wait for eventfd to tell us they refilled. */
>  		if (head == vq->num) {
> -			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> +			if (unlikely(vq->busyloop_endtime)) {
> +				/* Busy poll is interrupted. */
> +				vhost_poll_queue(&vq->poll);
> +			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>  				vhost_disable_notify(&net->dev, vq);
>  				continue;
>  			}
>  			break;
>  		}
> +		vq->busyloop_endtime = 0;
>  		if (in) {
>  			vq_err(vq, "Unexpected descriptor format for TX: "
>  			       "out %d, int %d\n", out, in);
> @@ -642,39 +658,51 @@ static void vhost_rx_signal_used(struct
vhost_net_virtqueue *nvq)
>  
>  static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock
*sk)
>  {
> -	struct vhost_net_virtqueue *rvq = &net->vqs[VHOST_NET_VQ_RX];
> -	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> -	struct vhost_virtqueue *vq = &nvq->vq;
> +	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
> +	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
> +	struct vhost_virtqueue *rvq = &rnvq->vq;
> +	struct vhost_virtqueue *tvq = &tnvq->vq;
>  	unsigned long uninitialized_var(endtime);
> -	int len = peek_head_len(rvq, sk);
> +	int len = peek_head_len(rnvq, sk);
>  
> -	if (!len && vq->busyloop_timeout) {
> +	if (!len && tvq->busyloop_timeout) {
>  		/* Flush batched heads first */
> -		vhost_rx_signal_used(rvq);
> +		vhost_rx_signal_used(rnvq);
>  		/* Both tx vq and rx socket were polled here */
> -		mutex_lock_nested(&vq->mutex, 1);
> -		vhost_disable_notify(&net->dev, vq);
> +		mutex_lock_nested(&tvq->mutex, 1);
> +		vhost_disable_notify(&net->dev, tvq);
>  
>  		preempt_disable();
> -		endtime = busy_clock() + vq->busyloop_timeout;
> +		if (rvq->busyloop_endtime) {
> +			endtime = rvq->busyloop_endtime;
> +			rvq->busyloop_endtime = 0;
> +		} else {
> +			endtime = busy_clock() + tvq->busyloop_timeout;
> +		}
>  
> -		while (vhost_can_busy_poll(&net->dev, endtime) &&
> -		       !sk_has_rx_data(sk) &&
> -		       vhost_vq_avail_empty(&net->dev, vq))
> +		while (vhost_can_busy_poll(endtime)) {
> +			if (vhost_busy_poll_interrupted(&net->dev)) {
> +				rvq->busyloop_endtime = endtime;
> +				break;
> +			}
> +			if (sk_has_rx_data(sk) ||
> +			    !vhost_vq_avail_empty(&net->dev, tvq))
> +				break;
>  			cpu_relax();
> +		}
>  
>  		preempt_enable();
>  
> -		if (!vhost_vq_avail_empty(&net->dev, vq))
> -			vhost_poll_queue(&vq->poll);
> -		else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> -			vhost_disable_notify(&net->dev, vq);
> -			vhost_poll_queue(&vq->poll);
> +		if (!vhost_vq_avail_empty(&net->dev, tvq)) {
> +			vhost_poll_queue(&tvq->poll);
> +		} else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
> +			vhost_disable_notify(&net->dev, tvq);
> +			vhost_poll_queue(&tvq->poll);
>  		}
>  
> -		mutex_unlock(&vq->mutex);
> +		mutex_unlock(&tvq->mutex);
>  
> -		len = peek_head_len(rvq, sk);
> +		len = peek_head_len(rnvq, sk);
>  	}
>  
>  	return len;
> @@ -804,6 +832,7 @@ static void handle_rx(struct vhost_net *net)
>  	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
>  
>  	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
> +		vq->busyloop_endtime = 0;
>  		sock_len += sock_hlen;
>  		vhost_len = sock_len + vhost_hlen;
>  		headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
> @@ -889,7 +918,12 @@ static void handle_rx(struct vhost_net *net)
>  			goto out;
>  		}
>  	}
> -	vhost_net_enable_vq(net, vq);
> +	if (unlikely(vq->busyloop_endtime)) {
> +		/* Busy poll is interrupted. */
> +		vhost_poll_queue(&vq->poll);
> +	} else {
> +		vhost_net_enable_vq(net, vq);
> +	}
>  out:
>  	vhost_rx_signal_used(nvq);
>  	mutex_unlock(&vq->mutex);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9beefa6ed1ce..fe83578fe336 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -323,6 +323,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>  	vhost_reset_is_le(vq);
>  	vhost_disable_cross_endian(vq);
>  	vq->busyloop_timeout = 0;
> +	vq->busyloop_endtime = 0;
>  	vq->umem = NULL;
>  	vq->iotlb = NULL;
>  	__vhost_vq_meta_reset(vq);
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 6c844b90a168..7e9cf80ccae9 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -144,6 +144,7 @@ struct vhost_virtqueue {
>  	bool user_be;
>  #endif
>  	u32 busyloop_timeout;
> +	unsigned long busyloop_endtime;
>  };
>  
>  struct vhost_msg_node {
> -- 
> 2.14.2
>

Jason Wang

2018-Jul-02 02:41 UTC

head link

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

On 2018?06?30? 00:38, Michael S. Tsirkin wrote:> On Fri, Jun 29, 2018 at 05:09:50PM +0900, Toshiaki Makita wrote:
>> Under heavy load vhost busypoll may run without suppressing
>> notification. For example tx zerocopy callback can push tx work while
>> handle_tx() is running, then busyloop exits due to vhost_has_work()
>> condition and enables notification but immediately reenters handle_tx()
>> because the pushed work was tx. In this case handle_tx() tries to
>> disable notification again, but when using event_idx it by design
>> cannot. Then busyloop will run without suppressing notification.
>> Another example is the case where handle_tx() tries to enable
>> notification but avail idx is advanced so disables it again. This case
>> also lead to the same situation with event_idx.
>>
>> The problem is that once we enter this situation busyloop does not work
>> under heavy load for considerable amount of time, because notification
>> is likely to happen during busyloop and handle_tx() immediately enables
>> notification after notification happens. Specifically busyloop detects
>> notification by vhost_has_work() and then handle_tx() calls
>> vhost_enable_notify().
> I'd like to understand the problem a bit better.
> Why does this happen?
> Doesn't this only happen if ring is empty?
>
My understanding is:

vhost_zerocopy_callback() try to poll vhost virtqueue. This will cause 
the busy loop in vhost_net_tx_get_vq_desc() to exit because of 
vhost_has_work() return true. Then handle_tx() tends to enable 
notification. Then guest may kick us even if handle_tx() call 
vhost_disable_notify() which in fact did nothing for even index.

Maybe we can try to call vhost_zerocopy_signal_used() if we found 
there's pending used from zerocopy instead.

Thanks

Toshiaki Makita

2018-Jul-02 02:45 UTC

head link

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

Hi Jason,

On 2018/06/29 18:30, Jason Wang wrote:> On 2018?06?29? 16:09, Toshiaki Makita wrote:
...>> To fix this, poll the work instead of enabling notification when
>> busypoll is interrupted by something. IMHO signal_pending() and
>> vhost_has_work() are kind of interruptions rather than signals to
>> completely cancel the busypoll, so let's run busypoll after the
>> necessary work is done. In order to avoid too long busyloop due to
>> interruption, save the endtime in vq field and use it when reentering
>> the work function.
> 
> I think we don't care long busyloop unless e.g tx can starve rx?
I just want to keep it user-controllable. Unless memorizing it busypoll
can run unexpectedly long.
>> I observed this problem makes tx performance poor but does not rx. I
>> guess it is because rx notification from socket is not that heavy so
>> did not impact the performance even when we failed to mask the
>> notification.
> 
> I think the reason is:
> 
> For tx, we may only process used ring under handle_tx(). Busy polling
> code can not recognize this case.
> For rx, we call peek_head_len() after exiting busy loop, so if the work
> is for rx, it can be processed in handle_rx() immediately.
Sorry but I don't see the difference. Tx busypoll calls
vhost_get_vq_desc() immediately after busypoll exits in
vhost_net_tx_get_vq_desc().

The scenario I described above (in 2nd paragraph) is a bit simplified.
To be clearer what I think is happening is:

1. handle_tx() runs busypoll with notification enabled (due to zerocopy
   callback or something).
2. Guest produces a packet in vring.
3. handle_tx() busypoll detects the produced packet and get it.
4. While vhost is processing the packet, guest kicks vring because it
   produced the packet. Vring notification is disabled automatically by
   event_idx and tx work is queued.
5. After processing the packet vhost enters busyloop again, but detects
   tx work and exits busyloop immediately. Then handle_tx() exits after
   enabling the notification.
6. Because the queued work was tx, handle_tx() is called immediately
   again. handle_tx() runs busypoll with notification enabled.
7. Repeat 2-6.
> 
>> Anyway for consistency I fixed rx routine as well as tx.
>>
>> Performance numbers:
>>
>> - Bulk transfer from guest to external physical server.
>> ???? [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e
--(wire)--> [Server]
> 
> Just to confirm in this case since zerocopy is enabled, we are in fact
> use the generic XDP datapath?
For some reason zerocopy was not applied for most packets, so in most
cases driver XDP was used. I was going to dig into it but not yet.
> 
>> - Set 10us busypoll.
>> - Guest disables checksum and TSO because of host XDP.
>> - Measured single flow Mbps by netperf, and kicks by perf kvm stat
>> ?? (EPT_MISCONFIG event).
>>
>> ???????????????????????????? Before????????????? After
>> ?????????????????????????? Mbps? kicks/s????? Mbps? kicks/s
>> UDP_STREAM 1472byte????????????? 247758???????????????? 27
>> ???????????????? Send?? 3645.37??????????? 6958.10
>> ???????????????? Recv?? 3588.56??????????? 6958.10
>> ?????????????? 1byte??????????????? 9865???????????????? 37
>> ???????????????? Send????? 4.34?????????????? 5.43
>> ???????????????? Recv????? 4.17?????????????? 5.26
>> TCP_STREAM???????????? 8801.03??? 45794?? 9592.77???? 2884
> 
> Impressive numbers.
> 
>>
>> Signed-off-by: Toshiaki Makita <makita.toshiaki at lab.ntt.co.jp>
>> ---
...>> +static bool vhost_busy_poll_interrupted(struct vhost_dev *dev)
>> +{
>> +??? return unlikely(signal_pending(current)) || vhost_has_work(dev);
>> ? }
>> ? ? static void vhost_net_disable_vq(struct vhost_net *n,
>> @@ -437,10 +438,21 @@ static int vhost_net_tx_get_vq_desc(struct
>> vhost_net *net,
>> ? ????? if (r == vq->num && vq->busyloop_timeout) {
>> ????????? preempt_disable();
>> -??????? endtime = busy_clock() + vq->busyloop_timeout;
>> -??????? while (vhost_can_busy_poll(vq->dev, endtime) &&
>> -?????????????? vhost_vq_avail_empty(vq->dev, vq))
>> +??????? if (vq->busyloop_endtime) {
>> +??????????? endtime = vq->busyloop_endtime;
>> +??????????? vq->busyloop_endtime = 0;
> 
> Looks like endtime may be before current time. So we skip busy loop in
> this case.
Sure, I'll add a condition.
> 
>> +??????? } else {
>> +??????????? endtime = busy_clock() + vq->busyloop_timeout;
>> +??????? }
>> +??????? while (vhost_can_busy_poll(endtime)) {
>> +??????????? if (vhost_busy_poll_interrupted(vq->dev)) {
>> +??????????????? vq->busyloop_endtime = endtime;
>> +??????????????? break;
>> +??????????? }
>> +??????????? if (!vhost_vq_avail_empty(vq->dev, vq))
>> +??????????????? break;
>> ????????????? cpu_relax();
>> +??????? }
>> ????????? preempt_enable();
>> ????????? r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>> ??????????????????????? out_num, in_num, NULL, NULL);
...>> @@ -642,39 +658,51 @@ static void vhost_rx_signal_used(struct
>> vhost_net_virtqueue *nvq)
>> ? ? static int vhost_net_rx_peek_head_len(struct vhost_net *net,
>> struct sock *sk)
>> ? {
>> -??? struct vhost_net_virtqueue *rvq =
&net->vqs[VHOST_NET_VQ_RX];
>> -??? struct vhost_net_virtqueue *nvq =
&net->vqs[VHOST_NET_VQ_TX];
>> -??? struct vhost_virtqueue *vq = &nvq->vq;
>> +??? struct vhost_net_virtqueue *rnvq =
&net->vqs[VHOST_NET_VQ_RX];
>> +??? struct vhost_net_virtqueue *tnvq =
&net->vqs[VHOST_NET_VQ_TX];
>> +??? struct vhost_virtqueue *rvq = &rnvq->vq;
>> +??? struct vhost_virtqueue *tvq = &tnvq->vq;
> 
> NIt: I admit the variable name is confusing. It would be better to do
> the renaming in a separate patch to ease review.
OK.
> 
>> ????? unsigned long uninitialized_var(endtime);
>> -??? int len = peek_head_len(rvq, sk);
>> +??? int len = peek_head_len(rnvq, sk);
>> ? -??? if (!len && vq->busyloop_timeout) {
>> +??? if (!len && tvq->busyloop_timeout) {
>> ????????? /* Flush batched heads first */
>> -??????? vhost_rx_signal_used(rvq);
>> +??????? vhost_rx_signal_used(rnvq);
>> ????????? /* Both tx vq and rx socket were polled here */
>> -??????? mutex_lock_nested(&vq->mutex, 1);
>> -??????? vhost_disable_notify(&net->dev, vq);
>> +??????? mutex_lock_nested(&tvq->mutex, 1);
>> +??????? vhost_disable_notify(&net->dev, tvq);
>> ? ????????? preempt_disable();
>> -??????? endtime = busy_clock() + vq->busyloop_timeout;
>> +??????? if (rvq->busyloop_endtime) {
>> +??????????? endtime = rvq->busyloop_endtime;
>> +??????????? rvq->busyloop_endtime = 0;
>> +??????? } else {
>> +??????????? endtime = busy_clock() + tvq->busyloop_timeout;
>> +??????? }
>> ? -??????? while (vhost_can_busy_poll(&net->dev, endtime)
&&
>> -?????????????? !sk_has_rx_data(sk) &&
>> -?????????????? vhost_vq_avail_empty(&net->dev, vq))
>> +??????? while (vhost_can_busy_poll(endtime)) {
>> +??????????? if (vhost_busy_poll_interrupted(&net->dev)) {
>> +??????????????? rvq->busyloop_endtime = endtime;
>> +??????????????? break;
>> +??????????? }
>> +??????????? if (sk_has_rx_data(sk) ||
>> +??????????????? !vhost_vq_avail_empty(&net->dev, tvq))
>> +??????????????? break;
>> ????????????? cpu_relax();
> 
> Actually, I plan to poll guest rx refilling as well here to avoid rx
> kicks. Want to draft another patch to do it as well? It's as simple as
> add a vhost_vq_avail_empty for rvq above.
OK. will try it.
> 
>> +??????? }
>> ? ????????? preempt_enable();
>> ? -??????? if (!vhost_vq_avail_empty(&net->dev, vq))
>> -??????????? vhost_poll_queue(&vq->poll);
>> -??????? else if (unlikely(vhost_enable_notify(&net->dev, vq)))
{
>> -??????????? vhost_disable_notify(&net->dev, vq);
>> -??????????? vhost_poll_queue(&vq->poll);
>> +??????? if (!vhost_vq_avail_empty(&net->dev, tvq)) {
>> +??????????? vhost_poll_queue(&tvq->poll);
>> +??????? } else if (unlikely(vhost_enable_notify(&net->dev,
tvq))) {
>> +??????????? vhost_disable_notify(&net->dev, tvq);
>> +??????????? vhost_poll_queue(&tvq->poll);
>> ????????? }
>> ? -??????? mutex_unlock(&vq->mutex);
>> +??????? mutex_unlock(&tvq->mutex);
>> ? -??????? len = peek_head_len(rvq, sk);
>> +??????? len = peek_head_len(rnvq, sk);
>> ????? }
>> ? ????? return len;
...>> @@ -889,7 +918,12 @@ static void handle_rx(struct vhost_net *net)
>> ????????????? goto out;
>> ????????? }
>> ????? }
>> -??? vhost_net_enable_vq(net, vq);
>> +??? if (unlikely(vq->busyloop_endtime)) {
>> +??????? /* Busy poll is interrupted. */
>> +??????? vhost_poll_queue(&vq->poll);
>> +??? } else {
>> +??????? vhost_net_enable_vq(net, vq);
>> +??? }
> 
> This is to reduce the rx wake ups. Better with another patch.
> 
> So I suggest to split the patches into:
> 
> 1 better naming of variable of vhost_net_rx_peek_head_len().
> 2 avoid tx kicks (most of what this patch did)
> 3 avoid rx wakeups (exactly the above 6 lines did)
> 4 avoid rx kicks
OK, I'll reorganize the patch.
Thank you for your feedback!
> 
> Btw, tonghao is doing some refactoring of busy polling as well. Depends
> on the order of being merged, one of you may need rebasing.
Thanks for sharing. I'll take a look.

-- 
Toshiaki Makita

Toshiaki Makita

2018-Jul-02 02:59 UTC

head link

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

On 2018/06/30 1:38, Michael S. Tsirkin wrote:
...>> Performance numbers:
>>
>> - Bulk transfer from guest to external physical server.
>>     [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e
--(wire)--> [Server]
>> - Set 10us busypoll.
>> - Guest disables checksum and TSO because of host XDP.
>> - Measured single flow Mbps by netperf, and kicks by perf kvm stat
>>   (EPT_MISCONFIG event).
>>
>>                             Before              After
>>                           Mbps  kicks/s      Mbps  kicks/s
>> UDP_STREAM 1472byte              247758                 27
>>                 Send   3645.37            6958.10
>>                 Recv   3588.56            6958.10
>>               1byte                9865                 37
>>                 Send      4.34               5.43
>>                 Recv      4.17               5.26
>> TCP_STREAM             8801.03    45794   9592.77     2884
>>
>> Signed-off-by: Toshiaki Makita <makita.toshiaki at lab.ntt.co.jp>
> 
> Is this with busy poll enabled?
Yes, as I wrote "Set 10us busypoll" above.
> Are there CPU utilization #s?
I used one cpu for one vcpu and one cpu for one vhost.
Each host cpu for vcpu/vhost was like this:

- Before
 vcpu cpu : %guest 70 %sys 30
 vhost cpu: %sys 100

- After
 vcpu cpu : %guest 100
 vhost cpu: %sys 100

I think %sys before patch was caused by vring kick.

-- 
Toshiaki Makita

Possibly Parallel Threads

Search for more possibly parallel threads

Linux Virtualization - Jun 2018 - [PATCH vhost] vhost_net: Fix too many vring kick on busypoll

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

[PATCH vhost] vhost_net: Fix too many vring kick on busypoll

Possibly Parallel Threads