thr3ads.net - Virtualization - [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails [Dec 2020]

Willem de Bruijn

2020-Dec-21 23:07 UTC

[PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails

On Wed, Dec 16, 2020 at 3:20 AM wangyunjian <wangyunjian at huawei.com> wrote:
>
> From: Yunjian Wang <wangyunjian at huawei.com>
>
> Currently we break the loop and wake up the vhost_worker when
> sendmsg fails. When the worker wakes up again, we'll meet the
> same error.

The patch is based on the assumption that such error cases always
return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?

> This will cause high CPU load. To fix this issue,
> we can skip this description by ignoring the error. When we
> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
> the case we don't skip the description and don't drop packet.

the -> that

here and above: description -> descriptor

Perhaps slightly revise to more explicitly state that

1. in the case of persistent failure (i.e., bad packet), the driver
drops the packet
2. in the case of transient failure (e.g., memory pressure) the driver
schedules the worker to try again later
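
The two cases above can be sketched in user-space C as follows. This is only an illustration, not the vhost code: `tx_action` and `classify_tx_error` are hypothetical names, and which errno values count as transient (here -EAGAIN, -ENOMEM and -ENOBUFS) is an assumption made for the example.

```c
#include <errno.h>

/* Hypothetical names; nothing here exists in drivers/vhost/net.c. */
enum tx_action {
    TX_DONE,   /* packet handed to the socket; complete the descriptor */
    TX_RETRY,  /* transient failure: keep the descriptor, try again later */
    TX_DROP    /* persistent failure: complete the descriptor, drop the packet */
};

/* err is a sendmsg()-style return value: bytes sent (>= 0) or -errno. */
static enum tx_action classify_tx_error(int err)
{
    if (err >= 0)
        return TX_DONE;

    switch (err) {
    case -EAGAIN:   /* e.g. sndbuf exceeded */
    case -ENOMEM:   /* assumed transient: memory pressure */
    case -ENOBUFS:  /* assumed transient: no buffer space */
        return TX_RETRY;
    default:        /* anything else is treated as a bad packet */
        return TX_DROP;
    }
}
```

With this split, only `TX_RETRY` would discard the descriptor and reschedule the worker; `TX_DROP` would fall through to the normal completion path.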

> Signed-off-by: Yunjian Wang <wangyunjian at huawei.com>
> ---
>  drivers/vhost/net.c | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index c8784dfafdd7..3d33f3183abe 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -827,16 +827,13 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>                                 msg.msg_flags &= ~MSG_MORE;
>                 }
>
> -               /* TODO: Check specific error and bomb out unless ENOBUFS? */
>                 err = sock->ops->sendmsg(sock, &msg, len);
> -               if (unlikely(err < 0)) {
> +               if (unlikely(err == -EAGAIN)) {
>                         vhost_discard_vq_desc(vq, 1);
>                         vhost_net_enable_vq(net, vq);
>                         break;
> -               }
> -               if (err != len)
> -                       pr_debug("Truncated TX packet: len %d != %zd\n",
> -                                err, len);
> +               } else if (unlikely(err != len))
> +                       vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);

sending -> send

Even though vq_err is a wrapper around pr_debug, I agree with Michael
that such a change should be a separate patch to net-next; it does not
belong in a fix.

More importantly, the error message is now the same for persistent
errors and for truncated packets. But on truncation the packet was
sent, so that is not entirely correct.
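
One way to keep the two situations apart is sketched below in user-space C. It is a sketch only: `describe_tx_result` is a hypothetical helper, the message wording is illustrative, and the real code would log through pr_debug or vq_err rather than format into a buffer.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats a diagnostic for a sendmsg()-style result.
 * err is bytes sent (>= 0) or -errno; len is the number of bytes requested.
 * Returns 1 if a message was written to buf, 0 if the send was complete.
 */
static int describe_tx_result(int err, size_t len, char *buf, size_t buflen)
{
    if (buflen == 0)
        return 0;

    if (err < 0) {
        /* Nothing was sent: this is a real failure. */
        snprintf(buf, buflen, "Failed to send packet: err %d", err);
        return 1;
    }
    if ((size_t)err != len) {
        /* The packet was sent, but shorter than requested. */
        snprintf(buf, buflen, "Truncated TX packet: len %d != %zu", err, len);
        return 1;
    }

    buf[0] = '\0';
    return 0;
}
```

Checking `err < 0` before comparing against `len` also avoids comparing a negative int with an unsigned length.
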
>  done:
>                 vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
>                 vq->heads[nvq->done_idx].len = 0;
> @@ -922,7 +919,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>                         msg.msg_flags &= ~MSG_MORE;
>                 }
>
> -               /* TODO: Check specific error and bomb out unless ENOBUFS? */
>                 err = sock->ops->sendmsg(sock, &msg, len);
>                 if (unlikely(err < 0)) {
>                         if (zcopy_used) {
> @@ -931,13 +927,14 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>                                 nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
>                                         % UIO_MAXIOV;
>                         }
> -                       vhost_discard_vq_desc(vq, 1);
> -                       vhost_net_enable_vq(net, vq);
> -                       break;
> +                       if (err == -EAGAIN) {
> +                               vhost_discard_vq_desc(vq, 1);
> +                               vhost_net_enable_vq(net, vq);
> +                               break;
> +                       }
>                 }
>                 if (err != len)
> -                       pr_debug("Truncated TX packet: "
> -                                " len %d != %zd\n", err, len);
> +                       vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
>                 if (!zcopy_used)
>                         vhost_add_used_and_signal(&net->dev, vq, head, 0);
>                 else
> --
> 2.23.0
>

Jason Wang

2020-Dec-22 04:41 UTC

[PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails

On 2020/12/22 7:07 AM, Willem de Bruijn wrote:
> On Wed, Dec 16, 2020 at 3:20 AM wangyunjian <wangyunjian at huawei.com> wrote:
>> From: Yunjian Wang <wangyunjian at huawei.com>
>>
>> Currently we break the loop and wake up the vhost_worker when
>> sendmsg fails. When the worker wakes up again, we'll meet the
>> same error.
> The patch is based on the assumption that such error cases always
> return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?
>
>> This will cause high CPU load. To fix this issue,
>> we can skip this description by ignoring the error. When we
>> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
>> the case we don't skip the description and don't drop packet.
> the -> that
>
> here and above: description -> descriptor
>
> Perhaps slightly revise to more explicitly state that
>
> 1. in the case of persistent failure (i.e., bad packet), the driver
> drops the packet
> 2. in the case of transient failure (e.g., memory pressure) the driver
> schedules the worker to try again later

If we want to go this way, we need a better time to wake up the
worker. Otherwise it just puts more stress on the CPU, which is what
this patch tries to avoid.
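
A toy model of the concern, in user-space C. This is not how vhost schedules its worker; `wakeups_until_sent`, `ready_at` and `retry_interval` are hypothetical names, and time is counted in abstract ticks. It only shows how the number of worker wakeups depends on when the retry happens.

```c
/*
 * Toy model: a send keeps failing until `ready_at` ticks have passed.
 * The worker is woken every `retry_interval` ticks and tries again.
 * Returns how many times the worker ran before the send succeeded.
 * (Assumes ready_at is small enough that tick does not overflow.)
 */
static unsigned wakeups_until_sent(unsigned ready_at, unsigned retry_interval)
{
    unsigned tick = 0;
    unsigned wakeups = 0;

    if (retry_interval == 0)
        retry_interval = 1;  /* an immediate requeue still costs one tick */

    for (;;) {
        wakeups++;                /* worker runs and calls sendmsg */
        if (tick >= ready_at)
            return wakeups;       /* condition cleared, send succeeds */
        tick += retry_interval;   /* still failing: wait, then retry */
    }
}
```

In this model, retrying every tick while the failure lasts 1000 ticks wakes the worker 1001 times, while waking it only once the condition has cleared takes 2 wakeups.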

Thanks
