Willem de Bruijn
2020-Dec-21 23:07 UTC
[PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails
On Wed, Dec 16, 2020 at 3:20 AM wangyunjian <wangyunjian at huawei.com> wrote:
>
> From: Yunjian Wang <wangyunjian at huawei.com>
>
> Currently we break the loop and wake up the vhost_worker when
> sendmsg fails. When the worker wakes up again, we'll meet the
> same error.

The patch is based on the assumption that such error cases always
return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?

> This will cause high CPU load. To fix this issue,
> we can skip this description by ignoring the error. When we
> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
> the case we don't skip the description and don't drop packet.

the -> that

here and above: description -> descriptor

Perhaps slightly revise to more explicitly state that

1. in the case of persistent failure (i.e., bad packet), the driver
drops the packet
2. in the case of transient failure (e.g., memory pressure) the driver
schedules the worker to try again later

> Signed-off-by: Yunjian Wang <wangyunjian at huawei.com>
> ---
>  drivers/vhost/net.c | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index c8784dfafdd7..3d33f3183abe 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -827,16 +827,13 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  				msg.msg_flags &= ~MSG_MORE;
>  		}
>
> -		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
> -		if (unlikely(err < 0)) {
> +		if (unlikely(err == -EAGAIN)) {
>  			vhost_discard_vq_desc(vq, 1);
>  			vhost_net_enable_vq(net, vq);
>  			break;
> -		}
> -		if (err != len)
> -			pr_debug("Truncated TX packet: len %d != %zd\n",
> -				 err, len);
> +		} else if (unlikely(err != len))
> +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);

sending -> send

Even though vq_err is a wrapper around pr_debug, I agree with Michael
that such a change should be a separate patch to net-next, does not
belong in a fix.

More importantly, the error message is now the same for persistent
errors and for truncated packets. But on truncation the packet was
sent, so that is not entirely correct.

>  done:
>  		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
>  		vq->heads[nvq->done_idx].len = 0;
> @@ -922,7 +919,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  			msg.msg_flags &= ~MSG_MORE;
>  		}
>
> -		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
>  		if (unlikely(err < 0)) {
>  			if (zcopy_used) {
> @@ -931,13 +927,14 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
>  					% UIO_MAXIOV;
>  			}
> -			vhost_discard_vq_desc(vq, 1);
> -			vhost_net_enable_vq(net, vq);
> -			break;
> +			if (err == -EAGAIN) {
> +				vhost_discard_vq_desc(vq, 1);
> +				vhost_net_enable_vq(net, vq);
> +				break;
> +			}
>  		}
>  		if (err != len)
> -			pr_debug("Truncated TX packet: "
> -				 " len %d != %zd\n", err, len);
> +			vq_err(vq, "Fail to sending packets err : %d, len : %zd\n", err, len);
>  		if (!zcopy_used)
>  			vhost_add_used_and_signal(&net->dev, vq, head, 0);
>  		else
> --
> 2.23.0
>
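
The review above separates three outcomes of sendmsg that the patch reports with one message: a truncated packet (which was still sent), a transient failure (retry later), and a persistent failure (drop the packet). A minimal userspace sketch of that classification might look like the following. It is not code from drivers/vhost/net.c: the enum, the function name classify_tx_result, and the choice of which errno values count as transient are assumptions made for illustration only.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcome categories; none of these names exist in vhost. */
enum tx_action {
	TX_DONE,      /* whole packet was sent */
	TX_TRUNCATED, /* packet was sent, but fewer bytes than requested */
	TX_RETRY,     /* transient failure: keep the descriptor, try again later */
	TX_DROP,      /* persistent failure: consume the descriptor, drop the packet */
};

/*
 * Map a sendmsg-style return value (bytes sent, or a negative errno)
 * to one of the actions above. Which errors are "transient" is an
 * assumption here: EAGAIN (sndbuf exceeded) comes from the patch
 * description, ENOMEM from the review comment, ENOBUFS from the
 * removed TODO comment.
 */
static enum tx_action classify_tx_result(int err, size_t len)
{
	if (err >= 0)
		return (size_t)err == len ? TX_DONE : TX_TRUNCATED;

	switch (err) {
	case -EAGAIN:
	case -ENOMEM:
	case -ENOBUFS:
		return TX_RETRY;
	default:
		return TX_DROP;
	}
}
```

With a split like this, the truncation case and the drop case could each get their own log message, which is the concern raised about the shared vq_err text.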
Jason Wang
2020-Dec-22 04:41 UTC
[PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails
On 2020/12/22 7:07 AM, Willem de Bruijn wrote:
> On Wed, Dec 16, 2020 at 3:20 AM wangyunjian <wangyunjian at huawei.com> wrote:
>> From: Yunjian Wang <wangyunjian at huawei.com>
>>
>> Currently we break the loop and wake up the vhost_worker when
>> sendmsg fails. When the worker wakes up again, we'll meet the
>> same error.
> The patch is based on the assumption that such error cases always
> return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb?
>
>> This will cause high CPU load. To fix this issue,
>> we can skip this description by ignoring the error. When we
>> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In
>> the case we don't skip the description and don't drop packet.
> the -> that
>
> here and above: description -> descriptor
>
> Perhaps slightly revise to more explicitly state that
>
> 1. in the case of persistent failure (i.e., bad packet), the driver
> drops the packet
> 2. in the case of transient failure (e.g., memory pressure) the driver
> schedules the worker to try again later

If we want to go this way, we need a better time to wake up the worker.
Otherwise it just puts more stress on the CPU, which is what this patch
tries to avoid.

Thanks
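
The reply above does not say what a better wake-up time would be. Purely as an illustration of the concern, one common way to avoid spinning on a transient error is to delay each retry by a bounded, growing interval. The sketch below assumes a hypothetical helper next_retry_delay_us and made-up limits; nothing in the thread proposes this, and vhost has no such function.

```c
#include <stdint.h>

/* Made-up bounds for illustration; real values would need measurement. */
#define RETRY_DELAY_MIN_US 50u
#define RETRY_DELAY_MAX_US 10000u

/*
 * Return how long to wait before the next retry, given how many
 * consecutive transient failures have been seen so far. The delay
 * doubles per failure and is capped at RETRY_DELAY_MAX_US, so an
 * immediate requeue (the busy loop the patch tries to avoid) never
 * happens after a failure.
 */
static uint32_t next_retry_delay_us(uint32_t consecutive_failures)
{
	uint32_t delay = RETRY_DELAY_MIN_US;

	while (consecutive_failures-- > 0 && delay < RETRY_DELAY_MAX_US)
		delay *= 2;

	return delay < RETRY_DELAY_MAX_US ? delay : RETRY_DELAY_MAX_US;
}
```

A timer-based delay is only one option; waiting for an explicit event that signals the transient condition has cleared would be another, and the thread leaves that choice open.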