thr3ads.net - Virtualization - [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails [Dec 2020]

If this information is useful, please help other people find it:
Share via:

Willem de Bruijn

2020-Dec-23 13:48 UTC

[PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails

On Wed, Dec 23, 2020 at 8:21 AM wangyunjian <wangyunjian at huawei.com>
wrote:>
> > -----Original Message-----
> > From: Jason Wang [mailto:jasowang at redhat.com]
> > Sent: Wednesday, December 23, 2020 10:54 AM
> > To: Willem de Bruijn <willemdebruijn.kernel at gmail.com>
> > Cc: wangyunjian <wangyunjian at huawei.com>; Network Development
> > <netdev at vger.kernel.org>; Michael S. Tsirkin <mst at
redhat.com>;
> > virtualization at lists.linux-foundation.org; Lilijun (Jerry)
> > <jerry.lilijun at huawei.com>; chenchanghu <chenchanghu at
huawei.com>;
> > xudingke <xudingke at huawei.com>; huangbin (J)
> > <brian.huangbin at huawei.com>
> > Subject: Re: [PATCH net v2 2/2] vhost_net: fix high cpu load when
sendmsg fails
> >
> >
> > On 2020/12/22 ??10:24, Willem de Bruijn wrote:
> > > On Mon, Dec 21, 2020 at 11:41 PM Jason Wang <jasowang at
redhat.com>
> > wrote:
> > >>
> > >> On 2020/12/22 ??7:07, Willem de Bruijn wrote:
> > >>> On Wed, Dec 16, 2020 at 3:20 AM
wangyunjian<wangyunjian at huawei.com>
> > wrote:
> > >>>> From: Yunjian Wang<wangyunjian at huawei.com>
> > >>>>
> > >>>> Currently we break the loop and wake up the
vhost_worker when
> > >>>> sendmsg fails. When the worker wakes up again,
we'll meet the same
> > >>>> error.
> > >>> The patch is based on the assumption that such error
cases always
> > >>> return EAGAIN. Can it not also be ENOMEM, such as from
tun_build_skb?
> > >>>
> > >>>> This will cause high CPU load. To fix this issue, we
can skip this
> > >>>> description by ignoring the error. When we exceeds
sndbuf, the
> > >>>> return value of sendmsg is -EAGAIN. In the case we
don't skip the
> > >>>> description and don't drop packet.
> > >>> the -> that
> > >>>
> > >>> here and above: description -> descriptor
> > >>>
> > >>> Perhaps slightly revise to more explicitly state that
> > >>>
> > >>> 1. in the case of persistent failure (i.e., bad packet),
the driver
> > >>> drops the packet 2. in the case of transient failure
(e.g,. memory
> > >>> pressure) the driver schedules the worker to try again
later
> > >>
> > >> If we want to go with this way, we need a better time to
wakeup the
> > >> worker. Otherwise it just produces more stress on the cpu
that is
> > >> what this patch tries to avoid.
> > > Perhaps I misunderstood the purpose of the patch: is it to drop
> > > everything, regardless of transient or persistent failure, until
the
> > > ring runs out of descriptors?
> >
> >
> > My understanding is that the main motivation is to avoid high cpu
utilization
> > when sendmsg() fail due to guest reason (e.g bad packet).
> >
>
> My main motivation is to avoid the tx queue stuck.
>
> Should I describe it like this:
> Currently the driver don't drop a packet which can't be send by tun
> (e.g bad packet). In this case, the driver will always process the
> same packet lead to the tx queue stuck.
>
> To fix this issue:
> 1. in the case of persistent failure (e.g bad packet), the driver can skip
> this descriptior by ignoring the error.
> 2. in the case of transient failure (e.g -EAGAIN and -ENOMEM), the driver
> schedules the worker to try again.
That sounds good to me, thanks.
> Thanks
>
> >
> > >
> > > I can understand both a blocking and drop strategy during memory
> > > pressure. But partial drop strategy until exceeding ring capacity
> > > seems like a peculiar hybrid?
> >
> >
> > Yes. So I wonder if we want to be do better when we are in the memory
> > pressure. E.g can we let socket wake up us instead of rescheduling the
> > workers here? At least in this case we know some memory might be
freed?
I don't know whether a blocking or drop strategy is the better choice.
Either way, it probably deserves to be handled separately.

Virtualization - Dec 2020 - [PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails

[PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails