Jason Wang
2020-Dec-22 04:41 UTC
[PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails
On 2020/12/22 ??7:07, Willem de Bruijn wrote:> On Wed, Dec 16, 2020 at 3:20 AM wangyunjian<wangyunjian at huawei.com> wrote: >> From: Yunjian Wang<wangyunjian at huawei.com> >> >> Currently we break the loop and wake up the vhost_worker when >> sendmsg fails. When the worker wakes up again, we'll meet the >> same error. > The patch is based on the assumption that such error cases always > return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb? > >> This will cause high CPU load. To fix this issue, >> we can skip this description by ignoring the error. When we >> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In >> the case we don't skip the description and don't drop packet. > the -> that > > here and above: description -> descriptor > > Perhaps slightly revise to more explicitly state that > > 1. in the case of persistent failure (i.e., bad packet), the driver > drops the packet > 2. in the case of transient failure (e.g,. memory pressure) the driver > schedules the worker to try again laterIf we want to go with this way, we need a better time to wakeup the worker. Otherwise it just produces more stress on the cpu that is what this patch tries to avoid. Thanks> >
Willem de Bruijn
2020-Dec-22 14:24 UTC
[PATCH net v2 2/2] vhost_net: fix high cpu load when sendmsg fails
On Mon, Dec 21, 2020 at 11:41 PM Jason Wang <jasowang at redhat.com> wrote:> > > On 2020/12/22 ??7:07, Willem de Bruijn wrote: > > On Wed, Dec 16, 2020 at 3:20 AM wangyunjian<wangyunjian at huawei.com> wrote: > >> From: Yunjian Wang<wangyunjian at huawei.com> > >> > >> Currently we break the loop and wake up the vhost_worker when > >> sendmsg fails. When the worker wakes up again, we'll meet the > >> same error. > > The patch is based on the assumption that such error cases always > > return EAGAIN. Can it not also be ENOMEM, such as from tun_build_skb? > > > >> This will cause high CPU load. To fix this issue, > >> we can skip this description by ignoring the error. When we > >> exceeds sndbuf, the return value of sendmsg is -EAGAIN. In > >> the case we don't skip the description and don't drop packet. > > the -> that > > > > here and above: description -> descriptor > > > > Perhaps slightly revise to more explicitly state that > > > > 1. in the case of persistent failure (i.e., bad packet), the driver > > drops the packet > > 2. in the case of transient failure (e.g,. memory pressure) the driver > > schedules the worker to try again later > > > If we want to go with this way, we need a better time to wakeup the > worker. Otherwise it just produces more stress on the cpu that is what > this patch tries to avoid.Perhaps I misunderstood the purpose of the patch: is it to drop everything, regardless of transient or persistent failure, until the ring runs out of descriptors? I can understand both a blocking and drop strategy during memory pressure. But partial drop strategy until exceeding ring capacity seems like a peculiar hybrid?