Jason Wang
2021-Jun-07 02:28 UTC
[PATCH net-next] virtio_net: set link state down when virtqueue is broken
? 2021/6/5 ??3:10, wangyunjian ??:>> -----Original Message----- >> From: Jason Wang [mailto:jasowang at redhat.com] >> Sent: Friday, June 4, 2021 10:38 AM >> To: wangyunjian <wangyunjian at huawei.com>; netdev at vger.kernel.org >> Cc: kuba at kernel.org; davem at davemloft.net; mst at redhat.com; >> virtualization at lists.linux-foundation.org; dingxiaoxiong >> <dingxiaoxiong at huawei.com> >> Subject: Re: [PATCH net-next] virtio_net: set link state down when virtqueue is >> broken >> >> >> ? 2021/6/3 ??7:34, wangyunjian ??: >>>> -----Original Message----- >>>> From: Jason Wang [mailto:jasowang at redhat.com] >>>> Sent: Monday, May 31, 2021 11:29 AM >>>> To: wangyunjian <wangyunjian at huawei.com>; netdev at vger.kernel.org >>>> Cc: kuba at kernel.org; davem at davemloft.net; mst at redhat.com; >>>> virtualization at lists.linux-foundation.org; dingxiaoxiong >>>> <dingxiaoxiong at huawei.com> >>>> Subject: Re: [PATCH net-next] virtio_net: set link state down when >>>> virtqueue is broken >>>> >>>> >>>> ? 2021/5/28 ??6:58, wangyunjian ??: >>>>>> -----Original Message----- >>>>>>> From: Yunjian Wang <wangyunjian at huawei.com> >>>>>>> >>>>>>> The NIC can't receive/send packets if a rx/tx virtqueue is broken. >>>>>>> However, the link state of the NIC is still normal. As a result, >>>>>>> the user cannot detect the NIC exception. >>>>>> Doesn't we have: >>>>>> >>>>>> ?????? /* This should not happen! */ >>>>>> ??????? if (unlikely(err)) { >>>>>> ??????????????? dev->stats.tx_fifo_errors++; >>>>>> ??????????????? if (net_ratelimit()) >>>>>> ??????????????????????? dev_warn(&dev->dev, >>>>>> ???????????????????????????????? "Unexpected TXQ (%d) >>>> queue >>>>>> failure: %d\n", >>>>>> ???????????????????????????????? qnum, err); >>>>>> ??????????????? dev->stats.tx_dropped++; >>>>>> ??????????????? dev_kfree_skb_any(skb); >>>>>> ??????????????? return NETDEV_TX_OK; >>>>>> ??????? } >>>>>> >>>>>> Which should be sufficient? >>>>> There may be other reasons for this error, e.g -ENOSPC(no free desc). >>>> This should not happen unless the device or driver is buggy. We >>>> always reserved sufficient slots: >>>> >>>> ??????? if (sq->vq->num_free < 2+MAX_SKB_FRAGS) { >>>> ??????????????? netif_stop_subqueue(dev, qnum); ... >>>> >>>> >>>>> And if rx virtqueue is broken, there is no error statistics. >>>> Feel free to add one if it's necessary. >>> Currently receiving scenario, it is impossible to distinguish whether >>> the reason for not receiving packet is virtqueue's broken or no packet. >> >> Can we introduce rx_fifo_errors for that? >> >> >>>> Let's leave the policy decision (link down) to userspace. >>>> >>>> >>>>>>> The driver can set the link state down when the virtqueue is broken. >>>>>>> If the state is down, the user can switch over to another NIC. >>>>>> Note that, we probably need the watchdog for virtio-net in order to >>>>>> be a complete solution. >>>>> Yes, I can think of is that the virtqueue's broken exception is >>>>> detected on >>>> watchdog. >>>>> Is there anything else that needs to be done? >>>> Basically, it's all about TX stall which watchdog tries to catch. >>>> Broken vq is only one of the possible reason. >>> Are there any plans for the watchdog? >> >> Somebody posted a prototype 3 or 4 years ago, you can search it and maybe we >> can start from there. > I find the patch (https://patchwork.ozlabs.org/project/netdev/patch/20191126200628.22251-3-jcfaracco at gmail.com/) > > The patch checks only the scenario where the sending queue is abnormal, but can > not detect the exception in the receiving queue.It's almost impossible to detect the abnormal of receiving queue since we there's no deterministic time for a receiving packet.> > And the patch restores the NIC by reset, which is inappropriate because the broken > state may be caused by a front-end or back-end bug. We should keep the scene to > locate bugs.This could be changed, we can increase the error counters and let userspce to decide what to do. Thanks> > Thanks > >> Thanks >> >> >>> Thanks >>> >>>> Thanks >>>> >>>> >>>>> Thanks >>>>> >>>>>> Thanks >>>>>> >>>>>>