Jason Wang
2021-Jun-04 02:37 UTC
[PATCH net-next] virtio_net: set link state down when virtqueue is broken
? 2021/6/3 ??7:34, wangyunjian ??:>> -----Original Message----- >> From: Jason Wang [mailto:jasowang at redhat.com] >> Sent: Monday, May 31, 2021 11:29 AM >> To: wangyunjian <wangyunjian at huawei.com>; netdev at vger.kernel.org >> Cc: kuba at kernel.org; davem at davemloft.net; mst at redhat.com; >> virtualization at lists.linux-foundation.org; dingxiaoxiong >> <dingxiaoxiong at huawei.com> >> Subject: Re: [PATCH net-next] virtio_net: set link state down when virtqueue is >> broken >> >> >> ? 2021/5/28 ??6:58, wangyunjian ??: >>>> -----Original Message----- >>>>> From: Yunjian Wang <wangyunjian at huawei.com> >>>>> >>>>> The NIC can't receive/send packets if a rx/tx virtqueue is broken. >>>>> However, the link state of the NIC is still normal. As a result, the >>>>> user cannot detect the NIC exception. >>>> Doesn't we have: >>>> >>>> ?????? /* This should not happen! */ >>>> ??????? if (unlikely(err)) { >>>> ??????????????? dev->stats.tx_fifo_errors++; >>>> ??????????????? if (net_ratelimit()) >>>> ??????????????????????? dev_warn(&dev->dev, >>>> ???????????????????????????????? "Unexpected TXQ (%d) >> queue >>>> failure: %d\n", >>>> ???????????????????????????????? qnum, err); >>>> ??????????????? dev->stats.tx_dropped++; >>>> ??????????????? dev_kfree_skb_any(skb); >>>> ??????????????? return NETDEV_TX_OK; >>>> ??????? } >>>> >>>> Which should be sufficient? >>> There may be other reasons for this error, e.g -ENOSPC(no free desc). >> >> This should not happen unless the device or driver is buggy. We always reserved >> sufficient slots: >> >> ??????? if (sq->vq->num_free < 2+MAX_SKB_FRAGS) { >> ??????????????? netif_stop_subqueue(dev, qnum); ... >> >> >>> And if rx virtqueue is broken, there is no error statistics. >> >> Feel free to add one if it's necessary. > Currently receiving scenario, it is impossible to distinguish whether the reason for > not receiving packet is virtqueue's broken or no packet.Can we introduce rx_fifo_errors for that?> >> Let's leave the policy decision (link down) to userspace. >> >> >>>>> The driver can set the link state down when the virtqueue is broken. >>>>> If the state is down, the user can switch over to another NIC. >>>> Note that, we probably need the watchdog for virtio-net in order to >>>> be a complete solution. >>> Yes, I can think of is that the virtqueue's broken exception is detected on >> watchdog. >>> Is there anything else that needs to be done? >> >> Basically, it's all about TX stall which watchdog tries to catch. Broken vq is only >> one of the possible reason. > Are there any plans for the watchdog?Somebody posted a prototype 3 or 4 years ago, you can search it and maybe we can start from there. Thanks> > Thanks > >> Thanks >> >> >>> Thanks >>> >>>> Thanks >>>> >>>>