Michael S. Tsirkin
2021-May-26 08:24 UTC
[PATCH v3 4/4] virtio_net: disable cb aggressively
There are currently two cases where we poll TX vq not in response to a callback: start xmit and rx napi. We currently do this with callbacks enabled which can cause extra interrupts from the card. Used not to be a big issue as we run with interrupts disabled but that is no longer the case, and in some cases the rate of spurious interrupts is so high linux detects this and actually kills the interrupt. Fix up by disabling the callbacks before polling the tx vq. Signed-off-by: Michael S. Tsirkin <mst at redhat.com> --- drivers/net/virtio_net.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index c29f42d1e04f..a83dc038d8af 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) return; if (__netif_tx_trylock(txq)) { - free_old_xmit_skbs(sq, true); + do { + virtqueue_disable_cb(sq->vq); + free_old_xmit_skbs(sq, true); + } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq))); if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) netif_tx_wake_queue(txq); @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum); bool kick = !netdev_xmit_more(); bool use_napi = sq->napi.weight; + unsigned int bytes = skb->len; /* Free up any pending old buffers before queueing new ones. */ - free_old_xmit_skbs(sq, false); + do { + if (use_napi) + virtqueue_disable_cb(sq->vq); - if (use_napi && kick) - virtqueue_enable_cb_delayed(sq->vq); + free_old_xmit_skbs(sq, false); + + } while (use_napi && kick && + unlikely(!virtqueue_enable_cb_delayed(sq->vq))); /* timestamp packet in software */ skb_tx_timestamp(skb); -- MST
On 5/26/21 10:24 AM, Michael S. Tsirkin wrote:> There are currently two cases where we poll TX vq not in response to a > callback: start xmit and rx napi. We currently do this with callbacks > enabled which can cause extra interrupts from the card. Used not to be > a big issue as we run with interrupts disabled but that is no longer the > case, and in some cases the rate of spurious interrupts is so high > linux detects this and actually kills the interrupt. > > Fix up by disabling the callbacks before polling the tx vq.It is not clear why we want to poll TX completions from ndo_start_xmit() in napi mode ? This seems not needed, adding costs to sender thread, this might reduce the ability to use a different cpu for tx completions. Also this will likely conflict with BQL model if we want to use BQL at some point.>This probably needs a Fixes: tag> Signed-off-by: Michael S. Tsirkin <mst at redhat.com> > --- > drivers/net/virtio_net.c | 16 ++++++++++++---- > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index c29f42d1e04f..a83dc038d8af 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) > return; > > if (__netif_tx_trylock(txq)) { > - free_old_xmit_skbs(sq, true); > + do { > + virtqueue_disable_cb(sq->vq); > + free_old_xmit_skbs(sq, true); > + } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq))); > > if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) > netif_tx_wake_queue(txq); > @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) > struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum); > bool kick = !netdev_xmit_more(); > bool use_napi = sq->napi.weight; > + unsigned int bytes = skb->len; > > /* Free up any pending old buffers before queueing new ones. */ > - free_old_xmit_skbs(sq, false); > + do { > + if (use_napi) > + virtqueue_disable_cb(sq->vq); > > - if (use_napi && kick) > - virtqueue_enable_cb_delayed(sq->vq); > + free_old_xmit_skbs(sq, false); > + > + } while (use_napi && kick && > + unlikely(!virtqueue_enable_cb_delayed(sq->vq))); > > /* timestamp packet in software */ > skb_tx_timestamp(skb); >
? 2021/5/26 ??4:24, Michael S. Tsirkin ??:> There are currently two cases where we poll TX vq not in response to a > callback: start xmit and rx napi. We currently do this with callbacks > enabled which can cause extra interrupts from the card. Used not to be > a big issue as we run with interrupts disabled but that is no longer the > case, and in some cases the rate of spurious interrupts is so high > linux detects this and actually kills the interrupt. > > Fix up by disabling the callbacks before polling the tx vq. > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com> > --- > drivers/net/virtio_net.c | 16 ++++++++++++---- > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index c29f42d1e04f..a83dc038d8af 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) > return; > > if (__netif_tx_trylock(txq)) { > - free_old_xmit_skbs(sq, true); > + do { > + virtqueue_disable_cb(sq->vq); > + free_old_xmit_skbs(sq, true); > + } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq))); > > if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) > netif_tx_wake_queue(txq); > @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) > struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum); > bool kick = !netdev_xmit_more(); > bool use_napi = sq->napi.weight; > + unsigned int bytes = skb->len; > > /* Free up any pending old buffers before queueing new ones. */ > - free_old_xmit_skbs(sq, false); > + do { > + if (use_napi) > + virtqueue_disable_cb(sq->vq); > > - if (use_napi && kick) > - virtqueue_enable_cb_delayed(sq->vq); > + free_old_xmit_skbs(sq, false); > + > + } while (use_napi && kick && > + unlikely(!virtqueue_enable_cb_delayed(sq->vq))); > > /* timestamp packet in software */ > skb_tx_timestamp(skb);I wonder whehter we can simple disable cb during ndo_start_xmit(), or is there a way to make xmit and napi work in parallel? Thanks
Hi Michael, On 5/26/21 10:24, Michael S. Tsirkin wrote:> There are currently two cases where we poll TX vq not in response to a > callback: start xmit and rx napi. We currently do this with callbacks > enabled which can cause extra interrupts from the card. Used not to be > a big issue as we run with interrupts disabled but that is no longer the > case, and in some cases the rate of spurious interrupts is so high > linux detects this and actually kills the interrupt. > > Fix up by disabling the callbacks before polling the tx vq. > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com> > --- > drivers/net/virtio_net.c | 16 ++++++++++++---- > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index c29f42d1e04f..a83dc038d8af 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) > return; > > if (__netif_tx_trylock(txq)) { > - free_old_xmit_skbs(sq, true); > + do { > + virtqueue_disable_cb(sq->vq); > + free_old_xmit_skbs(sq, true); > + } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq))); > > if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) > netif_tx_wake_queue(txq); > @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) > struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum); > bool kick = !netdev_xmit_more(); > bool use_napi = sq->napi.weight; > + unsigned int bytes = skb->len; > > /* Free up any pending old buffers before queueing new ones. */ > - free_old_xmit_skbs(sq, false); > + do { > + if (use_napi) > + virtqueue_disable_cb(sq->vq); > > - if (use_napi && kick) > - virtqueue_enable_cb_delayed(sq->vq); > + free_old_xmit_skbs(sq, false); > + > + } while (use_napi && kick && > + unlikely(!virtqueue_enable_cb_delayed(sq->vq))); > > /* timestamp packet in software */ > skb_tx_timestamp(skb);This patch seems to introduce a problem with QEMU connected to passt using netdev stream backend. When I run an iperf3 test I get after 1 or 2 seconds of test: [ 254.035559] NETDEV WATCHDOG: ens3 (virtio_net): transmit queue 0 timed out ... [ 254.060962] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 8856000 usecs ago [ 259.155150] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 13951000 usecs ago In QEMU, I can see in virtio_net_tx_bh() the function virtio_net_flush_tx() has flushed all the queue entries and re-enabled the queue notification with virtio_queue_set_notification() and tries to flush again the queue and as it is empty it does nothing more and then rely on a kernel notification to re-enable the bottom half function. As this notification never comes the queue is stuck and kernel add entries but QEMU doesn't remove them: 2812 static void virtio_net_tx_bh(void *opaque) 2813 { ... 2833 ret = virtio_net_flush_tx(q); -> flush the queue and ret is not an error and not n->tx_burst (that would re-schedule the function) ... 2850 virtio_queue_set_notification(q->tx_vq, 1); -> re-enable the queue notification 2851 ret = virtio_net_flush_tx(q); 2852 if (ret == -EINVAL) { 2853 return; 2854 } else if (ret > 0) { 2855 virtio_queue_set_notification(q->tx_vq, 0); 2856 qemu_bh_schedule(q->tx_bh); 2857 q->tx_waiting = 1; 2858 } -> ret is 0, exit the function without re-scheduling the function. ... 2859 } If I revert this patch in the kernel (a7766ef18b33 ("virtio_net: disable cb aggressively")), it works fine. How to reproduce it: I start passt (https://passt.top/passt): passt -f and then QEMU qemu-system-x86_64 ... --netdev stream,id=netdev0,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket -device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0 Host side: sysctl -w net.core.rmem_max=134217728 sysctl -w net.core.wmem_max=134217728 iperf3 -s Guest side: sysctl -w net.core.rmem_max=536870912 sysctl -w net.core.wmem_max=536870912 ip link set dev $DEV mtu 256 iperf3 -c $HOST -t10 -i0 -Z -P 8 -l 1M --pacing-timer 1000000 -w 4M Any idea of what is the problem? Thanks, Laurent