Michael S. Tsirkin
2011-Jun-01 09:49 UTC
[PATCH RFC 0/3] virtio and vhost-net capacity handling
OK, here's a new attempt to use the new capacity API. I also added more comments to clarify the logic. Hope this is more readable; let me know, please.

This is on top of the patches applied by Rusty.

Note: there are now actually 2 calls to free_old_xmit_skbs on the data path, so instead of passing flag '2' to the second one I thought we could just make each call free up at least 1 skb. This would work and might even be a bit faster (fewer branches), but in the end I discarded the idea as too fragile (it relies on both data-path calls to function properly).

Warning: untested. Posting now to give people a chance to comment on the API.

Michael S. Tsirkin (3):
  virtio_ring: add capacity check API
  virtio_net: fix tx capacity checks using new API
  virtio_net: limit xmit polling

 drivers/net/virtio_net.c     |   65 +++++++++++++++++++++++------------------
 drivers/virtio/virtio_ring.c |    8 +++++
 include/linux/virtio.h       |    5 +++
 3 files changed, 49 insertions(+), 29 deletions(-)

-- 
1.7.5.53.gc233e
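To make the note about the two data-path calls concrete, here is a rough sketch of the two options weighed above. It reuses free_old_xmit_skbs() from patch 3/3, but the skeleton around it is illustrative and not code from the series.

/* Illustrative sketch only -- not code from the series. */

/* Option taken: the caller states how many old skbs it wants reclaimed. */
free_old_xmit_skbs(vi, 0);	/* 1st data-path call: free only if the ring is low */
/* ... queue the new skb ... */
free_old_xmit_skbs(vi, 2);	/* 2nd data-path call: also reclaim >= 2 old skbs */

/* Discarded option: drop the argument and have every call free at least one
 * skb.  Slightly fewer branches, but all pending buffers are only guaranteed
 * to be reclaimed if both data-path calls keep running -- deemed too fragile. */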
Michael S. Tsirkin
2011-Jun-01 09:49 UTC
[PATCH RFC 1/3] virtio_ring: add capacity check API
Add API to check ring capacity. Because of the option to use indirect buffers, this returns the worst case, not the normal case capacity.

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 drivers/virtio/virtio_ring.c |    8 ++++++++
 include/linux/virtio.h       |    5 +++++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 68b9136..23422f1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -344,6 +344,14 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf);
 
+int virtqueue_min_capacity(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->num_free;
+}
+EXPORT_SYMBOL_GPL(virtqueue_min_capacity);
+
 void virtqueue_disable_cb(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 7108857..209220d 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -42,6 +42,9 @@ struct virtqueue {
  *	vq: the struct virtqueue we're talking about.
  *	len: the length written into the buffer
  *	Returns NULL or the "data" token handed to add_buf.
+ * virtqueue_min_capacity: get the current capacity of the queue
+ *	vq: the struct virtqueue we're talking about.
+ *	Returns the current worst case capacity of the queue.
  * virtqueue_disable_cb: disable callbacks
  *	vq: the struct virtqueue we're talking about.
  *	Note that this is not necessarily synchronous, hence unreliable and only
@@ -89,6 +92,8 @@ void virtqueue_kick(struct virtqueue *vq);
 
 void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
 
+int virtqueue_min_capacity(struct virtqueue *vq);
+
 void virtqueue_disable_cb(struct virtqueue *vq);
 
 bool virtqueue_enable_cb(struct virtqueue *vq);
-- 
1.7.5.53.gc233e
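As a rough usage sketch (not part of the patch), a transmit path could consult the new call before attempting to queue a buffer; MAX_SKB_FRAGS + 2 is the worst-case descriptor count virtio_net assumes for one skb in the patches below.

/* Illustrative only -- not from the series.  Decide whether a worst-case skb
 * (linear part + MAX_SKB_FRAGS fragments + header descriptor) is guaranteed
 * to fit before adding it. */
static bool example_can_queue_skb(struct virtnet_info *vi)
{
	/* Worst case: each scatterlist entry consumes one ring descriptor,
	 * i.e. indirect buffers may not be available. */
	return virtqueue_min_capacity(vi->svq) >= MAX_SKB_FRAGS + 2;
}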
Michael S. Tsirkin
2011-Jun-01 09:49 UTC
[PATCH RFC 2/3] virtio_net: fix tx capacity checks using new API
In the (rare) case where new descriptors are used while virtio_net enables vq callback for the TX vq, virtio_net uses the number of sg entries in the skb it frees to calculate how many descriptors in the ring have just been made available. But this value is an overestimate: with indirect buffers each skb only uses one descriptor entry, meaning we may wake the queue only to find we still can't transmit anything.

Using the new virtqueue_min_capacity() call, we can determine the remaining capacity, so we should use that instead. This estimate is worst-case, which is consistent with the value returned by virtqueue_add_buf.

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 drivers/net/virtio_net.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f685324..a0ee78d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -509,19 +509,17 @@ again:
 	return received;
 }
 
-static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
+static void free_old_xmit_skbs(struct virtnet_info *vi)
 {
 	struct sk_buff *skb;
-	unsigned int len, tot_sgs = 0;
+	unsigned int len;
 
 	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 		vi->dev->stats.tx_bytes += skb->len;
 		vi->dev->stats.tx_packets++;
-		tot_sgs += skb_vnet_hdr(skb)->num_sg;
 		dev_kfree_skb_any(skb);
 	}
-	return tot_sgs;
 }
 
 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -611,7 +609,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		netif_stop_queue(dev);
 		if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
 			/* More just got used, free them then recheck. */
-			capacity += free_old_xmit_skbs(vi);
+			free_old_xmit_skbs(vi);
+			capacity = virtqueue_min_capacity(vi->svq);
 			if (capacity >= 2+MAX_SKB_FRAGS) {
 				netif_start_queue(dev);
 				virtqueue_disable_cb(vi->svq);
-- 
1.7.5.53.gc233e
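A worked illustration of the accounting error being fixed here (the numbers are hypothetical, not from the patch):

/* Illustrative numbers only.  Suppose the TX ring has 3 free descriptors and
 * we reclaim one completed skb that was queued via an indirect buffer, so it
 * occupied a single descriptor but had num_sg == 20 scatterlist entries.
 *
 *   old code:  capacity += 20   -> looks like 23 descriptors are free
 *   reality:   only 4 descriptors are free (3 + the 1 just reclaimed)
 *
 * The queue could thus be woken although a worst-case skb still may not fit.
 * The fix re-reads the ring's worst-case capacity instead: */
free_old_xmit_skbs(vi);
capacity = virtqueue_min_capacity(vi->svq);	/* == 4 in this example */
if (capacity >= 2+MAX_SKB_FRAGS)
	netif_start_queue(dev);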
Michael S. Tsirkin
2011-Jun-01 09:49 UTC
[PATCH RFC 3/3] virtio_net: limit xmit polling
Current code might introduce a lot of latency variation if there are many pending bufs at the time we attempt to transmit a new one. This is bad for real-time applications and can't be good for TCP either.

Free up just enough to both clean up all buffers eventually and to be able to xmit the next packet.

Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
---
 drivers/net/virtio_net.c |   62 ++++++++++++++++++++++++++--------------------
 1 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a0ee78d..6045510 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -509,17 +509,27 @@ again:
 	return received;
 }
 
-static void free_old_xmit_skbs(struct virtnet_info *vi)
+/* Check capacity and try to free enough pending old buffers to enable queueing
+ * new ones. If min_skbs > 0, try to free at least the specified number of skbs
+ * even if the ring already has sufficient capacity. Return true if we can
+ * guarantee that a following virtqueue_add_buf will succeed. */
+static bool free_old_xmit_skbs(struct virtnet_info *vi, int min_skbs)
 {
 	struct sk_buff *skb;
 	unsigned int len;
+	bool r;
 
-	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
+	while ((r = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2) ||
+	       min_skbs-- > 0) {
+		skb = virtqueue_get_buf(vi->svq, &len);
+		if (unlikely(!skb))
+			break;
 		pr_debug("Sent skb %p\n", skb);
 		vi->dev->stats.tx_bytes += skb->len;
 		vi->dev->stats.tx_packets++;
 		dev_kfree_skb_any(skb);
 	}
+	return r;
 }
 
 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -572,27 +582,24 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
-	int capacity;
+	int ret, n;
 
-	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(vi);
+	/* Free up space in the ring in case this is the first time we get
+	 * woken up after ring full condition. Note: this might try to free
+	 * more than strictly necessary if the skb has a small
+	 * number of fragments, but keep it simple. */
+	free_old_xmit_skbs(vi, 0);
 
 	/* Try to transmit */
-	capacity = xmit_skb(vi, skb);
-
-	/* This can happen with OOM and indirect buffers. */
-	if (unlikely(capacity < 0)) {
-		if (net_ratelimit()) {
-			if (likely(capacity == -ENOMEM)) {
-				dev_warn(&dev->dev,
-					 "TX queue failure: out of memory\n");
-			} else {
-				dev->stats.tx_fifo_errors++;
-				dev_warn(&dev->dev,
-					 "Unexpected TX queue failure: %d\n",
-					 capacity);
-			}
-		}
+	ret = xmit_skb(vi, skb);
+
+	/* Failure to queue is unlikely. It's not a bug though: it might happen
+	 * if we get an interrupt while the queue is still mostly full.
+	 * We could stop the queue and re-enable callbacks (and possibly return
+	 * TX_BUSY), but as this should be rare, we don't bother. */
+	if (unlikely(ret < 0)) {
+		if (net_ratelimit())
+			dev_info(&dev->dev, "TX queue failure: %d\n", ret);
 		dev->stats.tx_dropped++;
 		kfree_skb(skb);
 		return NETDEV_TX_OK;
@@ -603,15 +610,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_orphan(skb);
 	nf_reset(skb);
 
-	/* Apparently nice girls don't return TX_BUSY; stop the queue
-	 * before it gets out of hand. Naturally, this wastes entries. */
-	if (capacity < 2+MAX_SKB_FRAGS) {
+	/* Apparently nice girls don't return TX_BUSY; check capacity and stop
+	 * the queue before it gets out of hand.
+	 * Naturally, this wastes entries. */
+	/* We transmit one skb, so try to free at least two pending skbs.
+	 * This is so that we don't hog the skb memory unnecessarily. */
+	if (!likely(free_old_xmit_skbs(vi, 2))) {
 		netif_stop_queue(dev);
 		if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
-			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(vi);
-			capacity = virtqueue_min_capacity(vi->svq);
-			if (capacity >= 2+MAX_SKB_FRAGS) {
+			/* More just got used, free them and recheck. */
+			if (!likely(free_old_xmit_skbs(vi, 0))) {
 				netif_start_queue(dev);
 				virtqueue_disable_cb(vi->svq);
 			}
-- 
1.7.5.53.gc233e
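A worked illustration of the latency point in the commit message (the numbers are hypothetical, not measurements from this series):

/* Hypothetical illustration, not from the patch.  Assume a 256-entry TX ring
 * in which ~250 transmitted skbs are pending reclaim when start_xmit() runs:
 *
 *   before this patch:  free_old_xmit_skbs() loops until virtqueue_get_buf()
 *                       returns NULL, freeing all ~250 skbs in one call and
 *                       adding a burst of latency to this transmit;
 *
 *   after this patch:   free_old_xmit_skbs(vi, 2) stops once the ring can
 *                       take a worst-case skb (MAX_SKB_FRAGS + 2 descriptors)
 *                       and at least 2 old skbs were reclaimed, so the
 *                       backlog is drained a little on every transmit
 *                       instead of all at once. */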
Rusty Russell
2011-Jun-02 02:11 UTC
[PATCH RFC 1/3] virtio_ring: add capacity check API
On Wed, 1 Jun 2011 12:49:46 +0300, "Michael S. Tsirkin" <mst at redhat.com> wrote:
> Add API to check ring capacity. Because of the option
> to use indirect buffers, this returns the worst
> case, not the normal case capacity.

Can we drop the silly "add_buf() returns capacity" hack then?

Thanks,
Rusty.
Michael S. Tsirkin
2011-Jun-02 13:30 UTC
[PATCH RFC 1/3] virtio_ring: add capacity check API
On Thu, Jun 02, 2011 at 11:41:50AM +0930, Rusty Russell wrote:
> On Wed, 1 Jun 2011 12:49:46 +0300, "Michael S. Tsirkin" <mst at redhat.com> wrote:
> > Add API to check ring capacity. Because of the option
> > to use indirect buffers, this returns the worst
> > case, not the normal case capacity.
>
> Can we drop the silly "add_buf() returns capacity" hack then?
>
> Thanks,
> Rusty.

Sure.
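For context on the "hack" being discussed: at the time, virtqueue_add_buf() returned a remaining-capacity estimate on success rather than plain zero, and virtio_net used that return value as its capacity check (see the "capacity = xmit_skb(vi, skb)" code removed in patch 3/3). A rough sketch of what dropping it in favour of the new call could look like follows; the argument list and error handling are abbreviated and illustrative, not an actual patch from this thread.

/* Illustrative sketch only.
 * Before: the add_buf return value doubles as a capacity estimate. */
capacity = virtqueue_add_buf(vi->svq, sg, out, in, skb);	/* >= 0 on success */

/* After dropping the hack: add_buf would only report success or failure,
 * and the driver asks the ring for its worst-case capacity explicitly. */
err = virtqueue_add_buf(vi->svq, sg, out, in, skb);		/* 0 on success */
if (err >= 0)
	capacity = virtqueue_min_capacity(vi->svq);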