Jason Wang
2023-Jun-08 00:38 UTC
[PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin <mst at redhat.com> wrote:
>
> On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > On Tue, May 30, 2023 at 9:19 AM Liang Chen <liangchen.linux at gmail.com> wrote:
> > >
> > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > >
> > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > > >
> > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > The implementation at the moment uses one page per packet in both the
> > > > > > > normal and XDP path. In addition, introducing a module parameter to enable
> > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > >
> > > > > > > In single-core vm testing environments, it gives a modest performance gain
> > > > > > > in the normal path.
> > > > > > > Upstream codebase: 47.5 Gbits/sec
> > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > >
> > > > > > > In multi-core vm testing environments, The most significant performance
> > > > > > > gain is observed in XDP cpumap:
> > > > > > > Upstream codebase: 1.38 Gbits/sec
> > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > >
> > > > > > > With this foundation, we can further integrate page pool fragmentation and
> > > > > > > DMA map/unmap support.
> > > > > > >
> > > > > > > Signed-off-by: Liang Chen <liangchen.linux at gmail.com>
> > > > > >
> > > > > > Why off by default?
> > > > > > I am guessing it sometimes has performance costs too?
> > > > > >
> > > > > >
> > > > > > What happens if we use page pool for big mode too?
> > > > > > The less modes we have the better...
> > > > > >
> > > > > >
> > > > >
> > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > packet size is very small, it reduces the likelihood of skb
> > > > > coalescing. But such cases are rare.
> > > >
> > > > small packets are rare? These workloads are easy to create actually.
> > > > Pls try and include benchmark with small packet size.
> > > >
> > >
> > > Sure, Thanks!
> >
> > Before going ahead and posting v2 patch, I would like to hear more
> > advice for the cases of small packets. I have done more performance
> > benchmark with small packets since then. Here is a list of iperf
> > output,
> >
> > With PP and PP fragmenting:
> > 256K: [ 5] 505.00-510.00 sec 1.34 GBytes 2.31 Gbits/sec 0 144 KBytes
> > 1K: [ 5] 30.00-35.00 sec 4.63 GBytes 7.95 Gbits/sec 0 223 KBytes
> > 2K: [ 5] 65.00-70.00 sec 8.33 GBytes 14.3 Gbits/sec 0 324 KBytes
> > 4K: [ 5] 30.00-35.00 sec 13.3 GBytes 22.8 Gbits/sec 0 1.08 MBytes
> > 8K: [ 5] 50.00-55.00 sec 18.9 GBytes 32.4 Gbits/sec 0 744 KBytes
> > 16K: [ 5] 25.00-30.00 sec 24.6 GBytes 42.3 Gbits/sec 0 963 KBytes
> > 32K: [ 5] 45.00-50.00 sec 29.8 GBytes 51.2 Gbits/sec 0 1.25 MBytes
> > 64K: [ 5] 35.00-40.00 sec 34.0 GBytes 58.4 Gbits/sec 0 1.70 MBytes
> > 128K: [ 5] 45.00-50.00 sec 36.7 GBytes 63.1 Gbits/sec 0 4.26 MBytes
> > 256K: [ 5] 30.00-35.00 sec 40.0 GBytes 68.8 Gbits/sec 0 3.20 MBytes

Note that virtio-net driver is lacking things like BQL and others, so
it might suffer from buffer bloat for TCP performance. Would you mind
to measure with e.g using testpmd on the vhost to see the rx PPS?

> >
> > Without PP:
> > 256: [ 5] 680.00-685.00 sec 1.57 GBytes 2.69 Gbits/sec 0 359 KBytes
> > 1K: [ 5] 75.00-80.00 sec 5.47 GBytes 9.40 Gbits/sec 0 730 KBytes
> > 2K: [ 5] 65.00-70.00 sec 9.46 GBytes 16.2 Gbits/sec 0 1.99 MBytes
> > 4K: [ 5] 30.00-35.00 sec 14.5 GBytes 25.0 Gbits/sec 0 1.20 MBytes
> > 8K: [ 5] 45.00-50.00 sec 19.9 GBytes 34.1 Gbits/sec 0 1.72 MBytes
> > 16K: [ 5] 5.00-10.00 sec 23.8 GBytes 40.9 Gbits/sec 0 2.90 MBytes
> > 32K: [ 5] 15.00-20.00 sec 28.0 GBytes 48.1 Gbits/sec 0 3.03 MBytes
> > 64K: [ 5] 60.00-65.00 sec 31.8 GBytes 54.6 Gbits/sec 0 3.05 MBytes
> > 128K: [ 5] 45.00-50.00 sec 33.0 GBytes 56.6 Gbits/sec 1 3.03 MBytes
> > 256K: [ 5] 25.00-30.00 sec 34.7 GBytes 59.6 Gbits/sec 0 3.11 MBytes
> >
> >
> > The major factor contributing to the performance drop is the reduction
> > of skb coalescing. Additionally, without the page pool, small packets
> > can still benefit from the allocation of 8 continuous pages by
> > breaking them down into smaller pieces. This effectively reduces the
> > frequency of page allocation from the buddy system. For instance, the
> > arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> > the benefits of using a page pool are limited in such cases.

I wonder if we can improve page pool in this case anyhow.

> > In fact,
> > without page pool fragmenting enabled, it can even hinder performance
> > from this perspective.
> >
> > Upon further consideration, I tend to believe making page pool the
> > default option may not be appropriate. As you pointed out, we cannot
> > simply ignore the performance impact on small packets. Any comments on
> > this will be much appreciated.
> >
> >
> > Thanks,
> > Liang
>
>
> So, let's only use page pool for XDP then?

+1

We can start from this.

Thanks

> > >
> > > > > The usage of page pool for big mode is being evaluated now. Thanks!
> > > > > > >
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 188 ++++++++++++++++++++++++++++++---------
> > > > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > > > > >  module_param(gso, bool, 0444);
> > > > > > >  module_param(napi_tx, bool, 0644);
> > > > > > >
> > > > > > > +static bool page_pool_enabled;
> > > > > > > +module_param(page_pool_enabled, bool, 0400);
> > > > > > > +
> > > > > > >  /* FIXME: MTU in config. */
> > > > > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > > > > >  #define GOOD_COPY_LEN	128
> > > > > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > > > > >  	/* Chain pages by the private ptr. */
> > > > > > >  	struct page *pages;
> > > > > > >
> > > > > > > +	/* Page pool */
> > > > > > > +	struct page_pool *page_pool;
> > > > > > > +
> > > > > > >  	/* Average packet length for mergeable receive buffers. */
> > > > > > >  	struct ewma_pkt_len mrg_avg_pkt_len;
> > > > > > >
> > > > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, unsigned int buflen,
> > > > > > >  	return skb;
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page *page)
> > > > > > > +{
> > > > > > > +	if (rq->page_pool)
> > > > > > > +		page_pool_put_full_page(rq->page_pool, page, true);
> > > > > > > +	else
> > > > > > > +		put_page(page);
> > > > > > > +}
> > > > > > > +
> > > > > > >  /* Called from bottom half context */
> > > > > > >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > >  				   struct receive_queue *rq,
> > > > > > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > >  	hdr = skb_vnet_hdr(skb);
> > > > > > >  	memcpy(hdr, hdr_p, hdr_len);
> > > > > > >  	if (page_to_free)
> > > > > > > -		put_page(page_to_free);
> > > > > > > +		virtnet_put_page(rq, page_to_free);
> > > > > > >
> > > > > > >  	return skb;
> > > > > > >  }
> > > > > > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > > > > > >  	return ret;
> > > > > > >  }
> > > > > > >
> > > > > > > -static void put_xdp_frags(struct xdp_buff *xdp)
> > > > > > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq)
> > > > > > >  {
> > > > > > >  	struct skb_shared_info *shinfo;
> > > > > > >  	struct page *xdp_page;
> > > > > > > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> > > > > > >  		shinfo = xdp_get_shared_info_from_buff(xdp);
> > > > > > >  		for (i = 0; i < shinfo->nr_frags; i++) {
> > > > > > >  			xdp_page = skb_frag_page(&shinfo->frags[i]);
> > > > > > > -			put_page(xdp_page);
> > > > > > > +			virtnet_put_page(rq, xdp_page);
> > > > > > >  		}
> > > > > > >  	}
> > > > > > >  }
> > > > > > > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > >  	if (page_off + *len + tailroom > PAGE_SIZE)
> > > > > > >  		return NULL;
> > > > > > >
> > > > > > > -	page = alloc_page(GFP_ATOMIC);
> > > > > > > +	if (rq->page_pool)
> > > > > > > +		page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > > +	else
> > > > > > > +		page = alloc_page(GFP_ATOMIC);
> > > > > > > +
> > > > > > >  	if (!page)
> > > > > > >  		return NULL;
> > > > > > >
> > > > > > > @@ -926,21 +944,24 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > >  		 * is sending packet larger than the MTU.
> > > > > > >  		 */
> > > > > > >  		if ((page_off + buflen + tailroom) > PAGE_SIZE) {
> > > > > > > -			put_page(p);
> > > > > > > +			virtnet_put_page(rq, p);
> > > > > > >  			goto err_buf;
> > > > > > >  		}
> > > > > > >
> > > > > > >  		memcpy(page_address(page) + page_off,
> > > > > > >  		       page_address(p) + off, buflen);
> > > > > > >  		page_off += buflen;
> > > > > > > -		put_page(p);
> > > > > > > +		virtnet_put_page(rq, p);
> > > > > > >  	}
> > > > > > >
> > > > > > >  	/* Headroom does not contribute to packet length */
> > > > > > >  	*len = page_off - VIRTIO_XDP_HEADROOM;
> > > > > > >  	return page;
> > > > > > >  err_buf:
> > > > > > > -	__free_pages(page, 0);
> > > > > > > +	if (rq->page_pool)
> > > > > > > +		page_pool_put_full_page(rq->page_pool, page, true);
> > > > > > > +	else
> > > > > > > +		__free_pages(page, 0);
> > > > > > >  	return NULL;
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -1144,7 +1165,7 @@ static void mergeable_buf_free(struct receive_queue *rq, int num_buf,
> > > > > > >  		}
> > > > > > >  		stats->bytes += len;
> > > > > > >  		page = virt_to_head_page(buf);
> > > > > > > -		put_page(page);
> > > > > > > +		virtnet_put_page(rq, page);
> > > > > > >  	}
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -1264,7 +1285,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > >  		cur_frag_size = truesize;
> > > > > > >  		xdp_frags_truesz += cur_frag_size;
> > > > > > >  		if (unlikely(len > truesize - room || cur_frag_size > PAGE_SIZE)) {
> > > > > > > -			put_page(page);
> > > > > > > +			virtnet_put_page(rq, page);
> > > > > > >  			pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
> > > > > > >  				 dev->name, len, (unsigned long)(truesize - room));
> > > > > > >  			dev->stats.rx_length_errors++;
> > > > > > > @@ -1283,7 +1304,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > >  	return 0;
> > > > > > >
> > > > > > >  err:
> > > > > > > -	put_xdp_frags(xdp);
> > > > > > > +	put_xdp_frags(xdp, rq);
> > > > > > >  	return -EINVAL;
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -1344,7 +1365,10 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
> > > > > > >  		if (*len + xdp_room > PAGE_SIZE)
> > > > > > >  			return NULL;
> > > > > > >
> > > > > > > -		xdp_page = alloc_page(GFP_ATOMIC);
> > > > > > > +		if (rq->page_pool)
> > > > > > > +			xdp_page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > > +		else
> > > > > > > +			xdp_page = alloc_page(GFP_ATOMIC);
> > > > > > >  		if (!xdp_page)
> > > > > > >  			return NULL;
> > > > > > >
> > > > > > > @@ -1354,7 +1378,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
> > > > > > >
> > > > > > >  	*frame_sz = PAGE_SIZE;
> > > > > > >
> > > > > > > -	put_page(*page);
> > > > > > > +	virtnet_put_page(rq, *page);
> > > > > > >
> > > > > > >  	*page = xdp_page;
> > > > > > >
> > > > > > > @@ -1400,6 +1424,8 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
> > > > > > >  		head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
> > > > > > >  		if (unlikely(!head_skb))
> > > > > > >  			break;
> > > > > > > +		if (rq->page_pool)
> > > > > > > +			skb_mark_for_recycle(head_skb);
> > > > > > >  		return head_skb;
> > > > > > >
> > > > > > >  	case XDP_TX:
> > > > > > > @@ -1410,10 +1436,10 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
> > > > > > >  		break;
> > > > > > >  	}
> > > > > > >
> > > > > > > -	put_xdp_frags(&xdp);
> > > > > > > +	put_xdp_frags(&xdp, rq);
> > > > > > >
> > > > > > >  err_xdp:
> > > > > > > -	put_page(page);
> > > > > > > +	virtnet_put_page(rq, page);
> > > > > > >  	mergeable_buf_free(rq, num_buf, dev, stats);
> > > > > > >
> > > > > > >  	stats->xdp_drops++;
> > > > > > > @@ -1467,6 +1493,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > >  	head_skb = page_to_skb(vi, rq, page, offset, len, truesize, headroom);
> > > > > > >  	curr_skb = head_skb;
> > > > > > >
> > > > > > > +	if (rq->page_pool)
> > > > > > > +		skb_mark_for_recycle(curr_skb);
> > > > > > > +
> > > > > > >  	if (unlikely(!curr_skb))
> > > > > > >  		goto err_skb;
> > > > > > >  	while (--num_buf) {
> > > > > > > @@ -1509,6 +1538,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > >  				curr_skb = nskb;
> > > > > > >  			head_skb->truesize += nskb->truesize;
> > > > > > >  			num_skb_frags = 0;
> > > > > > > +			if (rq->page_pool)
> > > > > > > +				skb_mark_for_recycle(curr_skb);
> > > > > > >  		}
> > > > > > >  		if (curr_skb != head_skb) {
> > > > > > >  			head_skb->data_len += len;
> > > > > > > @@ -1517,7 +1548,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > >  		}
> > > > > > >  		offset = buf - page_address(page);
> > > > > > >  		if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
> > > > > > > -			put_page(page);
> > > > > > > +			virtnet_put_page(rq, page);
> > > > > > >  			skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
> > > > > > >  					     len, truesize);
> > > > > > >  		} else {
> > > > > > > @@ -1530,7 +1561,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > >  	return head_skb;
> > > > > > >
> > > > > > >  err_skb:
> > > > > > > -	put_page(page);
> > > > > > > +	virtnet_put_page(rq, page);
> > > > > > >  	mergeable_buf_free(rq, num_buf, dev, stats);
> > > > > > >
> > > > > > >  err_buf:
> > > > > > > @@ -1737,31 +1768,40 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > >  	 * disabled GSO for XDP, it won't be a big issue.
> > > > > > >  	 */
> > > > > > >  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > > > > -	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > > -		return -ENOMEM;
> > > > > > > +	if (rq->page_pool) {
> > > > > > > +		struct page *page;
> > > > > > >
> > > > > > > -	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > -	buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > > -	get_page(alloc_frag->page);
> > > > > > > -	alloc_frag->offset += len + room;
> > > > > > > -	hole = alloc_frag->size - alloc_frag->offset;
> > > > > > > -	if (hole < len + room) {
> > > > > > > -		/* To avoid internal fragmentation, if there is very likely not
> > > > > > > -		 * enough space for another buffer, add the remaining space to
> > > > > > > -		 * the current buffer.
> > > > > > > -		 * XDP core assumes that frame_size of xdp_buff and the length
> > > > > > > -		 * of the frag are PAGE_SIZE, so we disable the hole mechanism.
> > > > > > > -		 */
> > > > > > > -		if (!headroom)
> > > > > > > -			len += hole;
> > > > > > > -		alloc_frag->offset += hole;
> > > > > > > -	}
> > > > > > > +		page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > > +		if (unlikely(!page))
> > > > > > > +			return -ENOMEM;
> > > > > > > +		buf = (char *)page_address(page);
> > > > > > > +		buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > > +	} else {
> > > > > > > +		if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > > +			return -ENOMEM;
> > > > > > >
> > > > > > > +		buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > +		buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > > +		get_page(alloc_frag->page);
> > > > > > > +		alloc_frag->offset += len + room;
> > > > > > > +		hole = alloc_frag->size - alloc_frag->offset;
> > > > > > > +		if (hole < len + room) {
> > > > > > > +			/* To avoid internal fragmentation, if there is very likely not
> > > > > > > +			 * enough space for another buffer, add the remaining space to
> > > > > > > +			 * the current buffer.
> > > > > > > +			 * XDP core assumes that frame_size of xdp_buff and the length
> > > > > > > +			 * of the frag are PAGE_SIZE, so we disable the hole mechanism.
> > > > > > > +			 */
> > > > > > > +			if (!headroom)
> > > > > > > +				len += hole;
> > > > > > > +			alloc_frag->offset += hole;
> > > > > > > +		}
> > > > > > > +	}
> > > > > > >  	sg_init_one(rq->sg, buf, len);
> > > > > > >  	ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > >  	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > >  	if (err < 0)
> > > > > > > -		put_page(virt_to_head_page(buf));
> > > > > > > +		virtnet_put_page(rq, virt_to_head_page(buf));
> > > > > > >
> > > > > > >  	return err;
> > > > > > >  }
> > > > > > > @@ -1994,8 +2034,15 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
> > > > > > >  	if (err < 0)
> > > > > > >  		return err;
> > > > > > >
> > > > > > > -	err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > > -					 MEM_TYPE_PAGE_SHARED, NULL);
> > > > > > > +	if (vi->rq[qp_index].page_pool)
> > > > > > > +		err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > > +						 MEM_TYPE_PAGE_POOL,
> > > > > > > +						 vi->rq[qp_index].page_pool);
> > > > > > > +	else
> > > > > > > +		err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > > +						 MEM_TYPE_PAGE_SHARED,
> > > > > > > +						 NULL);
> > > > > > > +
> > > > > > >  	if (err < 0)
> > > > > > >  		goto err_xdp_reg_mem_model;
> > > > > > >
> > > > > > > @@ -2951,6 +2998,7 @@ static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *data)
> > > > > > >  				ethtool_sprintf(&p, "tx_queue_%u_%s", i,
> > > > > > >  						virtnet_sq_stats_desc[j].desc);
> > > > > > >  		}
> > > > > > > +		page_pool_ethtool_stats_get_strings(p);
> > > > > > >  		break;
> > > > > > >  	}
> > > > > > >  }
> > > > > > > @@ -2962,12 +3010,30 @@ static int virtnet_get_sset_count(struct net_device *dev, int sset)
> > > > > > >  	switch (sset) {
> > > > > > >  	case ETH_SS_STATS:
> > > > > > >  		return vi->curr_queue_pairs * (VIRTNET_RQ_STATS_LEN +
> > > > > > > -					       VIRTNET_SQ_STATS_LEN);
> > > > > > > +					       VIRTNET_SQ_STATS_LEN +
> > > > > > > +					       (page_pool_enabled && vi->mergeable_rx_bufs ?
> > > > > > > +						page_pool_ethtool_stats_get_count() : 0));
> > > > > > >  	default:
> > > > > > >  		return -EOPNOTSUPP;
> > > > > > >  	}
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_get_page_pool_stats(struct net_device *dev, u64 *data)
> > > > > > > +{
> > > > > > > +#ifdef CONFIG_PAGE_POOL_STATS
> > > > > > > +	struct virtnet_info *vi = netdev_priv(dev);
> > > > > > > +	struct page_pool_stats pp_stats = {};
> > > > > > > +	int i;
> > > > > > > +
> > > > > > > +	for (i = 0; i < vi->curr_queue_pairs; i++) {
> > > > > > > +		if (!vi->rq[i].page_pool)
> > > > > > > +			continue;
> > > > > > > +		page_pool_get_stats(vi->rq[i].page_pool, &pp_stats);
> > > > > > > +	}
> > > > > > > +	page_pool_ethtool_stats_get(data, &pp_stats);
> > > > > > > +#endif /* CONFIG_PAGE_POOL_STATS */
> > > > > > > +}
> > > > > > > +
> > > > > > >  static void virtnet_get_ethtool_stats(struct net_device *dev,
> > > > > > >  				      struct ethtool_stats *stats, u64 *data)
> > > > > > >  {
> > > > > > > @@ -3003,6 +3069,8 @@ static void virtnet_get_ethtool_stats(struct net_device *dev,
> > > > > > >  		} while (u64_stats_fetch_retry(&sq->stats.syncp, start));
> > > > > > >  		idx += VIRTNET_SQ_STATS_LEN;
> > > > > > >  	}
> > > > > > > +
> > > > > > > +	virtnet_get_page_pool_stats(dev, &data[idx]);
> > > > > > >  }
> > > > > > >
> > > > > > >  static void virtnet_get_channels(struct net_device *dev,
> > > > > > > @@ -3623,6 +3691,8 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > >  		__netif_napi_del(&vi->rq[i].napi);
> > > > > > >  		__netif_napi_del(&vi->sq[i].napi);
> > > > > > > +		if (vi->rq[i].page_pool)
> > > > > > > +			page_pool_destroy(vi->rq[i].page_pool);
> > > > > > >  	}
> > > > > > >
> > > > > > >  	/* We called __netif_napi_del(),
> > > > > > > @@ -3679,12 +3749,19 @@ static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf)
> > > > > > >  	struct virtnet_info *vi = vq->vdev->priv;
> > > > > > >  	int i = vq2rxq(vq);
> > > > > > >
> > > > > > > -	if (vi->mergeable_rx_bufs)
> > > > > > > -		put_page(virt_to_head_page(buf));
> > > > > > > -	else if (vi->big_packets)
> > > > > > > +	if (vi->mergeable_rx_bufs) {
> > > > > > > +		if (vi->rq[i].page_pool) {
> > > > > > > +			page_pool_put_full_page(vi->rq[i].page_pool,
> > > > > > > +						virt_to_head_page(buf),
> > > > > > > +						true);
> > > > > > > +		} else {
> > > > > > > +			put_page(virt_to_head_page(buf));
> > > > > > > +		}
> > > > > > > +	} else if (vi->big_packets) {
> > > > > > >  		give_pages(&vi->rq[i], buf);
> > > > > > > -	else
> > > > > > > +	} else {
> > > > > > >  		put_page(virt_to_head_page(buf));
> > > > > > > +	}
> > > > > > >  }
> > > > > > >
> > > > > > >  static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > @@ -3718,6 +3795,26 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
> > > > > > >  	virtnet_free_queues(vi);
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_alloc_page_pool(struct receive_queue *rq)
> > > > > > > +{
> > > > > > > +	struct virtio_device *vdev = rq->vq->vdev;
> > > > > > > +
> > > > > > > +	struct page_pool_params pp_params = {
> > > > > > > +		.order = 0,
> > > > > > > +		.pool_size = rq->vq->num_max,
> > > > > > > +		.nid = dev_to_node(vdev->dev.parent),
> > > > > > > +		.dev = vdev->dev.parent,
> > > > > > > +		.offset = 0,
> > > > > > > +	};
> > > > > > > +
> > > > > > > +	rq->page_pool = page_pool_create(&pp_params);
> > > > > > > +	if (IS_ERR(rq->page_pool)) {
> > > > > > > +		dev_warn(&vdev->dev, "page pool creation failed: %ld\n",
> > > > > > > +			 PTR_ERR(rq->page_pool));
> > > > > > > +		rq->page_pool = NULL;
> > > > > > > +	}
> > > > > > > +}
> > > > > > > +
> > > > > > >  /* How large should a single buffer be so a queue full of these can fit at
> > > > > > >   * least one full packet?
> > > > > > >   * Logic below assumes the mergeable buffer header is used.
> > > > > > > @@ -3801,6 +3898,13 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > > > > >  		vi->rq[i].vq = vqs[rxq2vq(i)];
> > > > > > >  		vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
> > > > > > >  		vi->sq[i].vq = vqs[txq2vq(i)];
> > > > > > > +
> > > > > > > +		if (page_pool_enabled && vi->mergeable_rx_bufs)
> > > > > > > +			virtnet_alloc_page_pool(&vi->rq[i]);
> > > > > > > +		else
> > > > > > > +			dev_warn(&vi->vdev->dev,
> > > > > > > +				 "page pool only support mergeable mode\n");
> > > > > > > +
> > > > > > >  	}
> > > > > > >
> > > > > > >  	/* run here: ret == 0. */
> > > > > > > --
> > > > > > > 2.31.1
> > > > > >
> > > > >
Xuan Zhuo
2023-Jun-08 03:54 UTC
[PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Thu, 8 Jun 2023 08:38:14 +0800, Jason Wang <jasowang at redhat.com> wrote:
> On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> >
> > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > On Tue, May 30, 2023 at 9:19 AM Liang Chen <liangchen.linux at gmail.com> wrote:
> > > >
> > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > >
> > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > > > >
> > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > The implementation at the moment uses one page per packet in both the
> > > > > > > > normal and XDP path. In addition, introducing a module parameter to enable
> > > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > > >
> > > > > > > > In single-core vm testing environments, it gives a modest performance gain
> > > > > > > > in the normal path.
> > > > > > > > Upstream codebase: 47.5 Gbits/sec
> > > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > >
> > > > > > > > In multi-core vm testing environments, The most significant performance
> > > > > > > > gain is observed in XDP cpumap:
> > > > > > > > Upstream codebase: 1.38 Gbits/sec
> > > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > >
> > > > > > > > With this foundation, we can further integrate page pool fragmentation and
> > > > > > > > DMA map/unmap support.
> > > > > > > >
> > > > > > > > Signed-off-by: Liang Chen <liangchen.linux at gmail.com>
> > > > > > >
> > > > > > > Why off by default?
> > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > >
> > > > > > >
> > > > > > > What happens if we use page pool for big mode too?
> > > > > > > The less modes we have the better...
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > coalescing. But such cases are rare.
> > > > >
> > > > > small packets are rare? These workloads are easy to create actually.
> > > > > Pls try and include benchmark with small packet size.
> > > > >
> > > >
> > > > Sure, Thanks!
> > >
> > > Before going ahead and posting v2 patch, I would like to hear more
> > > advice for the cases of small packets. I have done more performance
> > > benchmark with small packets since then. Here is a list of iperf
> > > output,
> > >
> > > With PP and PP fragmenting:
> > > 256K: [ 5] 505.00-510.00 sec 1.34 GBytes 2.31 Gbits/sec 0 144 KBytes
> > > 1K: [ 5] 30.00-35.00 sec 4.63 GBytes 7.95 Gbits/sec 0 223 KBytes
> > > 2K: [ 5] 65.00-70.00 sec 8.33 GBytes 14.3 Gbits/sec 0 324 KBytes
> > > 4K: [ 5] 30.00-35.00 sec 13.3 GBytes 22.8 Gbits/sec 0 1.08 MBytes
> > > 8K: [ 5] 50.00-55.00 sec 18.9 GBytes 32.4 Gbits/sec 0 744 KBytes
> > > 16K: [ 5] 25.00-30.00 sec 24.6 GBytes 42.3 Gbits/sec 0 963 KBytes
> > > 32K: [ 5] 45.00-50.00 sec 29.8 GBytes 51.2 Gbits/sec 0 1.25 MBytes
> > > 64K: [ 5] 35.00-40.00 sec 34.0 GBytes 58.4 Gbits/sec 0 1.70 MBytes
> > > 128K: [ 5] 45.00-50.00 sec 36.7 GBytes 63.1 Gbits/sec 0 4.26 MBytes
> > > 256K: [ 5] 30.00-35.00 sec 40.0 GBytes 68.8 Gbits/sec 0 3.20 MBytes
>
> Note that virtio-net driver is lacking things like BQL and others, so
> it might suffer from buffer bloat for TCP performance. Would you mind
> to measure with e.g using testpmd on the vhost to see the rx PPS?
>
> > >
> > > Without PP:
> > > 256: [ 5] 680.00-685.00 sec 1.57 GBytes 2.69 Gbits/sec 0 359 KBytes
> > > 1K: [ 5] 75.00-80.00 sec 5.47 GBytes 9.40 Gbits/sec 0 730 KBytes
> > > 2K: [ 5] 65.00-70.00 sec 9.46 GBytes 16.2 Gbits/sec 0 1.99 MBytes
> > > 4K: [ 5] 30.00-35.00 sec 14.5 GBytes 25.0 Gbits/sec 0 1.20 MBytes
> > > 8K: [ 5] 45.00-50.00 sec 19.9 GBytes 34.1 Gbits/sec 0 1.72 MBytes
> > > 16K: [ 5] 5.00-10.00 sec 23.8 GBytes 40.9 Gbits/sec 0 2.90 MBytes
> > > 32K: [ 5] 15.00-20.00 sec 28.0 GBytes 48.1 Gbits/sec 0 3.03 MBytes
> > > 64K: [ 5] 60.00-65.00 sec 31.8 GBytes 54.6 Gbits/sec 0 3.05 MBytes
> > > 128K: [ 5] 45.00-50.00 sec 33.0 GBytes 56.6 Gbits/sec 1 3.03 MBytes
> > > 256K: [ 5] 25.00-30.00 sec 34.7 GBytes 59.6 Gbits/sec 0 3.11 MBytes
> > >
> > >
> > > The major factor contributing to the performance drop is the reduction
> > > of skb coalescing. Additionally, without the page pool, small packets
> > > can still benefit from the allocation of 8 continuous pages by
> > > breaking them down into smaller pieces. This effectively reduces the
> > > frequency of page allocation from the buddy system. For instance, the
> > > arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> > > the benefits of using a page pool are limited in such cases.
>
> I wonder if we can improve page pool in this case anyhow.
>
> > > In fact,
> > > without page pool fragmenting enabled, it can even hinder performance
> > > from this perspective.
> > >
> > > Upon further consideration, I tend to believe making page pool the
> > > default option may not be appropriate. As you pointed out, we cannot
> > > simply ignore the performance impact on small packets. Any comments on
> > > this will be much appreciated.
> > >
> > >
> > > Thanks,
> > > Liang
> >
> >
> > So, let's only use page pool for XDP then?
>
> +1

+1

Thanks.

>
> We can start from this.
>
> Thanks
>
> > > > > >
> > > > > > The usage of page pool for big mode is being evaluated now. Thanks!
> > > > > > > >
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 188 ++++++++++++++++++++++++++++++---------
> > > > > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > > > > > >  module_param(gso, bool, 0444);
> > > > > > > >  module_param(napi_tx, bool, 0644);
> > > > > > > >
> > > > > > > > +static bool page_pool_enabled;
> > > > > > > > +module_param(page_pool_enabled, bool, 0400);
> > > > > > > > +
> > > > > > > >  /* FIXME: MTU in config. */
> > > > > > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > > > > > >  #define GOOD_COPY_LEN	128
> > > > > > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > > > > > >  	/* Chain pages by the private ptr. */
> > > > > > > >  	struct page *pages;
> > > > > > > >
> > > > > > > > +	/* Page pool */
> > > > > > > > +	struct page_pool *page_pool;
> > > > > > > > +
> > > > > > > >  	/* Average packet length for mergeable receive buffers. */
> > > > > > > >  	struct ewma_pkt_len mrg_avg_pkt_len;
> > > > > > > >
> > > > > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, unsigned int buflen,
> > > > > > > >  	return skb;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page *page)
> > > > > > > > +{
> > > > > > > > +	if (rq->page_pool)
> > > > > > > > +		page_pool_put_full_page(rq->page_pool, page, true);
> > > > > > > > +	else
> > > > > > > > +		put_page(page);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /* Called from bottom half context */
> > > > > > > >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > >  				   struct receive_queue *rq,
> > > > > > > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > >  	hdr = skb_vnet_hdr(skb);
> > > > > > > >  	memcpy(hdr, hdr_p, hdr_len);
> > > > > > > >  	if (page_to_free)
> > > > > > > > -		put_page(page_to_free);
> > > > > > > > +		virtnet_put_page(rq, page_to_free);
> > > > > > > >
> > > > > > > >  	return skb;
> > > > > > > >  }
> > > > > > > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > > > > > > >  	return ret;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > -static void put_xdp_frags(struct xdp_buff *xdp)
> > > > > > > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq)
> > > > > > > >  {
> > > > > > > >  	struct skb_shared_info *shinfo;
> > > > > > > >  	struct page *xdp_page;
> > > > > > > > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> > > > > > > >  		shinfo = xdp_get_shared_info_from_buff(xdp);
> > > > > > > >  		for (i = 0; i < shinfo->nr_frags; i++) {
> > > > > > > >  			xdp_page = skb_frag_page(&shinfo->frags[i]);
> > > > > > > > -			put_page(xdp_page);
> > > > > > > > +			virtnet_put_page(rq, xdp_page);
> > > > > > > >  		}
> > > > > > > >  	}
> > > > > > > >  }
> > > > > > > > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > >  	if (page_off + *len + tailroom > PAGE_SIZE)
> > > > > > > >  		return NULL;
> > > > > > > >
> > > > > > > > -	page = alloc_page(GFP_ATOMIC);
> > > > > > > > +	if (rq->page_pool)
> > > > > > > > +		page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > > > +	else
> > > > > > > > +		page = alloc_page(GFP_ATOMIC);
> > > > > > > > +
> > > > > > > >  	if (!page)
> > > > > > > >  		return NULL;
> > > > > > > >
> > > > > > > > @@ -926,21 +944,24 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > >  		 * is sending packet larger than the MTU.
> > > > > > > >  		 */
> > > > > > > >  		if ((page_off + buflen + tailroom) > PAGE_SIZE) {
> > > > > > > > -			put_page(p);
> > > > > > > > +			virtnet_put_page(rq, p);
> > > > > > > >  			goto err_buf;
> > > > > > > >  		}
> > > > > > > >
> > > > > > > >  		memcpy(page_address(page) + page_off,
> > > > > > > >  		       page_address(p) + off, buflen);
> > > > > > > >  		page_off += buflen;
> > > > > > > > -		put_page(p);
> > > > > > > > +		virtnet_put_page(rq, p);
> > > > > > > >  	}
> > > > > > > >
> > > > > > > >  	/* Headroom does not contribute to packet length */
> > > > > > > >  	*len = page_off - VIRTIO_XDP_HEADROOM;
> > > > > > > >  	return page;
> > > > > > > >  err_buf:
> > > > > > > > -	__free_pages(page, 0);
> > > > > > > > +	if (rq->page_pool)
> > > > > > > > +		page_pool_put_full_page(rq->page_pool, page, true);
> > > > > > > > +	else
> > > > > > > > +		__free_pages(page, 0);
> > > > > > > >  	return NULL;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -1144,7 +1165,7 @@ static void mergeable_buf_free(struct receive_queue *rq, int num_buf,
> > > > > > > >  		}
> > > > > > > >  		stats->bytes += len;
> > > > > > > >  		page = virt_to_head_page(buf);
> > > > > > > > -		put_page(page);
> > > > > > > > +		virtnet_put_page(rq, page);
> > > > > > > >  	}
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -1264,7 +1285,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > >  		cur_frag_size = truesize;
> > > > > > > >  		xdp_frags_truesz += cur_frag_size;
> > > > > > > >  		if (unlikely(len > truesize - room || cur_frag_size > PAGE_SIZE)) {
> > > > > > > > -			put_page(page);
> > > > > > > > +			virtnet_put_page(rq, page);
> > > > > > > >  			pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
> > > > > > > >  				 dev->name, len, (unsigned long)(truesize - room));
> > > > > > > >  			dev->stats.rx_length_errors++;
> > > > > > > > @@ -1283,7 +1304,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > >  	return 0;
> > > > > > > >
> > > > > > > >  err:
> > > > > > > > -	put_xdp_frags(xdp);
> > > > > > > > +	put_xdp_frags(xdp, rq);
> > > > > > > >  	return -EINVAL;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -1344,7 +1365,10 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
> > > > > > > >  		if (*len + xdp_room > PAGE_SIZE)
> > > > > > > >  			return NULL;
> > > > > > > >
> > > > > > > > -		xdp_page = alloc_page(GFP_ATOMIC);
> > > > > > > > +		if (rq->page_pool)
> > > > > > > > +			xdp_page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > > > +		else
> > > > > > > > +			xdp_page = alloc_page(GFP_ATOMIC);
> > > > > > > >  		if (!xdp_page)
> > > > > > > >  			return NULL;
> > > > > > > >
> > > > > > > > @@ -1354,7 +1378,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
> > > > > > > >
> > > > > > > >  	*frame_sz = PAGE_SIZE;
> > > > > > > >
> > > > > > > > -	put_page(*page);
> > > > > > > > +	virtnet_put_page(rq, *page);
> > > > > > > >
> > > > > > > >  	*page = xdp_page;
> > > > > > > >
> > > > > > > > @@ -1400,6 +1424,8 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
> > > > > > > >  		head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
> > > > > > > >  		if (unlikely(!head_skb))
> > > > > > > >  			break;
> > > > > > > > +		if (rq->page_pool)
> > > > > > > > +			skb_mark_for_recycle(head_skb);
> > > > > > > >  		return head_skb;
> > > > > > > >
> > > > > > > >  	case XDP_TX:
> > > > > > > > @@ -1410,10
+1436,10 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev, > > > > > > > > break; > > > > > > > > } > > > > > > > > > > > > > > > > - put_xdp_frags(&xdp); > > > > > > > > + put_xdp_frags(&xdp, rq); > > > > > > > > > > > > > > > > err_xdp: > > > > > > > > - put_page(page); > > > > > > > > + virtnet_put_page(rq, page); > > > > > > > > mergeable_buf_free(rq, num_buf, dev, stats); > > > > > > > > > > > > > > > > stats->xdp_drops++; > > > > > > > > @@ -1467,6 +1493,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > > > > > > > > head_skb = page_to_skb(vi, rq, page, offset, len, truesize, headroom); > > > > > > > > curr_skb = head_skb; > > > > > > > > > > > > > > > > + if (rq->page_pool) > > > > > > > > + skb_mark_for_recycle(curr_skb); > > > > > > > > + > > > > > > > > if (unlikely(!curr_skb)) > > > > > > > > goto err_skb; > > > > > > > > while (--num_buf) { > > > > > > > > @@ -1509,6 +1538,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > > > > > > > > curr_skb = nskb; > > > > > > > > head_skb->truesize += nskb->truesize; > > > > > > > > num_skb_frags = 0; > > > > > > > > + if (rq->page_pool) > > > > > > > > + skb_mark_for_recycle(curr_skb); > > > > > > > > } > > > > > > > > if (curr_skb != head_skb) { > > > > > > > > head_skb->data_len += len; > > > > > > > > @@ -1517,7 +1548,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > > > > > > > > } > > > > > > > > offset = buf - page_address(page); > > > > > > > > if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) { > > > > > > > > - put_page(page); > > > > > > > > + virtnet_put_page(rq, page); > > > > > > > > skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1, > > > > > > > > len, truesize); > > > > > > > > } else { > > > > > > > > @@ -1530,7 +1561,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, > > > > > > > > return head_skb; > > > > > > > > > > > > > > > > err_skb: > > > > > > > > - put_page(page); > > > > 
> > > > + virtnet_put_page(rq, page); > > > > > > > > mergeable_buf_free(rq, num_buf, dev, stats); > > > > > > > > > > > > > > > > err_buf: > > > > > > > > @@ -1737,31 +1768,40 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi, > > > > > > > > * disabled GSO for XDP, it won't be a big issue. > > > > > > > > */ > > > > > > > > len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room); > > > > > > > > - if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > > > > > > > - return -ENOMEM; > > > > > > > > + if (rq->page_pool) { > > > > > > > > + struct page *page; > > > > > > > > > > > > > > > > - buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset; > > > > > > > > - buf += headroom; /* advance address leaving hole at front of pkt */ > > > > > > > > - get_page(alloc_frag->page); > > > > > > > > - alloc_frag->offset += len + room; > > > > > > > > - hole = alloc_frag->size - alloc_frag->offset; > > > > > > > > - if (hole < len + room) { > > > > > > > > - /* To avoid internal fragmentation, if there is very likely not > > > > > > > > - * enough space for another buffer, add the remaining space to > > > > > > > > - * the current buffer. > > > > > > > > - * XDP core assumes that frame_size of xdp_buff and the length > > > > > > > > - * of the frag are PAGE_SIZE, so we disable the hole mechanism. 
> > > > > > > > - */ > > > > > > > > - if (!headroom) > > > > > > > > - len += hole; > > > > > > > > - alloc_frag->offset += hole; > > > > > > > > - } > > > > > > > > + page = page_pool_dev_alloc_pages(rq->page_pool); > > > > > > > > + if (unlikely(!page)) > > > > > > > > + return -ENOMEM; > > > > > > > > + buf = (char *)page_address(page); > > > > > > > > + buf += headroom; /* advance address leaving hole at front of pkt */ > > > > > > > > + } else { > > > > > > > > + if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp))) > > > > > > > > + return -ENOMEM; > > > > > > > > > > > > > > > > + buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset; > > > > > > > > + buf += headroom; /* advance address leaving hole at front of pkt */ > > > > > > > > + get_page(alloc_frag->page); > > > > > > > > + alloc_frag->offset += len + room; > > > > > > > > + hole = alloc_frag->size - alloc_frag->offset; > > > > > > > > + if (hole < len + room) { > > > > > > > > + /* To avoid internal fragmentation, if there is very likely not > > > > > > > > + * enough space for another buffer, add the remaining space to > > > > > > > > + * the current buffer. > > > > > > > > + * XDP core assumes that frame_size of xdp_buff and the length > > > > > > > > + * of the frag are PAGE_SIZE, so we disable the hole mechanism. 
> > > > > > > > + */ > > > > > > > > + if (!headroom) > > > > > > > > + len += hole; > > > > > > > > + alloc_frag->offset += hole; > > > > > > > > + } > > > > > > > > + } > > > > > > > > sg_init_one(rq->sg, buf, len); > > > > > > > > ctx = mergeable_len_to_ctx(len + room, headroom); > > > > > > > > err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp); > > > > > > > > if (err < 0) > > > > > > > > - put_page(virt_to_head_page(buf)); > > > > > > > > + virtnet_put_page(rq, virt_to_head_page(buf)); > > > > > > > > > > > > > > > > return err; > > > > > > > > } > > > > > > > > @@ -1994,8 +2034,15 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index) > > > > > > > > if (err < 0) > > > > > > > > return err; > > > > > > > > > > > > > > > > - err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq, > > > > > > > > - MEM_TYPE_PAGE_SHARED, NULL); > > > > > > > > + if (vi->rq[qp_index].page_pool) > > > > > > > > + err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq, > > > > > > > > + MEM_TYPE_PAGE_POOL, > > > > > > > > + vi->rq[qp_index].page_pool); > > > > > > > > + else > > > > > > > > + err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq, > > > > > > > > + MEM_TYPE_PAGE_SHARED, > > > > > > > > + NULL); > > > > > > > > + > > > > > > > > if (err < 0) > > > > > > > > goto err_xdp_reg_mem_model; > > > > > > > > > > > > > > > > @@ -2951,6 +2998,7 @@ static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *data) > > > > > > > > ethtool_sprintf(&p, "tx_queue_%u_%s", i, > > > > > > > > virtnet_sq_stats_desc[j].desc); > > > > > > > > } > > > > > > > > + page_pool_ethtool_stats_get_strings(p); > > > > > > > > break; > > > > > > > > } > > > > > > > > } > > > > > > > > @@ -2962,12 +3010,30 @@ static int virtnet_get_sset_count(struct net_device *dev, int sset) > > > > > > > > switch (sset) { > > > > > > > > case ETH_SS_STATS: > > > > > > > > return vi->curr_queue_pairs * (VIRTNET_RQ_STATS_LEN + > > > > > > > > - 
VIRTNET_SQ_STATS_LEN); > > > > > > > > + VIRTNET_SQ_STATS_LEN + > > > > > > > > + (page_pool_enabled && vi->mergeable_rx_bufs ? > > > > > > > > + page_pool_ethtool_stats_get_count() : 0)); > > > > > > > > default: > > > > > > > > return -EOPNOTSUPP; > > > > > > > > } > > > > > > > > } > > > > > > > > > > > > > > > > +static void virtnet_get_page_pool_stats(struct net_device *dev, u64 *data) > > > > > > > > +{ > > > > > > > > +#ifdef CONFIG_PAGE_POOL_STATS > > > > > > > > + struct virtnet_info *vi = netdev_priv(dev); > > > > > > > > + struct page_pool_stats pp_stats = {}; > > > > > > > > + int i; > > > > > > > > + > > > > > > > > + for (i = 0; i < vi->curr_queue_pairs; i++) { > > > > > > > > + if (!vi->rq[i].page_pool) > > > > > > > > + continue; > > > > > > > > + page_pool_get_stats(vi->rq[i].page_pool, &pp_stats); > > > > > > > > + } > > > > > > > > + page_pool_ethtool_stats_get(data, &pp_stats); > > > > > > > > +#endif /* CONFIG_PAGE_POOL_STATS */ > > > > > > > > +} > > > > > > > > + > > > > > > > > static void virtnet_get_ethtool_stats(struct net_device *dev, > > > > > > > > struct ethtool_stats *stats, u64 *data) > > > > > > > > { > > > > > > > > @@ -3003,6 +3069,8 @@ static void virtnet_get_ethtool_stats(struct net_device *dev, > > > > > > > > } while (u64_stats_fetch_retry(&sq->stats.syncp, start)); > > > > > > > > idx += VIRTNET_SQ_STATS_LEN; > > > > > > > > } > > > > > > > > + > > > > > > > > + virtnet_get_page_pool_stats(dev, &data[idx]); > > > > > > > > } > > > > > > > > > > > > > > > > static void virtnet_get_channels(struct net_device *dev, > > > > > > > > @@ -3623,6 +3691,8 @@ static void virtnet_free_queues(struct virtnet_info *vi) > > > > > > > > for (i = 0; i < vi->max_queue_pairs; i++) { > > > > > > > > __netif_napi_del(&vi->rq[i].napi); > > > > > > > > __netif_napi_del(&vi->sq[i].napi); > > > > > > > > + if (vi->rq[i].page_pool) > > > > > > > > + page_pool_destroy(vi->rq[i].page_pool); > > > > > > > > } > > > > > > > > > > > > > > > > /* We called 
__netif_napi_del(), > > > > > > > > @@ -3679,12 +3749,19 @@ static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf) > > > > > > > > struct virtnet_info *vi = vq->vdev->priv; > > > > > > > > int i = vq2rxq(vq); > > > > > > > > > > > > > > > > - if (vi->mergeable_rx_bufs) > > > > > > > > - put_page(virt_to_head_page(buf)); > > > > > > > > - else if (vi->big_packets) > > > > > > > > + if (vi->mergeable_rx_bufs) { > > > > > > > > + if (vi->rq[i].page_pool) { > > > > > > > > + page_pool_put_full_page(vi->rq[i].page_pool, > > > > > > > > + virt_to_head_page(buf), > > > > > > > > + true); > > > > > > > > + } else { > > > > > > > > + put_page(virt_to_head_page(buf)); > > > > > > > > + } > > > > > > > > + } else if (vi->big_packets) { > > > > > > > > give_pages(&vi->rq[i], buf); > > > > > > > > - else > > > > > > > > + } else { > > > > > > > > put_page(virt_to_head_page(buf)); > > > > > > > > + } > > > > > > > > } > > > > > > > > > > > > > > > > static void free_unused_bufs(struct virtnet_info *vi) > > > > > > > > @@ -3718,6 +3795,26 @@ static void virtnet_del_vqs(struct virtnet_info *vi) > > > > > > > > virtnet_free_queues(vi); > > > > > > > > } > > > > > > > > > > > > > > > > +static void virtnet_alloc_page_pool(struct receive_queue *rq) > > > > > > > > +{ > > > > > > > > + struct virtio_device *vdev = rq->vq->vdev; > > > > > > > > + > > > > > > > > + struct page_pool_params pp_params = { > > > > > > > > + .order = 0, > > > > > > > > + .pool_size = rq->vq->num_max, > > > > > > > > + .nid = dev_to_node(vdev->dev.parent), > > > > > > > > + .dev = vdev->dev.parent, > > > > > > > > + .offset = 0, > > > > > > > > + }; > > > > > > > > + > > > > > > > > + rq->page_pool = page_pool_create(&pp_params); > > > > > > > > + if (IS_ERR(rq->page_pool)) { > > > > > > > > + dev_warn(&vdev->dev, "page pool creation failed: %ld\n", > > > > > > > > + PTR_ERR(rq->page_pool)); > > > > > > > > + rq->page_pool = NULL; > > > > > > > > + } > > > > > > > > +} > > > > > > > > + > > > 
> > > > > /* How large should a single buffer be so a queue full of these can fit at > > > > > > > > * least one full packet? > > > > > > > > * Logic below assumes the mergeable buffer header is used. > > > > > > > > @@ -3801,6 +3898,13 @@ static int virtnet_find_vqs(struct virtnet_info *vi) > > > > > > > > vi->rq[i].vq = vqs[rxq2vq(i)]; > > > > > > > > vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq); > > > > > > > > vi->sq[i].vq = vqs[txq2vq(i)]; > > > > > > > > + > > > > > > > > + if (page_pool_enabled && vi->mergeable_rx_bufs) > > > > > > > > + virtnet_alloc_page_pool(&vi->rq[i]); > > > > > > > > + else > > > > > > > > + dev_warn(&vi->vdev->dev, > > > > > > > > + "page pool only support mergeable mode\n"); > > > > > > > > + > > > > > > > > } > > > > > > > > > > > > > > > > /* run here: ret == 0. */ > > > > > > > > -- > > > > > > > > 2.31.1 > > > > > > > > > > > > > > >
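A note on the pattern in the quoted patch: every page release goes through virtnet_put_page(), which checks rq->page_pool and either recycles the page into the pool or falls back to put_page(). Below is a minimal user-space sketch of that dispatch, not kernel code. All toy_* names, POOL_SLOTS and PAGE_BYTES are hypothetical stand-ins invented for illustration, and malloc()/free() stand in for the page allocator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define POOL_SLOTS 4      /* hypothetical cache depth */
#define PAGE_BYTES 4096   /* hypothetical buffer size */

/* Hypothetical stand-in for struct page_pool: a small stack of
 * recycled buffers. This is not the kernel API. */
struct toy_pool {
	void *slot[POOL_SLOTS];
	int count;               /* buffers currently cached */
};

/* Hypothetical stand-in for struct receive_queue. */
struct toy_rq {
	struct toy_pool *pool;   /* NULL means "page pool disabled" */
	int fresh_allocs;        /* how often we had to call malloc() */
};

void *toy_get_page(struct toy_rq *rq)
{
	struct toy_pool *p = rq->pool;

	if (p && p->count > 0)
		return p->slot[--p->count];   /* recycle a cached buffer */
	rq->fresh_allocs++;
	return malloc(PAGE_BYTES);
}

/* Same shape as virtnet_put_page(): one helper decides where the
 * buffer goes, so callers do not repeat the if/else. */
void toy_put_page(struct toy_rq *rq, void *page)
{
	struct toy_pool *p = rq->pool;

	if (p && p->count < POOL_SLOTS)
		p->slot[p->count++] = page;   /* keep it for reuse */
	else
		free(page);                   /* no pool, or pool is full */
}

/* Run `rounds` get/put cycles; return how many needed malloc(). */
int toy_run(struct toy_rq *rq, int rounds)
{
	int i;

	rq->fresh_allocs = 0;
	for (i = 0; i < rounds; i++) {
		void *page = toy_get_page(rq);

		if (!page)
			return -1;
		toy_put_page(rq, page);
	}
	/* Drain whatever the pool still caches so nothing leaks. */
	while (rq->pool && rq->pool->count > 0)
		free(rq->pool->slot[--rq->pool->count]);
	return rq->fresh_allocs;
}

#ifdef TOY_DEMO_MAIN   /* build with -DTOY_DEMO_MAIN to run standalone */
int main(void)
{
	struct toy_pool pool = { .count = 0 };
	struct toy_rq pooled = { .pool = &pool };
	struct toy_rq plain = { .pool = NULL };

	printf("with pool:    %d fresh allocations\n", toy_run(&pooled, 1000));
	printf("without pool: %d fresh allocations\n", toy_run(&plain, 1000));
	return 0;
}
#endif
```

In this toy model a get/put loop with the pool enabled reaches the allocator once, while the same loop without it allocates on every round. It says nothing about skb coalescing or frag refill, which is what the small-packet numbers in this thread are about.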
Liang Chen
2023-Jun-09 02:57 UTC
[PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance
On Thu, Jun 8, 2023 at 8:38 AM Jason Wang <jasowang at redhat.com> wrote:
>
> On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> >
> > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > On Tue, May 30, 2023 at 9:19 AM Liang Chen <liangchen.linux at gmail.com> wrote:
> > > >
> > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > >
> > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > > > >
> > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > The implementation at the moment uses one page per packet in both the
> > > > > > > > normal and XDP path. In addition, introducing a module parameter to enable
> > > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > > >
> > > > > > > > In single-core vm testing environments, it gives a modest performance gain
> > > > > > > > in the normal path.
> > > > > > > > Upstream codebase: 47.5 Gbits/sec
> > > > > > > > Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > >
> > > > > > > > In multi-core vm testing environments, The most significant performance
> > > > > > > > gain is observed in XDP cpumap:
> > > > > > > > Upstream codebase: 1.38 Gbits/sec
> > > > > > > > Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > >
> > > > > > > > With this foundation, we can further integrate page pool fragmentation and
> > > > > > > > DMA map/unmap support.
> > > > > > > >
> > > > > > > > Signed-off-by: Liang Chen <liangchen.linux at gmail.com>
> > > > > > >
> > > > > > > Why off by default?
> > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > >
> > > > > > >
> > > > > > > What happens if we use page pool for big mode too?
> > > > > > > The less modes we have the better...
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > coalescing. But such cases are rare.
> > > > >
> > > > > small packets are rare? These workloads are easy to create actually.
> > > > > Pls try and include benchmark with small packet size.
> > > > >
> > > >
> > > > Sure, Thanks!
> > >
> > > Before going ahead and posting v2 patch, I would like to hear more
> > > advice for the cases of small packets. I have done more performance
> > > benchmark with small packets since then. Here is a list of iperf
> > > output,
> > >
> > > With PP and PP fragmenting:
> > > 256K: [ 5] 505.00-510.00 sec 1.34 GBytes 2.31 Gbits/sec 0 144 KBytes
> > > 1K: [ 5] 30.00-35.00 sec 4.63 GBytes 7.95 Gbits/sec 0 223 KBytes
> > > 2K: [ 5] 65.00-70.00 sec 8.33 GBytes 14.3 Gbits/sec 0 324 KBytes
> > > 4K: [ 5] 30.00-35.00 sec 13.3 GBytes 22.8 Gbits/sec 0 1.08 MBytes
> > > 8K: [ 5] 50.00-55.00 sec 18.9 GBytes 32.4 Gbits/sec 0 744 KBytes
> > > 16K: [ 5] 25.00-30.00 sec 24.6 GBytes 42.3 Gbits/sec 0 963 KBytes
> > > 32K: [ 5] 45.00-50.00 sec 29.8 GBytes 51.2 Gbits/sec 0 1.25 MBytes
> > > 64K: [ 5] 35.00-40.00 sec 34.0 GBytes 58.4 Gbits/sec 0 1.70 MBytes
> > > 128K: [ 5] 45.00-50.00 sec 36.7 GBytes 63.1 Gbits/sec 0 4.26 MBytes
> > > 256K: [ 5] 30.00-35.00 sec 40.0 GBytes 68.8 Gbits/sec 0 3.20 MBytes
>
> Note that virtio-net driver is lacking things like BQL and others, so
> it might suffer from buffer bloat for TCP performance. Would you mind
> to measure with e.g using testpmd on the vhost to see the rx PPS?
>

No problem. Before we proceed to measure with testpmd, could you
please take a look at the PPS measurements we obtained previously and
see if they are sufficient? Though we will only utilize page pool for
xdp on v2.

netperf -H 192.168.124.197 -p 4444 -t UDP_STREAM -l 0 -- -m $((1))

with page pool:
1.
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: enp8s0 655092.27 0.35 27508.77 0.03 0.00 0.00 0.00 0.00
2.
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: enp8s0 654749.87 0.63 27494.42 0.05 0.00 0.00 0.00 0.00
3.
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: enp8s0 654230.40 0.10 27472.57 0.01 0.00 0.00 0.00 0.00
4.
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: enp8s0 656661.33 0.15 27574.65 0.01 0.00 0.00 0.00 0.00

without page pool:
1.
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: enp8s0 646515.20 0.47 27148.60 0.04 0.00 0.00 0.00 0.00
2.
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: enp8s0 653874.13 0.18 27457.61 0.02 0.00 0.00 0.00 0.00
3.
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: enp8s0 647246.93 0.15 27179.32 0.01 0.00 0.00 0.00 0.00
4.
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: enp8s0 650625.07 0.27 27321.18 0.02 0.00 0.00 0.00 0.00

(655092+654749+654230+656661)/(646515+653874+647246+650625)
1.00864886500966031113

On average it gives around 0.8% increase in PPS, and this figure can
be reproduced consistently.

> > >
> > > Without PP:
> > > 256: [ 5] 680.00-685.00 sec 1.57 GBytes 2.69 Gbits/sec 0 359 KBytes
> > > 1K: [ 5] 75.00-80.00 sec 5.47 GBytes 9.40 Gbits/sec 0 730 KBytes
> > > 2K: [ 5] 65.00-70.00 sec 9.46 GBytes 16.2 Gbits/sec 0 1.99 MBytes
> > > 4K: [ 5] 30.00-35.00 sec 14.5 GBytes 25.0 Gbits/sec 0 1.20 MBytes
> > > 8K: [ 5] 45.00-50.00 sec 19.9 GBytes 34.1 Gbits/sec 0 1.72 MBytes
> > > 16K: [ 5] 5.00-10.00 sec 23.8 GBytes 40.9 Gbits/sec 0 2.90 MBytes
> > > 32K: [ 5] 15.00-20.00 sec 28.0 GBytes 48.1 Gbits/sec 0 3.03 MBytes
> > > 64K: [ 5] 60.00-65.00 sec 31.8 GBytes 54.6 Gbits/sec 0 3.05 MBytes
> > > 128K: [ 5] 45.00-50.00 sec 33.0 GBytes 56.6 Gbits/sec 1 3.03 MBytes
> > > 256K: [ 5] 25.00-30.00 sec 34.7 GBytes 59.6 Gbits/sec 0 3.11 MBytes
> > >
> > >
> > > The major factor contributing to the performance drop is the reduction
> > > of skb coalescing. Additionally, without the page pool, small packets
> > > can still benefit from the allocation of 8 continuous pages by
> > > breaking them down into smaller pieces. This effectively reduces the
> > > frequency of page allocation from the buddy system. For instance, the
> > > arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> > > the benefits of using a page pool are limited in such cases.
>
> I wonder if we can improve page pool in this case anyhow.
>

We would like to make the effort to enhance skb coalescing to be more
friendly with page pool buffers. But that involves modifications to
some core data structures of mm.

>
> In fact,
> > > without page pool fragmenting enabled, it can even hinder performance
> > > from this perspective.
> > >
> > > Upon further consideration, I tend to believe making page pool the
> > > default option may not be appropriate. As you pointed out, we cannot
> > > simply ignore the performance impact on small packets. Any comments on
> > > this will be much appreciated.
> > >
> > >
> > > Thanks,
> > > Liang
> >
> >
> > So, let's only use page pool for XDP then?
>
> +1
>
> We can start from this.
>
> Thanks
>
> >
> > >
> > > > > > The usage of page pool for big mode is being evaluated now. Thanks!
> > > > > >
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 188 ++++++++++++++++++++++++++++++---------
> > > > > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > > > > > >  module_param(gso, bool, 0444);
> > > > > > > >  module_param(napi_tx, bool, 0644);
> > > > > > > >
> > > > > > > > +static bool page_pool_enabled;
> > > > > > > > +module_param(page_pool_enabled, bool, 0400);
> > > > > > > > +
> > > > > > > >  /* FIXME: MTU in config. */
> > > > > > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > > > > > >  #define GOOD_COPY_LEN 128
> > > > > > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > > > > > >  	/* Chain pages by the private ptr. */
> > > > > > > >  	struct page *pages;
> > > > > > > >
> > > > > > > > +	/* Page pool */
> > > > > > > > +	struct page_pool *page_pool;
> > > > > > > > +
> > > > > > > >  	/* Average packet length for mergeable receive buffers.
*/ > > > > > > > > struct ewma_pkt_len mrg_avg_pkt_len; > > > > > > > > > > > > > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, unsigned int buflen, > > > > > > > > return skb; > > > > > > > > } > > > > > > > > > > > > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page *page) > > > > > > > > +{ > > > > > > > > + if (rq->page_pool) > > > > > > > > + page_pool_put_full_page(rq->page_pool, page, true); > > > > > > > > + else > > > > > > > > + put_page(page); > > > > > > > > +} > > > > > > > > + > > > > > > > > /* Called from bottom half context */ > > > > > > > > static struct sk_buff *page_to_skb(struct virtnet_info *vi, > > > > > > > > struct receive_queue *rq, > > > > > > > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, > > > > > > > > hdr = skb_vnet_hdr(skb); > > > > > > > > memcpy(hdr, hdr_p, hdr_len); > > > > > > > > if (page_to_free) > > > > > > > > - put_page(page_to_free); > > > > > > > > + virtnet_put_page(rq, page_to_free); > > > > > > > > > > > > > > > > return skb; > > > > > > > > } > > > > > > > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev, > > > > > > > > return ret; > > > > > > > > } > > > > > > > > > > > > > > > > -static void put_xdp_frags(struct xdp_buff *xdp) > > > > > > > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq) > > > > > > > > { > > > > > > > > struct skb_shared_info *shinfo; > > > > > > > > struct page *xdp_page; > > > > > > > > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp) > > > > > > > > shinfo = xdp_get_shared_info_from_buff(xdp); > > > > > > > > for (i = 0; i < shinfo->nr_frags; i++) { > > > > > > > > xdp_page = skb_frag_page(&shinfo->frags[i]); > > > > > > > > - put_page(xdp_page); > > > > > > > > + virtnet_put_page(rq, xdp_page); > > > > > > > > } > > > > > > > > } > > > > > > > > } > > > > > > > > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct 
receive_queue *rq, > > > > > > > > if (page_off + *len + tailroom > PAGE_SIZE) > > > > > > > > return NULL; > > > > > > > > > > > > > > > > - page = alloc_page(GFP_ATOMIC); > > > > > > > > + if (rq->page_pool) > > > > > > > > + page = page_pool_dev_alloc_pages(rq->page_pool); > > > > > > > > + else > > > > > > > > + page = alloc_page(GFP_ATOMIC); > > > > > > > > + > > > > > > > > if (!page) > > > > > > > > return NULL; > > > > > > > > > > > > > > > > @@ -926,21 +944,24 @@ static struct page *xdp_linearize_page(struct receive_queue *rq, > > > > > > > > * is sending packet larger than the MTU. > > > > > > > > */ > > > > > > > > if ((page_off + buflen + tailroom) > PAGE_SIZE) { > > > > > > > > - put_page(p); > > > > > > > > + virtnet_put_page(rq, p); > > > > > > > > goto err_buf; > > > > > > > > } > > > > > > > > > > > > > > > > memcpy(page_address(page) + page_off, > > > > > > > > page_address(p) + off, buflen); > > > > > > > > page_off += buflen; > > > > > > > > - put_page(p); > > > > > > > > + virtnet_put_page(rq, p); > > > > > > > > } > > > > > > > > > > > > > > > > /* Headroom does not contribute to packet length */ > > > > > > > > *len = page_off - VIRTIO_XDP_HEADROOM; > > > > > > > > return page; > > > > > > > > err_buf: > > > > > > > > - __free_pages(page, 0); > > > > > > > > + if (rq->page_pool) > > > > > > > > + page_pool_put_full_page(rq->page_pool, page, true); > > > > > > > > + else > > > > > > > > + __free_pages(page, 0); > > > > > > > > return NULL; > > > > > > > > } > > > > > > > > > > > > > > > > @@ -1144,7 +1165,7 @@ static void mergeable_buf_free(struct receive_queue *rq, int num_buf, > > > > > > > > } > > > > > > > > stats->bytes += len; > > > > > > > > page = virt_to_head_page(buf); > > > > > > > > - put_page(page); > > > > > > > > + virtnet_put_page(rq, page); > > > > > > > > } > > > > > > > > } > > > > > > > > > > > > > > > > @@ -1264,7 +1285,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev, > > > > > > > > cur_frag_size = 
truesize; > > > > > > > > xdp_frags_truesz += cur_frag_size; > > > > > > > > if (unlikely(len > truesize - room || cur_frag_size > PAGE_SIZE)) { > > > > > > > > - put_page(page); > > > > > > > > + virtnet_put_page(rq, page); > > > > > > > > pr_debug("%s: rx error: len %u exceeds truesize %lu\n", > > > > > > > > dev->name, len, (unsigned long)(truesize - room)); > > > > > > > > dev->stats.rx_length_errors++; > > > > > > > > @@ -1283,7 +1304,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev, > > > > > > > > return 0; > > > > > > > > > > > > > > > > err: > > > > > > > > - put_xdp_frags(xdp); > > > > > > > > + put_xdp_frags(xdp, rq); > > > > > > > > return -EINVAL; > > > > > > > > } > > > > > > > > > > > > > > > > @@ -1344,7 +1365,10 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi, > > > > > > > > if (*len + xdp_room > PAGE_SIZE) > > > > > > > > return NULL; > > > > > > > > > > > > > > > > - xdp_page = alloc_page(GFP_ATOMIC); > > > > > > > > + if (rq->page_pool) > > > > > > > > + xdp_page = page_pool_dev_alloc_pages(rq->page_pool); > > > > > > > > + else > > > > > > > > + xdp_page = alloc_page(GFP_ATOMIC); > > > > > > > > if (!xdp_page) > > > > > > > > return NULL; > > > > > > > > > > > > > > > > @@ -1354,7 +1378,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi, > > > > > > > > > > > > > > > > *frame_sz = PAGE_SIZE; > > > > > > > > > > > > > > > > - put_page(*page); > > > > > > > > + virtnet_put_page(rq, *page); > > > > > > > > > > > > > > > > *page = xdp_page; > > > > > > > > > > > > > > > > @@ -1400,6 +1424,8 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev, > > > > > > > > head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz); > > > > > > > > if (unlikely(!head_skb)) > > > > > > > > break; > > > > > > > > + if (rq->page_pool) > > > > > > > > + skb_mark_for_recycle(head_skb); > > > > > > > > return head_skb; > > > > > > > > > > > > > > > > case XDP_TX: > > > > > > > > @@ -1410,10 
+1436,10 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
> > > > > > > >  			break;
> > > > > > > >  	}
> > > > > > > >
> > > > > > > > -	put_xdp_frags(&xdp);
> > > > > > > > +	put_xdp_frags(&xdp, rq);
> > > > > > > >
> > > > > > > >  err_xdp:
> > > > > > > > -	put_page(page);
> > > > > > > > +	virtnet_put_page(rq, page);
> > > > > > > >  	mergeable_buf_free(rq, num_buf, dev, stats);
> > > > > > > >
> > > > > > > >  	stats->xdp_drops++;
> > > > > > > > @@ -1467,6 +1493,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > >  	head_skb = page_to_skb(vi, rq, page, offset, len, truesize, headroom);
> > > > > > > >  	curr_skb = head_skb;
> > > > > > > >
> > > > > > > > +	if (rq->page_pool)
> > > > > > > > +		skb_mark_for_recycle(curr_skb);
> > > > > > > > +
> > > > > > > >  	if (unlikely(!curr_skb))
> > > > > > > >  		goto err_skb;
> > > > > > > >  	while (--num_buf) {
> > > > > > > > @@ -1509,6 +1538,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > >  			curr_skb = nskb;
> > > > > > > >  			head_skb->truesize += nskb->truesize;
> > > > > > > >  			num_skb_frags = 0;
> > > > > > > > +			if (rq->page_pool)
> > > > > > > > +				skb_mark_for_recycle(curr_skb);
> > > > > > > >  		}
> > > > > > > >  		if (curr_skb != head_skb) {
> > > > > > > >  			head_skb->data_len += len;
> > > > > > > > @@ -1517,7 +1548,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > >  		}
> > > > > > > >  		offset = buf - page_address(page);
> > > > > > > >  		if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
> > > > > > > > -			put_page(page);
> > > > > > > > +			virtnet_put_page(rq, page);
> > > > > > > >  			skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
> > > > > > > >  					     len, truesize);
> > > > > > > >  		} else {
> > > > > > > > @@ -1530,7 +1561,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > >  	return head_skb;
> > > > > > > >
> > > > > > > >  err_skb:
> > > > > > > > -	put_page(page);
> > > > > > > > +	virtnet_put_page(rq, page);
> > > > > > > >  	mergeable_buf_free(rq, num_buf, dev, stats);
> > > > > > > >
> > > > > > > >  err_buf:
> > > > > > > > @@ -1737,31 +1768,40 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > >  	 * disabled GSO for XDP, it won't be a big issue.
> > > > > > > >  	 */
> > > > > > > >  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> > > > > > > > -	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > > > -		return -ENOMEM;
> > > > > > > > +	if (rq->page_pool) {
> > > > > > > > +		struct page *page;
> > > > > > > >
> > > > > > > > -	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > -	buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > > > -	get_page(alloc_frag->page);
> > > > > > > > -	alloc_frag->offset += len + room;
> > > > > > > > -	hole = alloc_frag->size - alloc_frag->offset;
> > > > > > > > -	if (hole < len + room) {
> > > > > > > > -		/* To avoid internal fragmentation, if there is very likely not
> > > > > > > > -		 * enough space for another buffer, add the remaining space to
> > > > > > > > -		 * the current buffer.
> > > > > > > > -		 * XDP core assumes that frame_size of xdp_buff and the length
> > > > > > > > -		 * of the frag are PAGE_SIZE, so we disable the hole mechanism.
> > > > > > > > -		 */
> > > > > > > > -		if (!headroom)
> > > > > > > > -			len += hole;
> > > > > > > > -		alloc_frag->offset += hole;
> > > > > > > > -	}
> > > > > > > > +		page = page_pool_dev_alloc_pages(rq->page_pool);
> > > > > > > > +		if (unlikely(!page))
> > > > > > > > +			return -ENOMEM;
> > > > > > > > +		buf = (char *)page_address(page);
> > > > > > > > +		buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > > > +	} else {
> > > > > > > > +		if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
> > > > > > > > +			return -ENOMEM;
> > > > > > > >
> > > > > > > > +		buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > +		buf += headroom; /* advance address leaving hole at front of pkt */
> > > > > > > > +		get_page(alloc_frag->page);
> > > > > > > > +		alloc_frag->offset += len + room;
> > > > > > > > +		hole = alloc_frag->size - alloc_frag->offset;
> > > > > > > > +		if (hole < len + room) {
> > > > > > > > +			/* To avoid internal fragmentation, if there is very likely not
> > > > > > > > +			 * enough space for another buffer, add the remaining space to
> > > > > > > > +			 * the current buffer.
> > > > > > > > +			 * XDP core assumes that frame_size of xdp_buff and the length
> > > > > > > > +			 * of the frag are PAGE_SIZE, so we disable the hole mechanism.
> > > > > > > > +			 */
> > > > > > > > +			if (!headroom)
> > > > > > > > +				len += hole;
> > > > > > > > +			alloc_frag->offset += hole;
> > > > > > > > +		}
> > > > > > > > +	}
> > > > > > > >  	sg_init_one(rq->sg, buf, len);
> > > > > > > >  	ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > >  	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > >  	if (err < 0)
> > > > > > > > -		put_page(virt_to_head_page(buf));
> > > > > > > > +		virtnet_put_page(rq, virt_to_head_page(buf));
> > > > > > > >
> > > > > > > >  	return err;
> > > > > > > >  }
> > > > > > > > @@ -1994,8 +2034,15 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
> > > > > > > >  	if (err < 0)
> > > > > > > >  		return err;
> > > > > > > >
> > > > > > > > -	err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > > > -					 MEM_TYPE_PAGE_SHARED, NULL);
> > > > > > > > +	if (vi->rq[qp_index].page_pool)
> > > > > > > > +		err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > > > +						 MEM_TYPE_PAGE_POOL,
> > > > > > > > +						 vi->rq[qp_index].page_pool);
> > > > > > > > +	else
> > > > > > > > +		err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
> > > > > > > > +						 MEM_TYPE_PAGE_SHARED,
> > > > > > > > +						 NULL);
> > > > > > > > +
> > > > > > > >  	if (err < 0)
> > > > > > > >  		goto err_xdp_reg_mem_model;
> > > > > > > >
> > > > > > > > @@ -2951,6 +2998,7 @@ static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *data)
> > > > > > > >  				ethtool_sprintf(&p, "tx_queue_%u_%s", i,
> > > > > > > >  						virtnet_sq_stats_desc[j].desc);
> > > > > > > >  		}
> > > > > > > > +		page_pool_ethtool_stats_get_strings(p);
> > > > > > > >  		break;
> > > > > > > >  	}
> > > > > > > >  }
> > > > > > > > @@ -2962,12 +3010,30 @@ static int virtnet_get_sset_count(struct net_device *dev, int sset)
> > > > > > > >  	switch (sset) {
> > > > > > > >  	case ETH_SS_STATS:
> > > > > > > >  		return vi->curr_queue_pairs * (VIRTNET_RQ_STATS_LEN +
> > > > > > > > -					       VIRTNET_SQ_STATS_LEN);
> > > > > > > > +					       VIRTNET_SQ_STATS_LEN +
> > > > > > > > +					       (page_pool_enabled && vi->mergeable_rx_bufs ?
> > > > > > > > +					       page_pool_ethtool_stats_get_count() : 0));
> > > > > > > >  	default:
> > > > > > > >  		return -EOPNOTSUPP;
> > > > > > > >  	}
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void virtnet_get_page_pool_stats(struct net_device *dev, u64 *data)
> > > > > > > > +{
> > > > > > > > +#ifdef CONFIG_PAGE_POOL_STATS
> > > > > > > > +	struct virtnet_info *vi = netdev_priv(dev);
> > > > > > > > +	struct page_pool_stats pp_stats = {};
> > > > > > > > +	int i;
> > > > > > > > +
> > > > > > > > +	for (i = 0; i < vi->curr_queue_pairs; i++) {
> > > > > > > > +		if (!vi->rq[i].page_pool)
> > > > > > > > +			continue;
> > > > > > > > +		page_pool_get_stats(vi->rq[i].page_pool, &pp_stats);
> > > > > > > > +	}
> > > > > > > > +	page_pool_ethtool_stats_get(data, &pp_stats);
> > > > > > > > +#endif /* CONFIG_PAGE_POOL_STATS */
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  static void virtnet_get_ethtool_stats(struct net_device *dev,
> > > > > > > >  				      struct ethtool_stats *stats, u64 *data)
> > > > > > > >  {
> > > > > > > > @@ -3003,6 +3069,8 @@ static void virtnet_get_ethtool_stats(struct net_device *dev,
> > > > > > > >  		} while (u64_stats_fetch_retry(&sq->stats.syncp, start));
> > > > > > > >  		idx += VIRTNET_SQ_STATS_LEN;
> > > > > > > >  	}
> > > > > > > > +
> > > > > > > > +	virtnet_get_page_pool_stats(dev, &data[idx]);
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static void virtnet_get_channels(struct net_device *dev,
> > > > > > > > @@ -3623,6 +3691,8 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > >  		__netif_napi_del(&vi->rq[i].napi);
> > > > > > > >  		__netif_napi_del(&vi->sq[i].napi);
> > > > > > > > +		if (vi->rq[i].page_pool)
> > > > > > > > +			page_pool_destroy(vi->rq[i].page_pool);
> > > > > > > >  	}
> > > > > > > >
> > > > > > > >  	/* We called __netif_napi_del(),
> > > > > > > > @@ -3679,12 +3749,19 @@ static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf)
> > > > > > > >  	struct virtnet_info *vi = vq->vdev->priv;
> > > > > > > >  	int i = vq2rxq(vq);
> > > > > > > >
> > > > > > > > -	if (vi->mergeable_rx_bufs)
> > > > > > > > -		put_page(virt_to_head_page(buf));
> > > > > > > > -	else if (vi->big_packets)
> > > > > > > > +	if (vi->mergeable_rx_bufs) {
> > > > > > > > +		if (vi->rq[i].page_pool) {
> > > > > > > > +			page_pool_put_full_page(vi->rq[i].page_pool,
> > > > > > > > +						virt_to_head_page(buf),
> > > > > > > > +						true);
> > > > > > > > +		} else {
> > > > > > > > +			put_page(virt_to_head_page(buf));
> > > > > > > > +		}
> > > > > > > > +	} else if (vi->big_packets) {
> > > > > > > >  		give_pages(&vi->rq[i], buf);
> > > > > > > > -	else
> > > > > > > > +	} else {
> > > > > > > >  		put_page(virt_to_head_page(buf));
> > > > > > > > +	}
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > @@ -3718,6 +3795,26 @@ static void virtnet_del_vqs(struct virtnet_info *vi)
> > > > > > > >  	virtnet_free_queues(vi);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void virtnet_alloc_page_pool(struct receive_queue *rq)
> > > > > > > > +{
> > > > > > > > +	struct virtio_device *vdev = rq->vq->vdev;
> > > > > > > > +
> > > > > > > > +	struct page_pool_params pp_params = {
> > > > > > > > +		.order = 0,
> > > > > > > > +		.pool_size = rq->vq->num_max,
> > > > > > > > +		.nid = dev_to_node(vdev->dev.parent),
> > > > > > > > +		.dev = vdev->dev.parent,
> > > > > > > > +		.offset = 0,
> > > > > > > > +	};
> > > > > > > > +
> > > > > > > > +	rq->page_pool = page_pool_create(&pp_params);
> > > > > > > > +	if (IS_ERR(rq->page_pool)) {
> > > > > > > > +		dev_warn(&vdev->dev, "page pool creation failed: %ld\n",
> > > > > > > > +			 PTR_ERR(rq->page_pool));
> > > > > > > > +		rq->page_pool = NULL;
> > > > > > > > +	}
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /* How large should a single buffer be so a queue full of these can fit at
> > > > > > > >   * least one full packet?
> > > > > > > >   * Logic below assumes the mergeable buffer header is used.
> > > > > > > > @@ -3801,6 +3898,13 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > > > > > >  		vi->rq[i].vq = vqs[rxq2vq(i)];
> > > > > > > >  		vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
> > > > > > > >  		vi->sq[i].vq = vqs[txq2vq(i)];
> > > > > > > > +
> > > > > > > > +		if (page_pool_enabled && vi->mergeable_rx_bufs)
> > > > > > > > +			virtnet_alloc_page_pool(&vi->rq[i]);
> > > > > > > > +		else
> > > > > > > > +			dev_warn(&vi->vdev->dev,
> > > > > > > > +				 "page pool only support mergeable mode\n");
> > > > > > > > +
> > > > > > > >  	}
> > > > > > > >
> > > > > > > >  	/* run here: ret == 0. */
> > > > > > > > --
> > > > > > > > 2.31.1
> > > > > > > >
> > > > > > >