Si-Wei Liu
2022-Aug-08 23:56 UTC
[virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
On 8/8/2022 12:31 AM, Gavin Li wrote:
>
> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>>
>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>> packets even when GUEST_* offloads are not present on the device.
>>> However, if GSO is not supported,
>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>> been called.
> ACK
>>
>>> it would be sufficient to allocate
>>> segments to cover just up the MTU size and no further. Allocating the
>>> maximum amount of segments results in a large waste of buffer space in
>>> the queue, which limits the number of packets that can be buffered and
>>> can result in reduced performance.
>>>
>>> Therefore, if GSO is not supported,
>> Ditto.
> ACK
>>
>>> use the MTU to calculate the
>>> optimal amount of segments required.
>>>
>>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>> server running over the virtio-net interface.
>>>
>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>              Before   After
>>>   1500        22.5     22.4
>>>   9000        12.8     25.9
>>>
>>> Signed-off-by: Gavin Li <gavinl at nvidia.com>
>>> Reviewed-by: Gavi Teitz <gavi at nvidia.com>
>>> Reviewed-by: Parav Pandit <parav at nvidia.com>
>>> ---
>>>  drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index ec8e1b3108c3..d36918c1809d 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>         /* I like... big packets and I cannot lie! */
>>>         bool big_packets;
>>>
>>> +       /* Indicates GSO support */
>>> +       bool gso_is_supported;
>>> +
>>>         /* Host will merge rx buffers for big packets (shake it! shake it!) */
>>>         bool mergeable_rx_bufs;
>>>
>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>>>  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>>>                             gfp_t gfp)
>>>  {
>>> +       unsigned int sg_num = MAX_SKB_FRAGS;
>>>         struct page *first, *list = NULL;
>>>         char *p;
>>>         int i, err, offset;
>>>
>>> -       sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>> +       if (!vi->gso_is_supported) {
>>> +               unsigned int mtu = vi->dev->mtu;
>>> +
>>> +               sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
>> DIV_ROUND_UP() can be used?
> ACK
>>
>> Since this branch slightly adds cost to the datapath, I wonder if
>> this sg_num can be saved and set only once (generally in virtnet_probe
>> time) in struct virtnet_info?
> Not sure how to do it and align it with the new mtu during
> .ndo_change_mtu(), as you mentioned in the following mail. Any idea?
> ndo_change_mtu might be in vendor specific code and unmanageable. In
> my case, the mtu can only be changed in the xml of the guest vm.

Nope, e.g. "ip link set dev eth0 mtu 1500" can be done from the guest on
a virtio-net device with 9000 MTU (as defined in the guest xml).
Basically the guest user can set the MTU to any valid value lower than
the original HOST_MTU. In the vendor defined .ndo_change_mtu() op,
dev_validate_mtu() should have validated the MTU value before coming
down to it. And I suspect you might want to do virtnet_close() and
virtnet_open() before/after changing the buffer size on the fly (the
netif_running() case); implementing .ndo_change_mtu() will be needed
anyway.

>>> +       }
>>> +
>>> +       sg_init_table(rq->sg, sg_num + 2);
>>>
>>>         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>> Comment doesn't match code.
> ACK
>>> -       for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>> +       for (i = sg_num + 1; i > 1; --i) {
>>>                 first = get_a_page(rq, gfp);
>>>                 if (!first) {
>>>                         if (list)
>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>>>
>>>         /* chain first in list head */
>>>         first->private = (unsigned long)list;
>>> -       err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>> +       err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>                                   first, gfp);
>>>         if (err < 0)
>>>                 give_pages(rq, first);
>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>> -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>> +           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>                 vi->big_packets = true;
>>> +               vi->gso_is_supported = true;
>> Please do the same for virtnet_clear_guest_offloads(), and
>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
>> successful return, seems like a bug to me.
> ACK. The two calls virtnet_set_guest_offloads and
> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
> you think if I can do this in virtnet_set_guest_offloads?

I think that it should be fine, though you may want to take care not to
regress the XDP path.

-Siwei

>>
>>
>> Thanks,
>> -Siwei
>>> +       }
>>>
>>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>                 vi->mergeable_rx_bufs = true;
>>
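
For reference, the two review suggestions above (use DIV_ROUND_UP(), and compute sg_num once instead of in the datapath) could look roughly like the user-space sketch below. This is not the driver code: struct sketch_info, sketch_calc_sg_num(), sketch_set_mtu(), SKETCH_PAGE_SIZE and SKETCH_MAX_SKB_FRAGS are hypothetical names, and the values 4096 and 17 are assumptions standing in for the kernel's PAGE_SIZE and MAX_SKB_FRAGS.

```c
#include <assert.h>

/* Same rounding as the kernel's DIV_ROUND_UP(); redefined here only so
 * that the sketch builds in user space. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Assumed values for illustration; the driver would use PAGE_SIZE and
 * MAX_SKB_FRAGS from the kernel headers. */
#define SKETCH_PAGE_SIZE     4096u
#define SKETCH_MAX_SKB_FRAGS 17u

/* Hypothetical stand-in for the relevant fields of struct virtnet_info. */
struct sketch_info {
	unsigned int mtu;
	int guest_gso;                   /* non-zero if any GUEST_* offload is on */
	unsigned int big_packets_sg_num; /* cached, so the rx refill path only reads it */
};

/* Number of page-sized segments needed for one big-packet buffer. */
static unsigned int sketch_calc_sg_num(unsigned int mtu, int guest_gso)
{
	assert(SKETCH_PAGE_SIZE > 0);

	if (guest_gso)
		return SKETCH_MAX_SKB_FRAGS;

	return DIV_ROUND_UP(mtu, SKETCH_PAGE_SIZE);
}

/* What a probe-time setup or an MTU-change hook would do: store the new
 * MTU and refresh the cached segment count in one place. */
static void sketch_set_mtu(struct sketch_info *vi, unsigned int new_mtu)
{
	vi->mtu = new_mtu;
	vi->big_packets_sg_num = sketch_calc_sg_num(new_mtu, vi->guest_gso);
}
```

With the assumed 4096-byte page, a 1500 MTU needs one segment and a 9000 MTU needs three, instead of MAX_SKB_FRAGS in both cases. In the real driver the cached value would also have to be refreshed whenever guest offloads are cleared or restored, and the receive buffers refilled around the change, as discussed above.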