Alvaro Karsz
2023-Apr-30 13:15 UTC
[RFC PATCH net 0/3] virtio-net: allow usage of small vrings
At the moment, if a virtio network device uses vrings with fewer than
MAX_SKB_FRAGS + 2 entries, the device won't be functional.

The condition vq->num_free >= 2 + MAX_SKB_FRAGS will always evaluate to
false, because a ring that small can never have that many free entries.
This leads to TX timeouts.

This patchset attempts to fix this bug and to allow small rings down to
4 entries.

The first patch introduces a new mechanism in virtio core: it allows a
driver to block features at probe time.

If a virtio driver blocks features and fails probe, virtio core will
reset the device, re-negotiate the features and probe again.

This is needed since some virtio-net features are not supported with
small rings.

This patchset follows a discussion on the mailing list [1].

This fixes only part of the bug: rings with fewer than 4 entries still
won't work. My intention is to split the effort and fix the
RING_SIZE < 4 case in a follow-up patchset.

Maybe we should fail probe if RING_SIZE < 4 until the follow-up
patchset?

I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed
and split VQs, with rings down to 4 entries, with and without
VIRTIO_NET_F_MRG_RXBUF, and with big MTUs.

I would appreciate more testing.
Xuan: I wasn't able to test XDP with my setup, maybe you can help with
that?

[1] https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.karsz at solid-run.com/

Alvaro Karsz (3):
  virtio: re-negotiate features if probe fails and features are blocked
  virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2
  virtio-net: block ethtool from converting a ring to a small ring

 drivers/net/virtio_net.c | 161 +++++++++++++++++++++++++++++++++++++--
 drivers/virtio/virtio.c  |  73 +++++++++++++-----
 include/linux/virtio.h   |   3 +
 3 files changed, 212 insertions(+), 25 deletions(-)

-- 
2.34.1
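To make the failure mode concrete, here is a small userspace sketch of the old TX wake-up test. It assumes MAX_SKB_FRAGS is 17 (its usual value on systems with 4 KiB pages; the real value is configuration dependent), and the helper names are hypothetical. Since even a completely empty ring has at most ring_size free entries, any ring smaller than 2 + MAX_SKB_FRAGS can never satisfy the condition, so a stopped queue is never restarted.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: MAX_SKB_FRAGS is 17, its usual value with 4 KiB pages. */
#define MAX_SKB_FRAGS 17

/* The pre-patch test used to decide whether a TX queue may be woken. */
static bool old_tx_can_wake(unsigned int num_free)
{
	return num_free >= 2 + MAX_SKB_FRAGS;
}

/*
 * Hypothetical helper: the best case is an empty ring, where
 * num_free == ring_size. If even that fails, the queue never wakes.
 */
static bool ring_can_ever_wake(unsigned int ring_size)
{
	return old_tx_can_wake(ring_size);
}
```

With these assumptions, a 16-entry ring never passes the check, while 19 entries is the smallest ring that can.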
Alvaro Karsz
2023-Apr-30 13:15 UTC
[RFC PATCH net 1/3] virtio: re-negotiate features if probe fails and features are blocked
This patch exports a new virtio core function: virtio_block_feature.
The function should be called during a virtio driver probe.

If a virtio driver blocks features during probe and fails probe, virtio
core will reset the device, try to re-negotiate the new features and
probe again.

Signed-off-by: Alvaro Karsz <alvaro.karsz at solid-run.com>
---
 drivers/virtio/virtio.c | 73 ++++++++++++++++++++++++++++++-----------
 include/linux/virtio.h  |  3 ++
 2 files changed, 56 insertions(+), 20 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 3893dc29eb2..eaad5b6a7a9 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -167,6 +167,13 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
 }
 EXPORT_SYMBOL_GPL(virtio_add_status);
 
+void virtio_block_feature(struct virtio_device *dev, unsigned int f)
+{
+	BUG_ON(f >= 64);
+	dev->blocked_features |= (1ULL << f);
+}
+EXPORT_SYMBOL_GPL(virtio_block_feature);
+
 /* Do some validation, then set FEATURES_OK */
 static int virtio_features_ok(struct virtio_device *dev)
 {
@@ -234,17 +241,13 @@ void virtio_reset_device(struct virtio_device *dev)
 }
 EXPORT_SYMBOL_GPL(virtio_reset_device);
 
-static int virtio_dev_probe(struct device *_d)
+static int virtio_negotiate_features(struct virtio_device *dev)
 {
-	int err, i;
-	struct virtio_device *dev = dev_to_virtio(_d);
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
 	u64 device_features;
 	u64 driver_features;
 	u64 driver_features_legacy;
-
-	/* We have a driver! */
-	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+	int i, ret;
 
 	/* Figure out what features the device supports. */
 	device_features = dev->config->get_features(dev);
@@ -279,30 +282,61 @@ static int virtio_dev_probe(struct device *_d)
 		if (device_features & (1ULL << i))
 			__virtio_set_bit(dev, i);
 
-	err = dev->config->finalize_features(dev);
-	if (err)
-		goto err;
+	/* Remove blocked features */
+	dev->features &= ~dev->blocked_features;
+
+	ret = dev->config->finalize_features(dev);
+	if (ret)
+		goto exit;
 
 	if (drv->validate) {
 		u64 features = dev->features;
 
-		err = drv->validate(dev);
-		if (err)
-			goto err;
+		ret = drv->validate(dev);
+		if (ret)
+			goto exit;
 
 		/* Did validation change any features? Then write them again. */
 		if (features != dev->features) {
-			err = dev->config->finalize_features(dev);
-			if (err)
-				goto err;
+			ret = dev->config->finalize_features(dev);
+			if (ret)
+				goto exit;
 		}
 	}
 
-	err = virtio_features_ok(dev);
-	if (err)
-		goto err;
+	ret = virtio_features_ok(dev);
+exit:
+	return ret;
+}
+
+static int virtio_dev_probe(struct device *_d)
+{
+	int err;
+	struct virtio_device *dev = dev_to_virtio(_d);
+	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
+	u64 blocked_features;
+	bool renegotiate = true;
+
+	/* We have a driver! */
+	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+
+	/* Store blocked features and attempt to negotiate features & probe.
+	 * If the probe fails, we check if the driver has blocked any new features.
+	 * If it has, we reset the device and try again with the new features.
+	 */
+	while (renegotiate) {
+		blocked_features = dev->blocked_features;
+		err = virtio_negotiate_features(dev);
+		if (err)
+			break;
+
+		err = drv->probe(dev);
+		if (err && blocked_features != dev->blocked_features)
+			virtio_reset_device(dev);
+		else
+			renegotiate = false;
+	}
 
-	err = drv->probe(dev);
 	if (err)
 		goto err;
 
@@ -319,7 +353,6 @@ static int virtio_dev_probe(struct device *_d)
 err:
 	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
-
 }
 
 static void virtio_dev_remove(struct device *_d)
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b93238db94e..2de9b2d3ca4 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -109,6 +109,7 @@ int virtqueue_resize(struct virtqueue *vq, u32 num,
  * @vringh_config: configuration ops for host vrings.
  * @vqs: the list of virtqueues for this device.
  * @features: the features supported by both driver and device.
+ * @blocked_features: the features blocked by the driver that can't be negotiated.
  * @priv: private pointer for the driver's use.
  */
 struct virtio_device {
@@ -124,6 +125,7 @@ struct virtio_device {
 	const struct vringh_config_ops *vringh_config;
 	struct list_head vqs;
 	u64 features;
+	u64 blocked_features;
 	void *priv;
 };
 
@@ -133,6 +135,7 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status);
 int register_virtio_device(struct virtio_device *dev);
 void unregister_virtio_device(struct virtio_device *dev);
 bool is_virtio_device(struct device *dev);
+void virtio_block_feature(struct virtio_device *dev, unsigned int f);
 void virtio_break_device(struct virtio_device *dev);
 void __virtio_unbreak_device(struct virtio_device *dev);
 
-- 
2.34.1
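The control flow of the new probe loop can be exercised outside the kernel. The sketch below is a hypothetical userspace model, not the virtio core API: struct fake_dev, fake_probe() and FEATURE_UNSUPPORTED (bit 7) are invented for illustration, negotiation is reduced to "offered minus blocked", and status bits, finalize_features() and validate() are omitted. It shows that a probe failure only triggers a retry when the driver blocked a new feature, so the loop runs at most 64 extra times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct virtio_device: only what we need. */
struct fake_dev {
	uint64_t device_features;  /* offered by the device */
	uint64_t blocked_features; /* refused by the driver */
	uint64_t features;         /* negotiated result */
	int resets;                /* counts simulated device resets */
};

/* Mirrors virtio_block_feature(): record a feature the driver refuses. */
static void block_feature(struct fake_dev *dev, unsigned int f)
{
	assert(f < 64); /* the patch uses BUG_ON(f >= 64) */
	dev->blocked_features |= 1ULL << f;
}

/* Simplified negotiation: offered features minus blocked ones. */
static int negotiate(struct fake_dev *dev)
{
	dev->features = dev->device_features & ~dev->blocked_features;
	return 0;
}

/* Hypothetical driver probe: fails while feature 7 is negotiated. */
#define FEATURE_UNSUPPORTED 7
static int fake_probe(struct fake_dev *dev)
{
	if (dev->features & (1ULL << FEATURE_UNSUPPORTED)) {
		block_feature(dev, FEATURE_UNSUPPORTED);
		return -1;
	}
	return 0;
}

/* Same loop shape as the patched virtio_dev_probe(). */
static int probe_with_renegotiation(struct fake_dev *dev)
{
	bool renegotiate = true;
	int err = 0;

	while (renegotiate) {
		uint64_t blocked = dev->blocked_features;

		err = negotiate(dev);
		if (err)
			break;

		err = fake_probe(dev);
		if (err && blocked != dev->blocked_features)
			dev->resets++; /* stands in for virtio_reset_device() */
		else
			renegotiate = false;
	}
	return err;
}
```

For a device offering bits 5 and 7, the first probe blocks bit 7 and fails, the device is "reset" once, and the second probe succeeds with only bit 5 negotiated.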
Alvaro Karsz
2023-Apr-30 13:15 UTC
[RFC PATCH net 2/3] virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2
At the moment, if a network device uses vrings with fewer than
MAX_SKB_FRAGS + 2 entries, the device won't be functional.

The condition vq->num_free >= 2 + MAX_SKB_FRAGS will always evaluate to
false, because such a ring never has that many free entries, leading to
TX timeouts.

This patch introduces a new variable, single_pkt_max_descs, that holds
the maximum number of descriptors we may need to handle a single packet.

This patch also detects small vrings during probe, blocks the features
that can't be used with small vrings, and fails probe, leading to a
reset and feature re-negotiation.

Features that can't be used with small vrings:

GRO features (VIRTIO_NET_F_GUEST_*):
When we use small vrings, we may not have enough entries in the ring to
chain page-size buffers and form a 64K buffer.
So we may need to allocate 64K of contiguous memory, which may be too
much when the system is stressed.

This patch also caps the MTU at the default of 1500B when small vrings
are used.

Signed-off-by: Alvaro Karsz <alvaro.karsz at solid-run.com>
---
 drivers/net/virtio_net.c | 149 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 144 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8d8038538fc..b4441d63890 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -103,6 +103,8 @@ struct virtnet_rq_stats {
 #define VIRTNET_SQ_STAT(m)	offsetof(struct virtnet_sq_stats, m)
 #define VIRTNET_RQ_STAT(m)	offsetof(struct virtnet_rq_stats, m)
 
+#define IS_SMALL_VRING(size)	((size) < MAX_SKB_FRAGS + 2)
+
 static const struct virtnet_stat_desc virtnet_sq_stats_desc[] = {
 	{ "packets",		VIRTNET_SQ_STAT(packets) },
 	{ "bytes",		VIRTNET_SQ_STAT(bytes) },
@@ -268,6 +270,12 @@ struct virtnet_info {
 	/* Does the affinity hint is set for virtqueues? */
 	bool affinity_hint_set;
 
+	/* How many ring descriptors we may need to transmit a single packet */
+	u16 single_pkt_max_descs;
+
+	/* Do we have virtqueues with small vrings? */
+	bool svring;
+
 	/* CPU hotplug instances for online & dead */
 	struct hlist_node node;
 	struct hlist_node node_dead;
@@ -455,6 +463,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	unsigned int copy, hdr_len, hdr_padded_len;
 	struct page *page_to_free = NULL;
 	int tailroom, shinfo_size;
+	u16 max_frags = MAX_SKB_FRAGS;
 	char *p, *hdr_p, *buf;
 
 	p = page_address(page) + offset;
@@ -520,7 +529,10 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	 * tries to receive more than is possible. This is usually
 	 * the case of a broken device.
 	 */
-	if (unlikely(len > MAX_SKB_FRAGS * PAGE_SIZE)) {
+	if (unlikely(vi->svring))
+		max_frags = 1;
+
+	if (unlikely(len > max_frags * PAGE_SIZE)) {
 		net_dbg_ratelimited("%s: too much data\n", skb->dev->name);
 		dev_kfree_skb(skb);
 		return NULL;
@@ -612,7 +624,7 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
 	 * Since most packets only take 1 or 2 ring slots, stopping the queue
 	 * early means 16 slots are typically wasted.
 	 */
-	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
+	if (sq->vq->num_free < vi->single_pkt_max_descs) {
 		netif_stop_subqueue(dev, qnum);
 		if (use_napi) {
 			if (unlikely(!virtqueue_enable_cb_delayed(sq->vq)))
@@ -620,7 +632,7 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
 		} else if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
 			free_old_xmit_skbs(sq, false);
-			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
+			if (sq->vq->num_free >= vi->single_pkt_max_descs) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);
 			}
@@ -1108,6 +1120,10 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
 		return 0;
 
 	if (*num_buf > 1) {
+		/* Small vring - can't be more than 1 buffer */
+		if (unlikely(vi->svring))
+			return -EINVAL;
+
 		/* If we want to build multi-buffer xdp, we need
 		 * to specify that the flags of xdp_buff have the
 		 * XDP_FLAGS_HAS_FRAG bit.
@@ -1828,7 +1844,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
 			free_old_xmit_skbs(sq, true);
 		} while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
 
-		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		if (sq->vq->num_free >= vi->single_pkt_max_descs)
 			netif_tx_wake_queue(txq);
 
 		__netif_tx_unlock(txq);
@@ -1919,7 +1935,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 	virtqueue_disable_cb(sq->vq);
 	free_old_xmit_skbs(sq, true);
 
-	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+	if (sq->vq->num_free >= vi->single_pkt_max_descs)
 		netif_tx_wake_queue(txq);
 
 	opaque = virtqueue_enable_cb_prepare(sq->vq);
@@ -3862,6 +3878,15 @@ static bool virtnet_check_guest_gso(const struct virtnet_info *vi)
 		virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_USO6));
 }
 
+static bool virtnet_check_host_gso(const struct virtnet_info *vi)
+{
+	return virtio_has_feature(vi->vdev, VIRTIO_NET_F_HOST_TSO4) ||
+		virtio_has_feature(vi->vdev, VIRTIO_NET_F_HOST_TSO6) ||
+		virtio_has_feature(vi->vdev, VIRTIO_NET_F_HOST_ECN) ||
+		virtio_has_feature(vi->vdev, VIRTIO_NET_F_HOST_UFO) ||
+		virtio_has_feature(vi->vdev, VIRTIO_NET_F_HOST_USO);
+}
+
 static void virtnet_set_big_packets(struct virtnet_info *vi, const int mtu)
 {
 	bool guest_gso = virtnet_check_guest_gso(vi);
@@ -3876,6 +3901,112 @@ static void virtnet_set_big_packets(struct virtnet_info *vi, const int mtu)
 	}
 }
 
+static u16 virtnet_calc_max_descs(struct virtnet_info *vi)
+{
+	if (vi->svring) {
+		if (virtnet_check_host_gso(vi))
+			return 4; /* 1 fragment + linear part + virtio header + GSO header */
+		else
+			return 3; /* 1 fragment + linear part + virtio header */
+	} else {
+		return MAX_SKB_FRAGS + 2;
+	}
+}
+
+static bool virtnet_uses_svring(struct virtnet_info *vi)
+{
+	u32 i;
+
+	/* If a transmit/receive virtqueue is small,
+	 * we cannot handle fragmented packets.
+	 */
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		if (IS_SMALL_VRING(virtqueue_get_vring_size(vi->sq[i].vq)) ||
+		    IS_SMALL_VRING(virtqueue_get_vring_size(vi->rq[i].vq)))
+			return true;
+	}
+
+	return false;
+}
+
+/* Function returns the number of features it blocked */
+static int virtnet_block_svring_unsupported(struct virtio_device *vdev)
+{
+	int cnt = 0;
+	/* Block Virtio guest GRO features.
+	 * Asking Linux to allocate 64k of contiguous memory is too much,
+	 * especially when the system is stressed.
+	 *
+	 * If VIRTIO_NET_F_MRG_RXBUF is negotiated we can allocate smaller
+	 * buffers, but since the ring is small, the buffers can be quite big.
+	 *
+	 */
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4)) {
+		virtio_block_feature(vdev, VIRTIO_NET_F_GUEST_TSO4);
+		cnt++;
+	}
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
+		virtio_block_feature(vdev, VIRTIO_NET_F_GUEST_TSO6);
+		cnt++;
+	}
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN)) {
+		virtio_block_feature(vdev, VIRTIO_NET_F_GUEST_ECN);
+		cnt++;
+	}
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
+		virtio_block_feature(vdev, VIRTIO_NET_F_GUEST_UFO);
+		cnt++;
+	}
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_USO4)) {
+		virtio_block_feature(vdev, VIRTIO_NET_F_GUEST_USO4);
+		cnt++;
+	}
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_USO6)) {
+		virtio_block_feature(vdev, VIRTIO_NET_F_GUEST_USO6);
+		cnt++;
+	}
+
+	return cnt;
+}
+
+static int virtnet_fixup_svring(struct virtnet_info *vi)
+{
+	int i;
+	/* Do we use small vrings?
+	 * If not, nothing we need to do.
+	 */
+	vi->svring = virtnet_uses_svring(vi);
+	if (!vi->svring)
+		return 0;
+
+	/* Some features can't be used with small vrings.
+	 * Block those and return an error.
+	 * This will trigger a reprobe without the blocked
+	 * features.
+	 */
+	if (virtnet_block_svring_unsupported(vi->vdev))
+		return -EOPNOTSUPP;
+
+	/* Disable NETIF_F_SG */
+	vi->dev->hw_features &= ~NETIF_F_SG;
+
+	/* Don't use MTU bigger than default */
+	if (vi->dev->max_mtu > ETH_DATA_LEN)
+		vi->dev->max_mtu = ETH_DATA_LEN;
+	if (vi->dev->mtu > ETH_DATA_LEN)
+		vi->dev->mtu = ETH_DATA_LEN;
+
+	/* Don't use big packets */
+	vi->big_packets = false;
+	vi->big_packets_num_skbfrags = 1;
+
+	/* Fix min_buf_len for receive virtqueues */
+	for (i = 0; i < vi->max_queue_pairs; i++)
+		vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
+
+	return 0;
+}
+
 static int virtnet_probe(struct virtio_device *vdev)
 {
 	int i, err = -ENOMEM;
@@ -4061,6 +4192,14 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (err)
 		goto free;
 
+	/* Do required fixups in case we are using small vrings */
+	err = virtnet_fixup_svring(vi);
+	if (err)
+		goto free_vqs;
+
+	/* Calculate the max. number of descriptors we may need to transmit a single packet */
+	vi->single_pkt_max_descs = virtnet_calc_max_descs(vi);
+
 #ifdef CONFIG_SYSFS
 	if (vi->mergeable_rx_bufs)
 		dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
-- 
2.34.1
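The two size decisions in this patch can be modelled in plain C. This is a hedged sketch: MAX_SKB_FRAGS = 17 and PAGE_SIZE = 4096 are assumptions (typical values on 4 KiB-page systems), and calc_max_descs() and rx_len_ok() are hypothetical userspace re-statements of virtnet_calc_max_descs() and the page_to_skb() length check. It only counts descriptors and buffer length. It does not explain, on its own, why rings under 4 entries remain unsupported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: typical values on systems with 4 KiB pages. */
#define MAX_SKB_FRAGS 17
#define PAGE_SIZE 4096u
#define IS_SMALL_VRING(size) ((size) < MAX_SKB_FRAGS + 2)

/* Same decision as virtnet_calc_max_descs() in the patch. */
static uint16_t calc_max_descs(bool svring, bool host_gso)
{
	if (!svring)
		return MAX_SKB_FRAGS + 2;
	/* 1 fragment + linear part + virtio header (+ GSO header) */
	return host_gso ? 4 : 3;
}

/* Same limit as the patched page_to_skb(): 1 frag on small vrings. */
static bool rx_len_ok(unsigned int len, bool svring)
{
	unsigned int max_frags = svring ? 1 : MAX_SKB_FRAGS;

	return len <= max_frags * PAGE_SIZE;
}
```

Under these assumptions a normal ring keeps the old 19-descriptor threshold, while a small ring with host GSO needs 4 and accepts received packets of at most one page.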
Alvaro Karsz
2023-Apr-30 13:15 UTC
[RFC PATCH net 3/3] virtio-net: block ethtool from converting a ring to a small ring
Stop ethtool from resizing a TX/RX ring to a size smaller than
MAX_SKB_FRAGS + 2 if the ring was initialized with a bigger size.

We cannot convert a "normal" ring to a "small" ring at runtime.

Signed-off-by: Alvaro Karsz <alvaro.karsz at solid-run.com>
---
 drivers/net/virtio_net.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b4441d63890..b8238eaa1e4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2071,6 +2071,12 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
 	bool running = netif_running(vi->dev);
 	int err, qindex;
 
+	/* We cannot convert a ring to a small vring */
+	if (!vi->svring && IS_SMALL_VRING(ring_num)) {
+		netdev_err(vi->dev, "resize rx fail: size is too small..\n");
+		return -EINVAL;
+	}
+
 	qindex = rq - vi->rq;
 
 	if (running)
@@ -2097,6 +2103,12 @@ static int virtnet_tx_resize(struct virtnet_info *vi,
 
 	qindex = sq - vi->sq;
 
+	/* We cannot convert a ring to a small vring */
+	if (!vi->svring && IS_SMALL_VRING(ring_num)) {
+		netdev_err(vi->dev, "resize tx fail: size is too small..\n");
+		return -EINVAL;
+	}
+
 	if (running)
 		virtnet_napi_tx_disable(&sq->napi);
 
-- 
2.34.1
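The resize guard reduces to a single predicate. The sketch below restates it in userspace C under the same assumption as above (MAX_SKB_FRAGS = 17); check_resize() is a hypothetical name, not a virtio-net function. It rejects only the "normal to small" transition. A device that was already probed with small vrings is not rejected by this check, because its feature set and descriptor budget were already reduced at probe time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: MAX_SKB_FRAGS is 17, typical with 4 KiB pages. */
#define MAX_SKB_FRAGS 17
#define IS_SMALL_VRING(size) ((size) < MAX_SKB_FRAGS + 2)

/* Mirrors the check added to virtnet_rx_resize()/virtnet_tx_resize(). */
static int check_resize(bool svring, uint32_t ring_num)
{
	/* A "normal" device must not be shrunk into small-vring territory. */
	if (!svring && IS_SMALL_VRING(ring_num))
		return -EINVAL;
	return 0;
}
```

For example, with these assumptions `ethtool -G` on a normal device would be refused for 8 or 18 entries but accepted for 19.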
Michael S. Tsirkin
2023-Apr-30 14:06 UTC
[RFC PATCH net 0/3] virtio-net: allow usage of small vrings
On Sun, Apr 30, 2023 at 04:15:15PM +0300, Alvaro Karsz wrote:
> At the moment, if a virtio network device uses vrings with less than
> MAX_SKB_FRAGS + 2 entries, the device won't be functional.
>
> The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always
> evaluate to false, leading to TX timeouts.
>
> This patchset attempts this fix this bug, and to allow small rings down
> to 4 entries.
> The first patch introduces a new mechanism in virtio core - it allows to
> block features in probe time.
>
> If a virtio drivers blocks features and fails probe, virtio core will
> reset the device, re-negotiate the features and probe again.
>
> This is needed since some virtio net features are not supported with
> small rings.
>
> This patchset follows a discussion in the mailing list [1].
>
> This fixes only part of the bug, rings with less than 4 entries won't
> work.

Why the difference?

> My intention is to split the effort and fix the RING_SIZE < 4 case in a
> follow up patchset.
>
> Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset?

I'd keep current behaviour.

> I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed
> and split VQs, with rings down to 4 entries, with and without
> VIRTIO_NET_F_MRG_RXBUF, with big MTUs.
>
> I would appreciate more testing.
> Xuan: I wasn't able to test XDP with my setup, maybe you can help with
> that?
>
> [1] https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.karsz at solid-run.com/
>
> Alvaro Karsz (3):
>   virtio: re-negotiate features if probe fails and features are blocked
>   virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2
>   virtio-net: block ethtool from converting a ring to a small ring
>
>  drivers/net/virtio_net.c | 161 +++++++++++++++++++++++++++++++++++++--
>  drivers/virtio/virtio.c  |  73 +++++++++++++-----
>  include/linux/virtio.h   |   3 +
>  3 files changed, 212 insertions(+), 25 deletions(-)
>
> -- 
> 2.34.1
Michael S. Tsirkin
2023-Jun-17 07:44 UTC
[RFC PATCH net 0/3] virtio-net: allow usage of small vrings
On Sun, Apr 30, 2023 at 04:15:15PM +0300, Alvaro Karsz wrote:
> At the moment, if a virtio network device uses vrings with less than
> MAX_SKB_FRAGS + 2 entries, the device won't be functional.
>
> The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always
> evaluate to false, leading to TX timeouts.
>
> This patchset attempts this fix this bug, and to allow small rings down
> to 4 entries.
>
> The first patch introduces a new mechanism in virtio core - it allows to
> block features in probe time.
>
> If a virtio drivers blocks features and fails probe, virtio core will
> reset the device, re-negotiate the features and probe again.
>
> This is needed since some virtio net features are not supported with
> small rings.
>
> This patchset follows a discussion in the mailing list [1].
>
> This fixes only part of the bug, rings with less than 4 entries won't
> work.
> My intention is to split the effort and fix the RING_SIZE < 4 case in a
> follow up patchset.
>
> Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset?
>
> I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed
> and split VQs, with rings down to 4 entries, with and without
> VIRTIO_NET_F_MRG_RXBUF, with big MTUs.
>
> I would appreciate more testing.
> Xuan: I wasn't able to test XDP with my setup, maybe you can help with
> that?
>
> [1] https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.karsz at solid-run.com/

The work is orphaned for now. Jason, do you want to pick this up?
Related to all the hardening, I guess...

> Alvaro Karsz (3):
>   virtio: re-negotiate features if probe fails and features are blocked
>   virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2
>   virtio-net: block ethtool from converting a ring to a small ring
>
>  drivers/net/virtio_net.c | 161 +++++++++++++++++++++++++++++++++++++--
>  drivers/virtio/virtio.c  |  73 +++++++++++++-----
>  include/linux/virtio.h   |   3 +
>  3 files changed, 212 insertions(+), 25 deletions(-)
>
> -- 
> 2.34.1