Hi, This series contains small improvements for virtio pci driver. The main idea is to support surprise removal of virtio pci device when the driver is already loaded. Future patches will further improve other areas of hotplug. Patches 1 to 3 prepare the code to handle surprise removal by marking the device as broken in patch-4. Patch summary: patch-1: ensures that compiler optimization doesn't occur on vq->broken flag patch-2: maintains the mirror sequence on VQ delete and VQ create patch-3: protects vqs list for simultaneous access from reader and a writer patch-4: handles surprise removal of virtio pci device which avoids call trace and system lockup --- changelog: v2->v3: - updated commit message to mention that patch-1 is theoretical fix v1->v2: - updated commit log for WRITE_ONCE usage - improved vqs protection patch-3 for using natural structure alignment v0->v1: - fixed below comments from Michael - fixed typo in patch4 in comment - using WRITE_ONCE instead of smp_store_release() - using spinlock instead of rwlock - improved comment in patch - removed fixes tag in patch-1 Parav Pandit (4): virtio: Improve vq->broken access to avoid any compiler optimization virtio: Keep vring_del_virtqueue() mirror of VQ create virtio: Protect vqs list access virtio_pci: Support surprise removal of virtio pci device drivers/virtio/virtio.c | 1 + drivers/virtio/virtio_pci_common.c | 7 +++++++ drivers/virtio/virtio_ring.c | 17 ++++++++++++++--- include/linux/virtio.h | 1 + 4 files changed, 23 insertions(+), 3 deletions(-) -- 2.27.0
Parav Pandit
2021-Jul-21 14:26 UTC
[PATCH v3 1/4] virtio: Improve vq->broken access to avoid any compiler optimization
Currently vq->broken field is read by virtqueue_is_broken() in busy loop in one context by virtnet_send_command(). vq->broken is set to true in other process context by virtio_break_device(). Reader and writer are accessing it without any synchronization. This may lead to a compiler optimization which may result to optimize reading vq->broken only once. Hence, force reading vq->broken on each invocation of virtqueue_is_broken() and also force writing it so that such update is visible to the readers. It is a theoretical fix that isn't yet encountered in the field. Signed-off-by: Parav Pandit <parav at nvidia.com> --- changelog: v2->v3: - updated commit log v1->v2: - updated commit log to describe WRITE_ONCE usages v0->v1: - removed fixes tag from commit log - using WRITE_ONCE() as its enough for smp cache coherency as well as undesired compiler optimization --- drivers/virtio/virtio_ring.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 89bfe46a8a7f..e179c7c7622c 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2373,7 +2373,7 @@ bool virtqueue_is_broken(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); - return vq->broken; + return READ_ONCE(vq->broken); } EXPORT_SYMBOL_GPL(virtqueue_is_broken); @@ -2387,7 +2387,9 @@ void virtio_break_device(struct virtio_device *dev) list_for_each_entry(_vq, &dev->vqs, list) { struct vring_virtqueue *vq = to_vvq(_vq); - vq->broken = true; + + /* Pairs with READ_ONCE() in virtqueue_is_broken(). */ + WRITE_ONCE(vq->broken, true); } } EXPORT_SYMBOL_GPL(virtio_break_device); -- 2.27.0
Parav Pandit
2021-Jul-21 14:26 UTC
[PATCH v3 2/4] virtio: Keep vring_del_virtqueue() mirror of VQ create
Keep the vring_del_virtqueue() mirror of the create routines. i.e. to delete list entry first as it is added last during the create routine. Signed-off-by: Parav Pandit <parav at nvidia.com> --- drivers/virtio/virtio_ring.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index e179c7c7622c..d5934c2e5a89 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2291,6 +2291,8 @@ void vring_del_virtqueue(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); + list_del(&_vq->list); + if (vq->we_own_ring) { if (vq->packed_ring) { vring_free_queue(vq->vq.vdev, @@ -2321,7 +2323,6 @@ void vring_del_virtqueue(struct virtqueue *_vq) kfree(vq->split.desc_state); kfree(vq->split.desc_extra); } - list_del(&_vq->list); kfree(vq); } EXPORT_SYMBOL_GPL(vring_del_virtqueue); -- 2.27.0
VQs may be accessed to mark the device broken while they are created/destroyed. Hence protect the access to the vqs list. Fixes: e2dcdfe95c0b ("virtio: virtio_break_device() to mark all virtqueues broken.") Signed-off-by: Parav Pandit <parav at nvidia.com> --- changelog: v1->v2: - use the hole in current struct to place vqs_list_lock to have natural packing v0->v1: - use spinlock instead of rwlock --- drivers/virtio/virtio.c | 1 + drivers/virtio/virtio_ring.c | 8 ++++++++ include/linux/virtio.h | 1 + 3 files changed, 10 insertions(+) diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 4b15c00c0a0a..49984d2cba24 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -355,6 +355,7 @@ int register_virtio_device(struct virtio_device *dev) virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE); INIT_LIST_HEAD(&dev->vqs); + spin_lock_init(&dev->vqs_list_lock); /* * device_add() causes the bus infrastructure to look for a matching diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index d5934c2e5a89..c2aaa0eff6df 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1755,7 +1755,9 @@ static struct virtqueue *vring_create_virtqueue_packed( cpu_to_le16(vq->packed.event_flags_shadow); } + spin_lock(&vdev->vqs_list_lock); list_add_tail(&vq->vq.list, &vdev->vqs); + spin_unlock(&vdev->vqs_list_lock); return &vq->vq; err_desc_extra: @@ -2229,7 +2231,9 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, memset(vq->split.desc_state, 0, vring.num * sizeof(struct vring_desc_state_split)); + spin_lock(&vdev->vqs_list_lock); list_add_tail(&vq->vq.list, &vdev->vqs); + spin_unlock(&vdev->vqs_list_lock); return &vq->vq; err_extra: @@ -2291,7 +2295,9 @@ void vring_del_virtqueue(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); + spin_lock(&vq->vq.vdev->vqs_list_lock); list_del(&_vq->list); + spin_unlock(&vq->vq.vdev->vqs_list_lock); if (vq->we_own_ring) { if (vq->packed_ring) { @@ -2386,12 +2392,14 @@ void virtio_break_device(struct virtio_device *dev) { struct virtqueue *_vq; + spin_lock(&dev->vqs_list_lock); list_for_each_entry(_vq, &dev->vqs, list) { struct vring_virtqueue *vq = to_vvq(_vq); /* Pairs with READ_ONCE() in virtqueue_is_broken(). */ WRITE_ONCE(vq->broken, true); } + spin_unlock(&dev->vqs_list_lock); } EXPORT_SYMBOL_GPL(virtio_break_device); diff --git a/include/linux/virtio.h b/include/linux/virtio.h index b1894e0323fa..41edbc01ffa4 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -110,6 +110,7 @@ struct virtio_device { bool config_enabled; bool config_change_pending; spinlock_t config_lock; + spinlock_t vqs_list_lock; /* Protects VQs list access */ struct device dev; struct virtio_device_id id; const struct virtio_config_ops *config; -- 2.27.0
Parav Pandit
2021-Jul-21 14:26 UTC
[PATCH v3 4/4] virtio_pci: Support surprise removal of virtio pci device
When a virtio pci device undergo surprise removal (aka async removal in PCIe spec), mark the device as broken so that any upper layer drivers can abort any outstanding operation. When a virtio net pci device undergo surprise removal which is used by a NetworkManager, a below call trace was observed. kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059] watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059] CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S W I L 5.13.0-hotplug+ #8 Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020 Workqueue: events linkwatch_event RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net] Call Trace: virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net] ? __hw_addr_create_ex+0x85/0xc0 __dev_mc_add+0x72/0x80 igmp6_group_added+0xa7/0xd0 ipv6_mc_up+0x3c/0x60 ipv6_find_idev+0x36/0x80 addrconf_add_dev+0x1e/0xa0 addrconf_dev_config+0x71/0x130 addrconf_notify+0x1f5/0xb40 ? rtnl_is_locked+0x11/0x20 ? __switch_to_asm+0x42/0x70 ? finish_task_switch+0xaf/0x2c0 ? raw_notifier_call_chain+0x3e/0x50 raw_notifier_call_chain+0x3e/0x50 netdev_state_change+0x67/0x90 linkwatch_do_dev+0x3c/0x50 __linkwatch_run_queue+0xd2/0x220 linkwatch_event+0x21/0x30 process_one_work+0x1c8/0x370 worker_thread+0x30/0x380 ? process_one_work+0x370/0x370 kthread+0x118/0x140 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 Hence, add the ability to abort the command on surprise removal which prevents infinite loop and system lockup. Signed-off-by: Parav Pandit <parav at nvidia.com> --- changelog: v0->v1: - fixed typo in comment --- drivers/virtio/virtio_pci_common.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index 222d630c41fc..b35bb2d57f62 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -576,6 +576,13 @@ static void virtio_pci_remove(struct pci_dev *pci_dev) struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev); struct device *dev = get_device(&vp_dev->vdev.dev); + /* + * Device is marked broken on surprise removal so that virtio upper + * layers can abort any ongoing operation. + */ + if (!pci_device_is_present(pci_dev)) + virtio_break_device(&vp_dev->vdev); + pci_disable_sriov(pci_dev); unregister_virtio_device(&vp_dev->vdev); -- 2.27.0
Hi Michael,> From: Parav Pandit <parav at nvidia.com> > Sent: Wednesday, July 21, 2021 7:57 PM > To: mst at redhat.com; virtualization at lists.linux-foundation.org > Cc: Parav Pandit <parav at nvidia.com> > Subject: [PATCH v3 0/4] virtio short improvements > > Hi, > > This series contains small improvements for virtio pci driver. > The main idea is to support surprise removal of virtio pci device when the > driver is already loaded. Future patches will further improve other areas of > hotplug. > > Patches 1 to 3 prepare the code to handle surprise removal by marking the > device as broken in patch-4. > > Patch summary: > patch-1: ensures that compiler optimization doesn't occur on vq->broken > flag > patch-2: maintains the mirror sequence on VQ delete and VQ create > patch-3: protects vqs list for simultaneous access from reader and a writer > patch-4: handles surprise removal of virtio pci device which avoids > call trace and system lockup >Any comments to address or will you please take this short series? I am working on improving other part of the hotplug that you described to take care in near future. (next kernel cycle).
Hi Michael,> From: Parav Pandit > Sent: Thursday, July 29, 2021 10:09 AM > > Hi Michael, > > > From: Parav Pandit <parav at nvidia.com> > > Sent: Wednesday, July 21, 2021 7:57 PM > > To: mst at redhat.com; virtualization at lists.linux-foundation.org > > Cc: Parav Pandit <parav at nvidia.com> > > Subject: [PATCH v3 0/4] virtio short improvements > > > > Hi, > > > > This series contains small improvements for virtio pci driver. > > The main idea is to support surprise removal of virtio pci device when > > the driver is already loaded. Future patches will further improve > > other areas of hotplug. > > > > Patches 1 to 3 prepare the code to handle surprise removal by marking > > the device as broken in patch-4. > > > > Patch summary: > > patch-1: ensures that compiler optimization doesn't occur on vq->broken > > flag > > patch-2: maintains the mirror sequence on VQ delete and VQ create > > patch-3: protects vqs list for simultaneous access from reader and a > > writer > > patch-4: handles surprise removal of virtio pci device which avoids > > call trace and system lockup > > > Any comments to address or will you please take this short series? >Can we please proceed with these series?