Heinz Graalfs
2013-Oct-22 12:45 UTC
[PATCH RFC 0/7] virtio: avoid various hang situations during hot-unplug
Hi, this patch-set tries to solve various hang situations when virtio devices (network or block) are hot-unplugged from a KVM guest. On System z there exists no handshake mechanism between host and guest when a device is hot-unplugged. The device is removed and no further I/O is possible. The guest is notified about the hard removal with a CRW machine check. As per architecture, the host must repond to any I/O operation for the removed device with an error condition as if the device had never been there. During machine check handling in the guest, virtio exit functions try to perform cleanup operations by triggering final I/O, including appropriate host kicks. These operations fail, or do not complete, and lead to several kinds of hang situations. In particular, virtio-ccw guest->host notification on an unplugged device will receive an error; this is, however, not reflected back to the affected virtqueues. Here are the details for some of the errors. In the network case a hang (loop) occurs when a machine check is handled on System z due to a vlan device removal. A loop spinning for a response for final IO in virtnet_send_command() will never complete successfully because of a previous unsuccessfull host kick operation (virtqueue_kick()). The below patches [1,2] flag the virtqueue as 'broken' when a host kick failure is detected. Patch [3] exploits this error info to avoid an endless invocation of cpu_relax() when waiting for the command to complete. Hang situations also occur when a block device is hot-unplugged. Several different errors occur when a block device with mounted file-system(s) is hot-unplugged. Asynchronous writeback functions, as well as page cache read or write operations end up in never ending wait situations. Hang situations occur during device removal when virtblk_remove() invokes del_gendisk() to synch dirty inode pages (invalidate_partition()). The below patches [4,5,6,7] also exploit a 'broken' virtqueue in order to trigger IO errors as well as to prevent final hanging IO operations. Heinz Graalfs (7): virtio_ring: add new functions virtqueue{_set_broken()/_is_broken()} s390/virtio_ccw: set virtqueue as broken if host notify failed virtio_net: avoid cpu_relax() call loop in case virtqueue is broken virtio_blk: use dummy virtqueue_notify() to detect host kick error virtio_blk: do not free device id if virtqueue is broken virtio_blk: set request queue as dying in case virtqueue is broken virtio_blk: trigger IO errors in case virtqueue is broken drivers/block/virtio_blk.c | 41 ++++++++++++++++++++++++++++++++++++----- drivers/net/virtio_net.c | 4 +++- drivers/s390/kvm/virtio_ccw.c | 2 ++ drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ include/linux/virtio.h | 4 ++++ 5 files changed, 61 insertions(+), 6 deletions(-) -- 1.8.3.1
Heinz Graalfs
2013-Oct-22 12:45 UTC
[PATCH RFC 1/7] virtio_ring: add new functions virtqueue{_set_broken()/_is_broken()}
This patch adds 2 new functions: virtqueue_set_broken(): to be called when a virtqueue kick operation fails. virtqueue_is_broken(): can be called to query the virtqueue state after a host was kicked. Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com> --- drivers/virtio/virtio_ring.c | 16 ++++++++++++++++ include/linux/virtio.h | 4 ++++ 2 files changed, 20 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 5217baf..930ee39 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -836,4 +836,20 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq) } EXPORT_SYMBOL_GPL(virtqueue_get_vring_size); +void virtqueue_set_broken(struct virtqueue *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + + vq->broken = true; +} +EXPORT_SYMBOL_GPL(virtqueue_set_broken); + +bool virtqueue_is_broken(struct virtqueue *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + + return vq->broken; +} +EXPORT_SYMBOL_GPL(virtqueue_is_broken); + MODULE_LICENSE("GPL"); diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 9ff8645..c09ae2b 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -76,6 +76,10 @@ void *virtqueue_detach_unused_buf(struct virtqueue *vq); unsigned int virtqueue_get_vring_size(struct virtqueue *vq); +void virtqueue_set_broken(struct virtqueue *vq); + +bool virtqueue_is_broken(struct virtqueue *vq); + /** * virtio_device - representation of a device using virtio * @index: unique position on the virtio bus -- 1.8.3.1
Heinz Graalfs
2013-Oct-22 12:45 UTC
[PATCH RFC 2/7] s390/virtio_ccw: set virtqueue as broken if host notify failed
Set the current virtqueue as broken if the appropriate host kick failed (e.g. if the device was hot-unplugged via host means). This error info can be exploited at various other places where a host kick is triggered. Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com> --- drivers/s390/kvm/virtio_ccw.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c index 6a410d4..ac0ace6 100644 --- a/drivers/s390/kvm/virtio_ccw.c +++ b/drivers/s390/kvm/virtio_ccw.c @@ -384,6 +384,8 @@ static void virtio_ccw_kvm_notify(struct virtqueue *vq) vcdev = to_vc_device(info->vq->vdev); ccw_device_get_schid(vcdev->cdev, &schid); info->cookie = do_kvm_notify(schid, vq->index, info->cookie); + if (info->cookie < 0) + virtqueue_set_broken(vq); } static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev, -- 1.8.3.1
Heinz Graalfs
2013-Oct-22 12:45 UTC
[PATCH RFC 3/7] virtio_net: avoid cpu_relax() call loop in case virtqueue is broken
A virtqueue_kick() call to notify a host might fail in case the network device was hot-unplugged. Spinning for a response for a VIRTIO_NET_CTRL_VLAN_DEL command response will end up in a never ending loop waiting for a response. This patch avoids the cpu_relax() loop in case the virtqueue is flagged as broken. Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com> --- drivers/net/virtio_net.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ab2e5d0..57f5f13 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -800,8 +800,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, /* Spin for a response, the kick causes an ioport write, trapping * into the hypervisor, so the request should be handled immediately. + * Do not wait for a response in case the virtqueue is 'broken'. */ - while (!virtqueue_get_buf(vi->cvq, &tmp)) + while (!virtqueue_get_buf(vi->cvq, &tmp) + && !virtqueue_is_broken(vi->cvq)) cpu_relax(); return status == VIRTIO_NET_OK; -- 1.8.3.1
Heinz Graalfs
2013-Oct-22 12:45 UTC
[PATCH RFC 4/7] virtio_blk: use dummy virtqueue_notify() to detect host kick error
Deleting the disk and partitions in virtblk_remove() via del_gendisk() causes never ending waits when trying to synch dirty inode pages. A dummy virtqueue_notify() in virtblk_remove() is used to detect a host notification error, latter occurs when block device was hot-unplugged. When the dummy host kick failed blk_cleanup_queue() should be invoked prior to del_gendisk(). Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com> --- drivers/block/virtio_blk.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 6472395..98f081a 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -880,8 +880,14 @@ static void virtblk_remove(struct virtio_device *vdev) vblk->config_enable = false; mutex_unlock(&vblk->config_lock); - del_gendisk(vblk->disk); - blk_cleanup_queue(vblk->disk->queue); + virtqueue_notify(vblk->vq); + if (virtqueue_is_broken(vblk->vq)) { + blk_cleanup_queue(vblk->disk->queue); + del_gendisk(vblk->disk); + } else { + del_gendisk(vblk->disk); + blk_cleanup_queue(vblk->disk->queue); + } /* Stop all the virtqueues. */ vdev->config->reset(vdev); -- 1.8.3.1
Heinz Graalfs
2013-Oct-22 12:45 UTC
[PATCH RFC 5/7] virtio_blk: do not free device id if virtqueue is broken
Re-using the same device id when a device is re-added after it was previously hot-unplugged should be avoided. An additional test is added to check a potential broken virtqueue when freeing the device id. Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com> --- drivers/block/virtio_blk.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 98f081a..a787e6e 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -874,6 +874,7 @@ static void virtblk_remove(struct virtio_device *vdev) struct virtio_blk *vblk = vdev->priv; int index = vblk->index; int refc; + bool queue_broken = false; /* Prevent config work handler from accessing the device. */ mutex_lock(&vblk->config_lock); @@ -882,6 +883,7 @@ static void virtblk_remove(struct virtio_device *vdev) virtqueue_notify(vblk->vq); if (virtqueue_is_broken(vblk->vq)) { + queue_broken = true; blk_cleanup_queue(vblk->disk->queue); del_gendisk(vblk->disk); } else { @@ -900,8 +902,10 @@ static void virtblk_remove(struct virtio_device *vdev) vdev->config->del_vqs(vdev); kfree(vblk); - /* Only free device id if we don't have any users */ - if (refc == 1) + /* Only free device id if we don't have any users + * and virtqueue is not broken due to a hot-unplugged device + */ + if (refc == 1 && !queue_broken) ida_simple_remove(&vd_index_ida, index); } -- 1.8.3.1
Heinz Graalfs
2013-Oct-22 12:45 UTC
[PATCH RFC 6/7] virtio_blk: set request queue as dying in case virtqueue is broken
The request queue should be flagged as QUEUE_FLAG_DYING in case the host kick failed for a new virtqueue request. Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com> --- drivers/block/virtio_blk.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index a787e6e..01b5d3a 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -377,8 +377,14 @@ static void virtblk_request(struct request_queue *q) issued++; } - if (issued) + if (issued) { virtqueue_kick(vblk->vq); + if (virtqueue_is_broken(vblk->vq)) { + mutex_lock(&q->sysfs_lock); + queue_flag_set_unlocked(QUEUE_FLAG_DYING, q); + mutex_unlock(&q->sysfs_lock); + } + } } static void virtblk_make_request(struct request_queue *q, struct bio *bio) -- 1.8.3.1
Heinz Graalfs
2013-Oct-22 12:45 UTC
[PATCH RFC 7/7] virtio_blk: trigger IO errors in case virtqueue is broken
In case the virtqueue is flagged as broken, IO errors are triggered for current request queue entries. Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com> --- drivers/block/virtio_blk.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 01b5d3a..8eb91be 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -875,6 +875,20 @@ out: return err; } +static void virtblk_flush_request_queue(struct request_queue *q) +{ + spinlock_t *lock = q->queue_lock; + struct request *req; + + if (!q) + return; + + spin_lock_irq(lock); + while ((req = blk_fetch_request(q))) + __blk_end_request_all(req, -EIO); + spin_unlock_irq(lock); +} + static void virtblk_remove(struct virtio_device *vdev) { struct virtio_blk *vblk = vdev->priv; @@ -890,6 +904,7 @@ static void virtblk_remove(struct virtio_device *vdev) virtqueue_notify(vblk->vq); if (virtqueue_is_broken(vblk->vq)) { queue_broken = true; + virtblk_flush_request_queue(vblk->disk->queue); blk_cleanup_queue(vblk->disk->queue); del_gendisk(vblk->disk); } else { -- 1.8.3.1
Rusty Russell
2013-Oct-23 00:02 UTC
[PATCH RFC 4/7] virtio_blk: use dummy virtqueue_notify() to detect host kick error
Heinz Graalfs <graalfs at linux.vnet.ibm.com> writes:> Deleting the disk and partitions in virtblk_remove() via del_gendisk() causes > never ending waits when trying to synch dirty inode pages. > > A dummy virtqueue_notify() in virtblk_remove() is used to detect a host > notification error, latter occurs when block device was hot-unplugged. > When the dummy host kick failed blk_cleanup_queue() should be invoked > prior to del_gendisk(). > > Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com> > --- > drivers/block/virtio_blk.c | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index 6472395..98f081a 100644 > --- a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -880,8 +880,14 @@ static void virtblk_remove(struct virtio_device *vdev) > vblk->config_enable = false; > mutex_unlock(&vblk->config_lock); > > - del_gendisk(vblk->disk); > - blk_cleanup_queue(vblk->disk->queue); > + virtqueue_notify(vblk->vq); > + if (virtqueue_is_broken(vblk->vq)) { > + blk_cleanup_queue(vblk->disk->queue); > + del_gendisk(vblk->disk); > + } else { > + del_gendisk(vblk->disk); > + blk_cleanup_queue(vblk->disk->queue); > + }This seems horribly wrong. Firstly, it's ugly to have a gratuitous kick. Secondly, it's racy: what if there's a hot unplug (or other failure) after your virtqueue_is_broken() test? We should be doing I/O failures which should be handled by del_gendisk() correctly. Cheers, Rusty.
Rusty Russell
2013-Oct-23 00:06 UTC
[PATCH RFC 1/7] virtio_ring: add new functions virtqueue{_set_broken()/_is_broken()}
Heinz Graalfs <graalfs at linux.vnet.ibm.com> writes:> This patch adds 2 new functions: > > virtqueue_set_broken(): to be called when a virtqueue kick operation fails. > > virtqueue_is_broken(): can be called to query the virtqueue state after a host > was kicked. > > Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com>Thanks for doing this! But as shown by the following patches, the separation of kick and broken test is a bad API. We should make virtqueue_kick() and virtqueue_notify() return a bool (ie. vq->broken). We'll still need virtqueue_is_broken(), as it would be nice to make all callers to virtqueue_get_buf() check it as well. Cheers, Rusty.
Reasonably Related Threads
- [PATCH RFC 0/7] virtio: avoid various hang situations during hot-unplug
- [PATCH v3 RFC 0/4] virtio: add 'surprize_removal' to virtio_device
- [PATCH v3 RFC 0/4] virtio: add 'surprize_removal' to virtio_device
- [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
- [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss