Heinz Graalfs
2013-Nov-27  10:32 UTC
[PATCH v3 RFC 0/4] virtio: add 'surprize_removal' to virtio_device
Hi, here is an updated patch-set to my v2 RFC virtio: add new notify() callback to virtio_driver This RFC introduces a new virtio_device entry 'surprize_removal' instead of a new 'notify' callback in struct virtio_driver. When an active virtio block device is hot-unplugged from a KVM guest, affected guest user applications are not aware of any errors that occur due to the lost device. This patch-set adds code to avoid further request queueing when a lost block device is detected, resulting in appropriate error info. Additionally a potential hang situation can be avoided by not waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that will never complete. On System z there exists no handshake mechanism between host and guest when a device is hot-unplugged. The device is removed and no further I/O is possible. When an online channel device disappears on System z the kernel's CIO layer informs the driver (virtio_ccw) about the lost device. Here are some more error details: For a particular block device virtio's request function virtblk_request() is called by the block layer to queue requests to be handled by the host. In case of a lost device requests can still be queued, but an appropriate subsequent host kick usually fails. This leads to situations where no error feedback is shown. In order to prevent request queueing for lost devices appropriate settings in the block layer should be made. Exploiting System z's CIO notify handler callback, and passing on device loss information via the surprize_removal flag to the remove callback of the backend driver, can solve this task. v2->v3 changes: - remove virtio_driver's notify callback (and appropriate code) introduced in my v1 RFC - introduce 'surprize_removal' in struct virtio_device - change virtio_blk's remove callback to perform special actions when the surprize_removal flag is set - avoid final I/O by preventing further request queueing - avoid hangs in blk_cleanup_queue() due to waits on 'in-flight' requests - set surprize_removal in virtio_ccw's notify callback when a device is lost v1->v2 changes: - add include of linux/notifier.h (I also added it to the 3rd patch) - get queue lock in order to be able to use safe queue_flag_set() functions in virtblk_notify() handler Heinz Graalfs (4): virtio: add surprize_removal to virtio_device virtio_blk: avoid further request queueing on device loss virtio_blk: avoid calling blk_cleanup_queue() on device loss virtio_ccw: set surprize_removal in virtio_device if a device was lost drivers/block/virtio_blk.c | 12 +++++++++++- drivers/s390/kvm/virtio_ccw.c | 16 ++++++++++++++-- drivers/virtio/virtio.c | 1 + include/linux/virtio.h | 1 + 4 files changed, 27 insertions(+), 3 deletions(-) -- 1.8.3.1
Heinz Graalfs
2013-Nov-27  10:32 UTC
[PATCH v3 RFC 1/4] virtio: add surprize_removal to virtio_device
Add new entry surprize_removal to struct virtio_device.
When a virtio transport driver is notified about a lost device
it should set surprize_removal to true.
A backend driver can test this flag in order to perform specific
actions that might be appropriate wrt the device loss.
Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com>
---
 drivers/virtio/virtio.c | 1 +
 include/linux/virtio.h  | 1 +
 2 files changed, 2 insertions(+)
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index ee59b74..290d1e2 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -115,6 +115,7 @@ static int virtio_dev_probe(struct device *_d)
 
 	/* We have a driver! */
 	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+	dev->surprize_removal = false;
 
 	/* Figure out what features the device supports. */
 	device_features = dev->config->get_features(dev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index f15f6e7..131404a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -98,6 +98,7 @@ struct virtio_device {
 	struct list_head vqs;
 	/* Note that this is a Linux set_bit-style bitmap. */
 	unsigned long features[1];
+	bool surprize_removal;
 	void *priv;
 };
 
-- 
1.8.3.1
Heinz Graalfs
2013-Nov-27  10:32 UTC
[PATCH v3 RFC 2/4] virtio_blk: avoid further request queueing on device loss
Code is added to the remove callback to verify if a device was lost.
In case of a device loss further request queueing should be prevented
by setting appropriate queue flags prior to invoking del_gendisk().
Blocking of request queueing leads to appropriate I/O errors when data
are tried to be synched. Trying to synch data to a lost block device
doesn't make too much sense.
If the current remove callback was triggered due to an unregister driver,
and the surprize_removal is not already set (although the actual device
is already gone, e.g. virsh detach), del_gendisk() would be triggered
resulting in a hang. This is a weird situation, and most likely not
'serializable'.
Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com>
---
 drivers/block/virtio_blk.c | 9 +++++++++
 1 file changed, 9 insertions(+)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2d43be4..0f64282 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -875,6 +875,7 @@ static void virtblk_remove(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk = vdev->priv;
 	int index = vblk->index;
+	unsigned long flags;
 	int refc;
 
 	/* Prevent config work handler from accessing the device. */
@@ -882,6 +883,14 @@ static void virtblk_remove(struct virtio_device *vdev)
 	vblk->config_enable = false;
 	mutex_unlock(&vblk->config_lock);
 
+	if (vdev->surprize_removal) {
+		spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
+		queue_flag_set(QUEUE_FLAG_DYING, vblk->disk->queue);
+		queue_flag_set(QUEUE_FLAG_NOMERGES, vblk->disk->queue);
+		queue_flag_set(QUEUE_FLAG_NOXMERGES, vblk->disk->queue);
+		spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
+	}
+
 	del_gendisk(vblk->disk);
 	blk_cleanup_queue(vblk->disk->queue);
 
-- 
1.8.3.1
Heinz Graalfs
2013-Nov-27  10:32 UTC
[PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
Code is added to avoid calling blk_cleanup_queue() when the surprize_removal flag is set due to a disappeared device. It avoid hangs due to incomplete requests (e.g. in-flight requests). Such requests must be considered as lost. If the current remove callback was triggered due to an unregister driver, and the surprize_removal is not already set (although the actual device is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests that will never complete. This is a weird situation, and most likely not 'serializable'. Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com> --- drivers/block/virtio_blk.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 0f64282..8c05001 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -892,7 +892,8 @@ static void virtblk_remove(struct virtio_device *vdev) } del_gendisk(vblk->disk); - blk_cleanup_queue(vblk->disk->queue); + if (!vdev->surprize_removal) + blk_cleanup_queue(vblk->disk->queue); /* Stop all the virtqueues. */ vdev->config->reset(vdev); -- 1.8.3.1
Heinz Graalfs
2013-Nov-27  10:32 UTC
[PATCH v3 RFC 4/4] virtio_ccw: set surprize_removal in virtio_device if a device was lost
Code is added to the notify handler to set the 'surprize_removal' flag
in virtio_device in case a CIO_GONE notification occurs. The remove
callback of the backend driver must check this flag in order to perform
special processing for a lost device.
Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com>
---
 drivers/s390/kvm/virtio_ccw.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 35b9aaa..8f6c74a 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/io.h>
 #include <linux/kvm_para.h>
+#include <linux/notifier.h>
 #include <asm/setup.h>
 #include <asm/irq.h>
 #include <asm/cio.h>
@@ -1064,8 +1065,19 @@ out_free:
 
 static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
 {
-	/* TODO: Check whether we need special handling here. */
-	return 0;
+	int rc;
+	struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+
+	switch (event) {
+	case CIO_GONE:
+		vcdev->vdev.surprize_removal = true;
+		rc = NOTIFY_DONE;
+		break;
+	default:
+		rc = NOTIFY_DONE;
+		break;
+	}
+	return rc;
 }
 
 static struct ccw_device_id virtio_ids[] = {
-- 
1.8.3.1
Michael S. Tsirkin
2013-Nov-27  10:47 UTC
[PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:> Code is added to avoid calling blk_cleanup_queue() when the surprize_removal > flag is set due to a disappeared device. It avoid hangs due to incomplete > requests (e.g. in-flight requests). Such requests must be considered as lost.Ugh. Can't we complete these immediately using detach_unused_buf? If not why?> If the current remove callback was triggered due to an unregister driver, > and the surprize_removal is not already set (although the actual device > is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered > resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests > that will never complete. This is a weird situation, and most likely not > 'serializable'.Hmm interesting. Implement some timeout and probe device to make sure it's still alive?> Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com> > --- > drivers/block/virtio_blk.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index 0f64282..8c05001 100644 > --- a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -892,7 +892,8 @@ static void virtblk_remove(struct virtio_device *vdev) > } > > del_gendisk(vblk->disk); > - blk_cleanup_queue(vblk->disk->queue); > + if (!vdev->surprize_removal) > + blk_cleanup_queue(vblk->disk->queue); > > /* Stop all the virtqueues. */ > vdev->config->reset(vdev); > -- > 1.8.3.1
Michael S. Tsirkin
2013-Nov-27  10:49 UTC
[PATCH v3 RFC 4/4] virtio_ccw: set surprize_removal in virtio_device if a device was lost
On Wed, Nov 27, 2013 at 11:32:40AM +0100, Heinz Graalfs wrote:> Code is added to the notify handler to set the 'surprize_removal' flag > in virtio_device in case a CIO_GONE notification occurs. The remove > callback of the backend driver must check this flag in order to perform > special processing for a lost device. > > Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com> > --- > drivers/s390/kvm/virtio_ccw.c | 16 ++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c > index 35b9aaa..8f6c74a 100644 > --- a/drivers/s390/kvm/virtio_ccw.c > +++ b/drivers/s390/kvm/virtio_ccw.c > @@ -27,6 +27,7 @@ > #include <linux/module.h> > #include <linux/io.h> > #include <linux/kvm_para.h> > +#include <linux/notifier.h> > #include <asm/setup.h> > #include <asm/irq.h> > #include <asm/cio.h> > @@ -1064,8 +1065,19 @@ out_free: > > static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event) > { > - /* TODO: Check whether we need special handling here. */ > - return 0; > + int rc; > + struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev); > + > + switch (event) { > + case CIO_GONE: > + vcdev->vdev.surprize_removal = true; > + rc = NOTIFY_DONE; > + break; > + default: > + rc = NOTIFY_DONE; > + break; > + } > + return rc; > } >So again, this seems to mean there is some serialization with device unregistration? Otherwise how do we know using dev_get_drvdata is safe?> static struct ccw_device_id virtio_ids[] = { > -- > 1.8.3.1
Rusty Russell
2013-Dec-04  04:04 UTC
[PATCH v3 RFC 2/4] virtio_blk: avoid further request queueing on device loss
Heinz Graalfs <graalfs at linux.vnet.ibm.com> writes:> Code is added to the remove callback to verify if a device was lost. > > In case of a device loss further request queueing should be prevented > by setting appropriate queue flags prior to invoking del_gendisk(). > Blocking of request queueing leads to appropriate I/O errors when data > are tried to be synched. Trying to synch data to a lost block device > doesn't make too much sense.Sure, but this isn't the only case where we should set these flags, right? I would think if any virtqueue is reported broken, we should mark the queue dying too.> If the current remove callback was triggered due to an unregister driver, > and the surprize_removal is not already set (although the actual device > is already gone, e.g. virsh detach), del_gendisk() would be triggered > resulting in a hang. This is a weird situation, and most likely not > 'serializable'.This seems like abusing the vdev struct to pass an argument to virtblk_remove(). How about you mark all virtqueues broken in the "unexpected removal" case? Then the driver should correctly fail to deliver requests. Cheers, Rusty.
Seemingly Similar Threads
- [PATCH v3 RFC 0/4] virtio: add 'surprize_removal' to virtio_device
- [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device
- [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device
- [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device
- [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device