Heinz Graalfs
2013-Dec-13  13:13 UTC
[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device
Hi,

here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.

When an active virtio block device is hot-unplugged from a KVM guest,
affected guest user applications are not made aware of the errors caused
by the lost device. This patch-set adds code to stop further request
queueing when a lost block device is detected, so that appropriate errors
are reported. Additionally, a potential hang can be avoided by not waiting
in blk_cleanup_queue() for requests (e.g. in-flight requests) that will
never complete.

On System z there is no handshake mechanism between host and guest when
a device is hot-unplugged. The device is removed and no further I/O is
possible.

When an online channel device disappears on System z, the kernel's CIO
layer informs the driver (virtio_ccw) about the lost device.

Here are some more error details:

For a particular block device, virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device, requests can still be queued, but the subsequent
host kick usually fails. This leads to situations where no error feedback
is shown.

To prevent request queueing for lost devices, the appropriate queue flags
must be set in the block layer. This can be achieved by exploiting System
z's CIO notify handler callback and passing the device loss information
via the device_lost flag to the remove callback of the backend driver.

v3->v4 changes:
- patch 1: solves some vcdev pointer handling issues in the virtio_ccw driver
  (e.g. locked vcdev pointer reset/query; serialize remove()/set_offline()
  callback processing).
- patch 2: introduces the 'device_lost' atomic in virtio_device and uses it
  in the backend driver virtio_blk accordingly (original 3 patches merged).
- patch 3: the notify() callback is now serialized with remove()/set_offline()
  callbacks. The notification is ignored if the vcdev pointer has already
  been cleared (by remove() or set_offline()).
v2->v3 changes:
- remove virtio_driver's notify callback (and appropriate code) introduced
  in my v1 RFC
- introduce 'surprize_removal' in struct virtio_device
- change virtio_blk's remove callback to perform special actions when the
  surprize_removal flag is set
- avoid final I/O by preventing further request queueing
- avoid hangs in blk_cleanup_queue() due to waits on 'in-flight' requests
- set surprize_removal in virtio_ccw's notify callback when a device is lost

v1->v2 changes:
- add include of linux/notifier.h (I also added it to the 3rd patch)
- get queue lock in order to be able to use safe queue_flag_set() functions
  in virtblk_notify() handler

Heinz Graalfs (3):
  virtio_ccw: fix vcdev pointer handling issues
  virtio: introduce 'device_lost' flag in virtio_device
  virtio_ccw: set 'device_lost' on CIO_GONE notification

 drivers/block/virtio_blk.c    | 14 ++++++++++-
 drivers/s390/kvm/virtio_ccw.c | 58 ++++++++++++++++++++++++++++++++++++-------
 include/linux/virtio.h        |  2 ++
 3 files changed, 64 insertions(+), 10 deletions(-)

-- 
1.8.3.1
Heinz Graalfs
2013-Dec-13  13:13 UTC
[PATCH v4 RFC 1/3] virtio_ccw: fix vcdev pointer handling issues
The interrupt handler virtio_ccw_int_handler(), which uses the vcdev
pointer, is protected by the ccw_device lock. Resetting the pointer within
the ccw_device structure should therefore be done while holding this lock.
Also, resetting the vcdev pointer (under the ccw_device lock) before
freeing the memory it points to closes a race window.
Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck at de.ibm.com>
---
 drivers/s390/kvm/virtio_ccw.c | 35 ++++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 35b9aaa..b939a7f 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -886,6 +886,8 @@ static void virtio_ccw_int_handler(struct ccw_device *cdev,
 	struct virtqueue *vq;
 	struct virtio_driver *drv;
 
+	if (!vcdev)
+		return;
 	/* Check if it's a notification from the host. */
 	if ((intparm == 0) &&
 	    (scsw_stctl(&irb->scsw) ==
@@ -985,23 +987,37 @@ static int virtio_ccw_probe(struct ccw_device *cdev)
 	return 0;
 }
 
+static struct virtio_ccw_device *virtio_grab_drvdata(struct ccw_device *cdev)
+{
+	unsigned long flags;
+	struct virtio_ccw_device *vcdev;
+
+	spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
+	vcdev = dev_get_drvdata(&cdev->dev);
+	if (!vcdev) {
+		spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
+		return NULL;
+	}
+	dev_set_drvdata(&cdev->dev, NULL);
+	spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
+	return vcdev;
+}
+
 static void virtio_ccw_remove(struct ccw_device *cdev)
 {
-	struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+	struct virtio_ccw_device *vcdev = virtio_grab_drvdata(cdev);
 
-	if (cdev->online) {
+	if (vcdev && cdev->online)
 		unregister_virtio_device(&vcdev->vdev);
-		dev_set_drvdata(&cdev->dev, NULL);
-	}
 	cdev->handler = NULL;
 }
 
 static int virtio_ccw_offline(struct ccw_device *cdev)
 {
-	struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+	struct virtio_ccw_device *vcdev = virtio_grab_drvdata(cdev);
 
-	unregister_virtio_device(&vcdev->vdev);
-	dev_set_drvdata(&cdev->dev, NULL);
+	if (vcdev)
+		unregister_virtio_device(&vcdev->vdev);
 	return 0;
 }
 
@@ -1010,6 +1026,7 @@ static int virtio_ccw_online(struct ccw_device *cdev)
 {
 	int ret;
 	struct virtio_ccw_device *vcdev;
+	unsigned long flags;
 
 	vcdev = kzalloc(sizeof(*vcdev), GFP_KERNEL);
 	if (!vcdev) {
@@ -1039,7 +1056,9 @@ static int virtio_ccw_online(struct ccw_device *cdev)
 	INIT_LIST_HEAD(&vcdev->virtqueues);
 	spin_lock_init(&vcdev->lock);
 
+	spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
 	dev_set_drvdata(&cdev->dev, vcdev);
+	spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
 	vcdev->vdev.id.vendor = cdev->id.cu_type;
 	vcdev->vdev.id.device = cdev->id.cu_model;
 	ret = register_virtio_device(&vcdev->vdev);
@@ -1050,7 +1069,9 @@ static int virtio_ccw_online(struct ccw_device *cdev)
 	}
 	return 0;
 out_put:
+	spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
 	dev_set_drvdata(&cdev->dev, NULL);
+	spin_unlock_irqrestore(get_ccwdev_lock(cdev), flags);
 	put_device(&vcdev->vdev.dev);
 	return ret;
 out_free:
-- 
1.8.3.1
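The take-and-clear pattern that virtio_grab_drvdata() applies can be modelled in userspace. This is a hypothetical, single-file sketch rather than kernel code: a pthread mutex stands in for the ccw_device lock, and fake_ccw_device, fake_vcdev, grab_drvdata(), int_handler() and teardown() are invented names. It illustrates why, once the pointer is cleared under the lock, a late interrupt and a second remove()/set_offline() both become harmless no-ops. The cdev->online check from the real remove path is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct virtio_ccw_device and struct ccw_device. */
struct fake_vcdev {
	int id;
};

struct fake_ccw_device {
	pthread_mutex_t lock;       /* plays the role of get_ccwdev_lock() */
	struct fake_vcdev *drvdata; /* plays the role of dev_get/set_drvdata() */
	int irqs_handled;
};

/* Like virtio_grab_drvdata(): read and clear the pointer under the lock,
 * so exactly one caller ever ends up owning it. */
static struct fake_vcdev *grab_drvdata(struct fake_ccw_device *cdev)
{
	struct fake_vcdev *vcdev;

	assert(cdev != NULL);
	pthread_mutex_lock(&cdev->lock);
	vcdev = cdev->drvdata;
	cdev->drvdata = NULL;
	pthread_mutex_unlock(&cdev->lock);
	return vcdev;
}

/* Like the new early return in virtio_ccw_int_handler(): the handler
 * runs under the same lock and bails out once the pointer is gone. */
static void int_handler(struct fake_ccw_device *cdev)
{
	pthread_mutex_lock(&cdev->lock);
	if (cdev->drvdata)
		cdev->irqs_handled++;
	pthread_mutex_unlock(&cdev->lock);
}

/* Like remove()/set_offline(): whichever runs first tears down; the
 * second sees NULL and does nothing. Returns 1 if it tore down. */
static int teardown(struct fake_ccw_device *cdev)
{
	struct fake_vcdev *vcdev = grab_drvdata(cdev);

	if (!vcdev)
		return 0;
	free(vcdev); /* nobody can reach it through cdev any more */
	return 1;
}
```

In the real driver, the interrupt handler is already called with the ccw_device lock held, so only the teardown side needs to take it explicitly.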
Heinz Graalfs
2013-Dec-13  13:13 UTC
[PATCH v4 RFC 2/3] virtio: introduce 'device_lost' flag in virtio_device
This flag should be set by a virtio transport driver when it has been
notified about a lost device, before the remove callback of a
backend driver is triggered.
A backend driver can test this flag in order to perform specific
actions that might be appropriate with respect to the device loss.
In case of a device loss, further request queueing should be prevented
by setting appropriate queue flags prior to invoking del_gendisk().
Blocking request queueing leads to appropriate I/O errors when an
attempt is made to sync data. Syncing data to a lost block device
doesn't make much sense anyway.
blk_cleanup_queue() is not called when the device_lost flag is set due
to a disappeared device. This avoids hangs due to incomplete requests
(e.g. in-flight requests). Such requests must be considered lost.
Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck at de.ibm.com>
---
 drivers/block/virtio_blk.c | 14 +++++++++++++-
 include/linux/virtio.h     |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2d43be4..e5b4947 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -876,14 +876,26 @@ static void virtblk_remove(struct virtio_device *vdev)
 	struct virtio_blk *vblk = vdev->priv;
 	int index = vblk->index;
 	int refc;
+	int device_lost;
+	unsigned long flags;
 
 	/* Prevent config work handler from accessing the device. */
 	mutex_lock(&vblk->config_lock);
 	vblk->config_enable = false;
 	mutex_unlock(&vblk->config_lock);
 
+	device_lost = atomic_read(&vdev->device_lost);
+	if (device_lost) {
+		spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
+		queue_flag_set(QUEUE_FLAG_DYING, vblk->disk->queue);
+		queue_flag_set(QUEUE_FLAG_NOMERGES, vblk->disk->queue);
+		queue_flag_set(QUEUE_FLAG_NOXMERGES, vblk->disk->queue);
+		spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
+	}
+
 	del_gendisk(vblk->disk);
-	blk_cleanup_queue(vblk->disk->queue);
+	if (!device_lost)
+		blk_cleanup_queue(vblk->disk->queue);
 
 	/* Stop all the virtqueues. */
 	vdev->config->reset(vdev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index f15f6e7..c18db21 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -87,6 +87,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
  * @vringh_config: configuration ops for host vrings.
  * @vqs: the list of virtqueues for this device.
  * @features: the features supported by both driver and device.
+ * @device_lost: to flag a device loss.
  * @priv: private pointer for the driver's use.
  */
 struct virtio_device {
@@ -98,6 +99,7 @@ struct virtio_device {
 	struct list_head vqs;
 	/* Note that this is a Linux set_bit-style bitmap. */
 	unsigned long features[1];
+	atomic_t device_lost;
 	void *priv;
 };
 
-- 
1.8.3.1
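The decision virtblk_remove() makes in this patch can be modelled outside the kernel to show the two paths side by side. This is a hypothetical sketch: model_queue, model_vdev and the MODEL_QUEUE_* bits are invented stand-ins for the request queue and the QUEUE_FLAG_* flags, queue locking is omitted, and returning -ENODEV from a dying queue is an assumption chosen for illustration, not a statement of what the block layer returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bits standing in for QUEUE_FLAG_DYING/NOMERGES/NOXMERGES. */
enum {
	MODEL_QUEUE_DYING     = 1u << 0,
	MODEL_QUEUE_NOMERGES  = 1u << 1,
	MODEL_QUEUE_NOXMERGES = 1u << 2,
};

struct model_queue {
	unsigned int flags;
	int in_flight;   /* requests a lost host will never complete */
	bool cleaned_up; /* whether the blk_cleanup_queue() stand-in ran */
};

struct model_vdev {
	atomic_int device_lost; /* set by the transport, as in patch 3 */
	struct model_queue q;
};

/* Submission path: a dying queue fails fast instead of queueing I/O. */
static int model_submit(struct model_queue *q)
{
	if (q->flags & MODEL_QUEUE_DYING)
		return -ENODEV;
	q->in_flight++;
	return 0;
}

/* Remove path modelled on the patch. Returns true if cleanup ran. */
static bool model_remove(struct model_vdev *vdev)
{
	bool lost = atomic_load(&vdev->device_lost) != 0;

	if (lost) {
		/* The patch sets these under queue_lock; omitted here. */
		vdev->q.flags |= MODEL_QUEUE_DYING | MODEL_QUEUE_NOMERGES |
				 MODEL_QUEUE_NOXMERGES;
	}

	/* del_gendisk() would run here; with DYING already set, the final
	 * sync it triggers fails instead of adding requests. */

	if (!lost) {
		/* Stand-in for blk_cleanup_queue(): mark dying and drain,
		 * assuming a live device completes its in-flight I/O. */
		vdev->q.flags |= MODEL_QUEUE_DYING;
		vdev->q.in_flight = 0;
		vdev->q.cleaned_up = true;
	}
	return vdev->q.cleaned_up;
}
```

On the lost path, the in-flight request is left alone rather than waited for, which is the hang the patch avoids.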
Heinz Graalfs
2013-Dec-13  13:13 UTC
[PATCH v4 RFC 3/3] virtio_ccw: set 'device_lost' on CIO_GONE notification
When a CIO_GONE notification is received, the device_lost flag is
set in the virtio_device. A backend should test this flag so that it
can avoid triggering final I/O to a device that is no longer
reachable.
The notification is ignored if remove or set_offline is already
running, since the virtio_device pointer might point to freed memory
in that case.
Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck at de.ibm.com>
---
 drivers/s390/kvm/virtio_ccw.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index b939a7f..a468b9c 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/io.h>
 #include <linux/kvm_para.h>
+#include <linux/notifier.h>
 #include <asm/setup.h>
 #include <asm/irq.h>
 #include <asm/cio.h>
@@ -1085,8 +1086,26 @@ out_free:
 
 static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
 {
-	/* TODO: Check whether we need special handling here. */
-	return 0;
+	int rc;
+	struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+
+	/*
+	 * Make sure vcdev is set
+	 * i.e. set_offline/remove callback not already running
+	 */
+	if (!vcdev)
+		return NOTIFY_DONE;
+
+	switch (event) {
+	case CIO_GONE:
+		atomic_inc(&vcdev->vdev.device_lost);
+		rc = NOTIFY_DONE;
+		break;
+	default:
+		rc = NOTIFY_DONE;
+		break;
+	}
+	return rc;
 }
 
 static struct ccw_device_id virtio_ids[] = {
-- 
1.8.3.1
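The notify handler in this patch reduces to a small state change that can be modelled in userspace. In this hypothetical sketch, the MODEL_CIO_* events, MODEL_NOTIFY_DONE, ntf_vdev and ntf_vcdev are invented stand-ins; the real CIO_GONE and NOTIFY_DONE come from <asm/cio.h> and <linux/notifier.h>. The model assumes the caller has already fetched the drvdata pointer under the appropriate serialization and passes NULL once remove()/set_offline() has cleared it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CIO event codes and NOTIFY_DONE. */
enum model_cio_event { MODEL_CIO_GONE, MODEL_CIO_OPER, MODEL_CIO_NO_PATH };
enum { MODEL_NOTIFY_DONE = 0 };

struct ntf_vdev {
	atomic_int device_lost;
};

struct ntf_vcdev {
	struct ntf_vdev vdev;
};

/* Modelled on virtio_ccw_cio_notify(): only a "gone" event marks the
 * device lost; a NULL vcdev means teardown already owns it, so the
 * event is ignored instead of touching possibly freed memory. */
static int model_cio_notify(struct ntf_vcdev *vcdev, enum model_cio_event event)
{
	if (!vcdev)
		return MODEL_NOTIFY_DONE;

	switch (event) {
	case MODEL_CIO_GONE:
		atomic_fetch_add(&vcdev->vdev.device_lost, 1);
		break;
	default:
		break;
	}
	return MODEL_NOTIFY_DONE;
}
```

Because every branch of the patched handler returns NOTIFY_DONE, the model returns it once at the end instead of carrying an rc variable; this is only a stylistic choice.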
Rusty Russell
2013-Dec-17  03:42 UTC
[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device
Heinz Graalfs <graalfs at linux.vnet.ibm.com> writes:
> Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.
>
> When an active virtio block device is hot-unplugged from a KVM guest,
> affected guest user applications are not aware of any errors that occur
> due to the lost device. This patch-set adds code to avoid further request
> queueing when a lost block device is detected, resulting in appropriate
> error info. Additionally a potential hang situation can be avoided by not
> waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that
> will never complete.
>
> On System z there exists no handshake mechanism between host and guest
> when a device is hot-unplugged. The device is removed and no further I/O
> is possible.

Hi Heinz,

If you simply mark every virtqueue as broken when this
unexpected unplug happens, does that not Just Work?

I think I've asked this before...
Rusty.
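Rusty's alternative, marking every virtqueue as broken on an unexpected unplug, can be sketched the same way. This is a hypothetical model, not his actual patch series (which is not included in this thread): brk_vq, brk_device, brk_break_all() and brk_vq_add() are invented names, and failing a broken queue with -EIO is an assumption for illustration, in the spirit of the existing virtqueue_is_broken() query.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device with several virtqueues. */
struct brk_vq {
	bool broken;
	int queued;
};

struct brk_device {
	struct brk_vq *vqs;
	size_t nvqs;
};

/* Mark every virtqueue of the device as broken, e.g. from the
 * transport's "device gone" notification. */
static void brk_break_all(struct brk_device *dev)
{
	for (size_t i = 0; i < dev->nvqs; i++)
		dev->vqs[i].broken = true;
}

/* Adding a buffer to a broken queue fails immediately instead of
 * waiting for a host that will never answer. */
static int brk_vq_add(struct brk_vq *vq)
{
	if (vq->broken)
		return -EIO;
	vq->queued++;
	return 0;
}
```

This model only covers the virtqueue level. Heinz's replies in this thread report that with the virtqueue marked broken, virtblk_request() calls still succeeded and removal could still hang in del_gendisk().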
Heinz Graalfs
2013-Dec-17  14:01 UTC
[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device
On 17/12/13 04:42, Rusty Russell wrote:
> Heinz Graalfs <graalfs at linux.vnet.ibm.com> writes:
>> Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.
[...]
> Hi Heinz,
>
> If you simply mark every virtqueue as broken when this
> unexpected unplug happens, does that not Just Work?
>
> I think I've asked this before...
> Rusty.

Hi Rusty,

setting the (one) virtqueue that vblk is currently using to broken doesn't
solve the problems. In that case, virtblk_request() calls still succeed,
like this one...

([<0000000000112b28>] show_trace+0xf8/0x154)
 [<0000000000112bde>] show_stack+0x5a/0xdc
 [<000000000045eb56>] virtblk_request+0x25a/0x2b8
 [<00000000003e749c>] __blk_run_queue+0x50/0x64
 [<00000000003edb54>] blk_queue_bio+0x358/0x3f0
 [<00000000003eb446>] generic_make_request+0xea/0x130
 [<00000000003eb536>] submit_bio+0xaa/0x1a8
 [<00000000002c95e8>] _submit_bh+0x1c4/0x2f4
 [<00000000003a25e4>] journal_write_superblock+0xa0/0x1fc
 [<00000000003a3ed4>] journal_update_sb_log_tail+0x48/0x7c
 [<000000000039e742>] journal_commit_transaction+0x1586/0x1aa0
 [<00000000003a2a0e>] kjournald+0xfe/0x2a0
 [<00000000001786fc>] kthread+0xd8/0xe0
 [<0000000000698fee>] kernel_thread_starter+0x6/0xc
2 locks held by kjournald/1984:
...

... and end up in hang situations ...

PID: 13  TASK: 1e3f8000  CPU: 0  COMMAND: "kworker/u128:1"
 #0 [1e2033e0] __schedule at 695ff2
 #1 [1e203530] log_wait_commit at 3a28a6
 #2 [1e2035a0] ext3_sync_fs at 328dea
 #3 [1e2035d8] sync_filesystem at 2c785c
 #4 [1e203600] fsync_bdev at 2d4650
 #5 [1e203628] invalidate_partition at 3f80c8
 #6 [1e203650] del_gendisk at 3f8f5c
 #7 [1e2036c8] virtblk_remove at 45e60e
 #8 [1e203700] virtio_dev_remove at 42d72e
 #9 [1e203738] __device_release_driver at 44f0b0
#10 [1e203760] device_release_driver at 44f16c
#11 [1e203788] bus_remove_device at 44ea92
#12 [1e2037b8] device_del at 44bb40
#13 [1e2037f0] device_unregister at 44bbfa
#14 [1e203810] unregister_virtio_device at 42d9e6
#15 [1e203830] virtio_ccw_remove at 53b834
#16 [1e203850] ccw_device_remove at 4c5bf6
#17 [1e2038d8] __device_release_driver at 44f0b0
#18 [1e203900] device_release_driver at 44f16c
#19 [1e203928] bus_remove_device at 44ea92
#20 [1e203958] device_del at 44bb40
#21 [1e203990] ccw_device_unregister at 4c645c
#22 [1e2039b0] io_subchannel_remove at 4c6b1a
#23 [1e2039e8] css_remove at 4c054e
#24 [1e203a08] __device_release_driver at 44f0b0
#25 [1e203a30] device_release_driver at 44f16c
#26 [1e203a58] bus_remove_device at 44ea92
#27 [1e203a88] device_del at 44bb40
#28 [1e203ac0] device_unregister at 44bbfa
#29 [1e203ae0] css_sch_device_unregister at 4c06cc
#30 [1e203b08] io_subchannel_sch_event at 4c8c3a
#31 [1e203b80] css_evaluate_known_subchannel at 4c09bc
#32 [1e203be0] slow_eval_known_fn at 4c19a6
#33 [1e203c10] bus_for_each_dev at 44d50e
#34 [1e203c50] for_each_subchannel_staged at 4c1066
#35 [1e203c98] css_slow_path_func at 4c1124
#36 [1e203cc0] process_one_work at 16c7f6
#37 [1e203d60] worker_thread at 16dce4
#38 [1e203da8] kthread at 1786fc
#39 [1e203eb0] kernel_thread_starter at 698fee

Heinz
Rusty Russell
2014-Jan-23  04:51 UTC
[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device
Heinz Graalfs <graalfs at linux.vnet.ibm.com> writes:
> Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.

Hi Heinz,

I didn't get a response on my 'break all the virtqueues' patch
series. Could your System Z code work with this?

Rusty.
Heinz Graalfs
2014-Jan-28  16:12 UTC
[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device
On 23/01/14 05:51, Rusty Russell wrote:
> Heinz Graalfs <graalfs at linux.vnet.ibm.com> writes:
>> Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.
>
> Hi Heinz,
>
> I didn't get a response on my 'break all the virtqueues' patch
> series. Could your System Z code work with this?
>
> Rusty.

Sorry Rusty, I'm back as of today.

I applied your patch series and did some testing...

Removing a disk while reading from it still mostly ends up in hangs like
the one below:

PID: 13  TASK: 163f8000  CPU: 0  COMMAND: "kworker/u128:1"
 #0 [163f72e0] __schedule at 6aa22c
 #1 [163f7428] io_schedule at 6aab6c
 #2 [163f7448] sleep_on_page at 22cbb2
 #3 [163f7460] __wait_on_bit at 6ab394
 #4 [163f74b0] wait_on_page_bit at 22cef4
 #5 [163f7508] filemap_fdatawait_range at 22d0a6
 #6 [163f75e8] filemap_write_and_wait at 22de62
 #7 [163f7618] fsync_bdev at 2dc5d8
 #8 [163f7640] invalidate_partition at 407ba8
 #9 [163f7668] del_gendisk at 408a4c
#10 [163f76c0] virtblk_remove at 46f81e
#11 [163f76f8] virtio_dev_remove at 43d302
#12 [163f7730] __device_release_driver at 4604c4
#13 [163f7758] device_release_driver at 46057c
#14 [163f7780] bus_remove_device at 45ff74
#15 [163f77b0] device_del at 45cf54
#16 [163f77e8] device_unregister at 45d00e
#17 [163f7808] unregister_virtio_device at 43d5ba
#18 [163f7828] virtio_ccw_remove at 55156c
#19 [163f7850] ccw_device_remove at 4d7e22
#20 [163f78d8] __device_release_driver at 4604c4
#21 [163f7900] device_release_driver at 46057c
#22 [163f7928] bus_remove_device at 45ff74
#23 [163f7958] device_del at 45cf54
#24 [163f7990] ccw_device_unregister at 4d86a0
#25 [163f79b0] io_subchannel_remove at 4d8d1a
#26 [163f79e8] css_remove at 4d2856
#27 [163f7a08] __device_release_driver at 4604c4
#28 [163f7a30] device_release_driver at 46057c
#29 [163f7a58] bus_remove_device at 45ff74
#30 [163f7a88] device_del at 45cf54
#31 [163f7ac0] device_unregister at 45d00e
#32 [163f7ae0] css_sch_device_unregister at 4d29d4
#33 [163f7b08] io_subchannel_sch_event at 4daad6
#34 [163f7b80] css_evaluate_known_subchannel at 4d2cc0
#35 [163f7be0] slow_eval_known_fn at 4d3cea
#36 [163f7c10] bus_for_each_dev at 45ea56
#37 [163f7c50] for_each_subchannel_staged at 4d337e
#38 [163f7c98] css_slow_path_func at 4d3450
#39 [163f7cc0] process_one_work at 164ff4
#40 [163f7d60] worker_thread at 166500
#41 [163f7da8] kthread at 16e67c
#42 [163f7eb0] kernel_thread_starter at 6b0a5e

Removing a disk while writing to it now mostly ends up with errors (which
is new and good behavior). However, the detached device is still listed
under /dev, and a subsequent umount hangs. The latter also sometimes
occurred with my approach.

Occasionally QEMU crashes, but this is not reproducible. I will
investigate this.

Heinz