Heinz Graalfs
2013-Nov-21 14:43 UTC
[PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver
On 21/11/13 07:47, Michael S. Tsirkin wrote:> On Wed, Nov 20, 2013 at 04:22:00PM +0100, Heinz Graalfs wrote: >> Hi, >> >> when an active virtio block device is hot-unplugged from a KVM guest, running >> affected guest user applications are not aware of any errors that occur due >> to the lost device. This patch-set adds code to avoid further request queueing >> when a lost block device is detected, resulting in appropriate error info. >> >> On System z there exists no handshake mechanism between host and guest >> when a device is hot-unplugged. The device is removed and no further I/O >> is possible. >> >> When an online channel device disappears on System z the kernel's CIO layer >> informs the driver (virtio_ccw) about the lost device. >> >> It's just the block device drivers that care to provide a notify >> callback. >> >> Here are some more error details: >> >> For a particular block device virtio's request function virtblk_request() >> is called by the block layer to queue requests to be handled by the host. >> In case of a lost device requests can still be queued, but an appropriate >> subsequent host kick usually fails. This leads to situations where no error >> feedback is shown. >> >> In order to prevent request queueing for lost devices appropriate settings >> in the block layer should be made. Exploiting System z's CIO notify handler >> callback, and adding a corresponding new virtio_driver notify() handler to >> 'inform' the block layer, solve this task. >> >> Patch 1 adds an optional notify() callback to virtio_driver. >> >> Patch 2 adds a new notify() callback for the virtio_blk driver. When called >> for a lost device settings are made to prevent future request queueing. >> >> Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to pass >> on the lost device info to virtio's backend driver virtio_blk. > > Question: I guess remove callback is invoked eventually? > Could you please clarify why isn't this sufficient? >yes, the remove callback is invoked lateron, and it could be done there. However, it should be done conditionally, and prior to invoking del_gendisk() (which triggers final I/O). We would still have the need for such notification information. The remove callback is also invoked when a device is set offline, and in that case we don't want a queue to reject further requests. The way it is done right doesn't affect the remove callback. The weird situation, however, is solved by the new notify callback. Doing it in blk_cleanup_queue() (also triggered from virtblk_remove()) is too late for this scenario of a lost device. One wouldn't see any errors, but experience a 'hang' due to inclomplete I/O. The invocation of virtblk_request() indirectly caused by del_gendisk() would accept requests, the subsequent host notification, however, would fail. (This is probably another 'window' that should be closed.)> > >> Heinz Graalfs (3): >> virtio: add notify() callback to virtio_driver >> virtio_blk: add virtblk_notify() as virtio_driver's notify() callback >> virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification >> >> drivers/block/virtio_blk.c | 14 ++++++++++++++ >> drivers/s390/kvm/virtio_ccw.c | 14 ++++++++++++-- >> drivers/virtio/virtio.c | 8 ++++++++ >> include/linux/virtio.h | 10 ++++++++++ >> 4 files changed, 44 insertions(+), 2 deletions(-) >> >> -- >> 1.8.3.1 >
Michael S. Tsirkin
2013-Nov-21 14:58 UTC
[PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver
On Thu, Nov 21, 2013 at 03:43:51PM +0100, Heinz Graalfs wrote:> On 21/11/13 07:47, Michael S. Tsirkin wrote: > >On Wed, Nov 20, 2013 at 04:22:00PM +0100, Heinz Graalfs wrote: > >>Hi, > >> > >>when an active virtio block device is hot-unplugged from a KVM guest, running > >>affected guest user applications are not aware of any errors that occur due > >>to the lost device. This patch-set adds code to avoid further request queueing > >>when a lost block device is detected, resulting in appropriate error info. > >> > >>On System z there exists no handshake mechanism between host and guest > >>when a device is hot-unplugged. The device is removed and no further I/O > >>is possible. > >> > >>When an online channel device disappears on System z the kernel's CIO layer > >>informs the driver (virtio_ccw) about the lost device. > >> > >>It's just the block device drivers that care to provide a notify > >>callback. > >> > >>Here are some more error details: > >> > >>For a particular block device virtio's request function virtblk_request() > >>is called by the block layer to queue requests to be handled by the host. > >>In case of a lost device requests can still be queued, but an appropriate > >>subsequent host kick usually fails. This leads to situations where no error > >>feedback is shown. > >> > >>In order to prevent request queueing for lost devices appropriate settings > >>in the block layer should be made. Exploiting System z's CIO notify handler > >>callback, and adding a corresponding new virtio_driver notify() handler to > >>'inform' the block layer, solve this task. > >> > >>Patch 1 adds an optional notify() callback to virtio_driver. > >> > >>Patch 2 adds a new notify() callback for the virtio_blk driver. When called > >>for a lost device settings are made to prevent future request queueing. > >> > >>Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to pass > >>on the lost device info to virtio's backend driver virtio_blk. > > > >Question: I guess remove callback is invoked eventually? > >Could you please clarify why isn't this sufficient? > > > > yes, the remove callback is invoked lateron, and it could be done > there. However, it should be done conditionally, and prior to > invoking del_gendisk() (which triggers final I/O). We would still > have the need for such notification information. The remove callback > is also invoked when a device is set offline, and in that case we > don't want a queue to reject further requests. The way it is done > right doesn't affect the remove callback. The weird situation, > however, is solved by the new notify callback. > Doing it in blk_cleanup_queue() (also triggered from > virtblk_remove()) is too late for this scenario of a lost device. > One wouldn't see any errors, but experience a 'hang' due to > inclomplete I/O. The invocation of virtblk_request() indirectly > caused by del_gendisk() would accept requests, the subsequent host > notification, however, would fail. (This is probably another > 'window' that should be closed.)I see. All this makes sense. So it's really important that the event is sent *before* device is removed. Maybe it's a good idea to rename event GONE->GOING_AWAY ?> > > > > > >>Heinz Graalfs (3): > >> virtio: add notify() callback to virtio_driver > >> virtio_blk: add virtblk_notify() as virtio_driver's notify() callback > >> virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification > >> > >> drivers/block/virtio_blk.c | 14 ++++++++++++++ > >> drivers/s390/kvm/virtio_ccw.c | 14 ++++++++++++-- > >> drivers/virtio/virtio.c | 8 ++++++++ > >> include/linux/virtio.h | 10 ++++++++++ > >> 4 files changed, 44 insertions(+), 2 deletions(-) > >> > >>-- > >>1.8.3.1 > >
Heinz Graalfs
2013-Nov-21 17:18 UTC
[PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver
On 21/11/13 15:58, Michael S. Tsirkin wrote:> On Thu, Nov 21, 2013 at 03:43:51PM +0100, Heinz Graalfs wrote: >> On 21/11/13 07:47, Michael S. Tsirkin wrote: >>> On Wed, Nov 20, 2013 at 04:22:00PM +0100, Heinz Graalfs wrote: >>>> Hi, >>>> >>>> when an active virtio block device is hot-unplugged from a KVM guest, running >>>> affected guest user applications are not aware of any errors that occur due >>>> to the lost device. This patch-set adds code to avoid further request queueing >>>> when a lost block device is detected, resulting in appropriate error info. >>>> >>>> On System z there exists no handshake mechanism between host and guest >>>> when a device is hot-unplugged. The device is removed and no further I/O >>>> is possible. >>>> >>>> When an online channel device disappears on System z the kernel's CIO layer >>>> informs the driver (virtio_ccw) about the lost device. >>>> >>>> It's just the block device drivers that care to provide a notify >>>> callback. >>>> >>>> Here are some more error details: >>>> >>>> For a particular block device virtio's request function virtblk_request() >>>> is called by the block layer to queue requests to be handled by the host. >>>> In case of a lost device requests can still be queued, but an appropriate >>>> subsequent host kick usually fails. This leads to situations where no error >>>> feedback is shown. >>>> >>>> In order to prevent request queueing for lost devices appropriate settings >>>> in the block layer should be made. Exploiting System z's CIO notify handler >>>> callback, and adding a corresponding new virtio_driver notify() handler to >>>> 'inform' the block layer, solve this task. >>>> >>>> Patch 1 adds an optional notify() callback to virtio_driver. >>>> >>>> Patch 2 adds a new notify() callback for the virtio_blk driver. When called >>>> for a lost device settings are made to prevent future request queueing. >>>> >>>> Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to pass >>>> on the lost device info to virtio's backend driver virtio_blk. >>> >>> Question: I guess remove callback is invoked eventually? >>> Could you please clarify why isn't this sufficient? >>> >> >> yes, the remove callback is invoked lateron, and it could be done >> there. However, it should be done conditionally, and prior to >> invoking del_gendisk() (which triggers final I/O). We would still >> have the need for such notification information. The remove callback >> is also invoked when a device is set offline, and in that case we >> don't want a queue to reject further requests. The way it is done >> right doesn't affect the remove callback. The weird situation, >> however, is solved by the new notify callback. >> Doing it in blk_cleanup_queue() (also triggered from >> virtblk_remove()) is too late for this scenario of a lost device. >> One wouldn't see any errors, but experience a 'hang' due to >> inclomplete I/O. The invocation of virtblk_request() indirectly >> caused by del_gendisk() would accept requests, the subsequent host >> notification, however, would fail. (This is probably another >> 'window' that should be closed.) > > I see. All this makes sense. > > So it's really important that the event is sent > *before* device is removed.well, if this event comes in all device related I/O will fail, so we better don't trigger any further I/O.> > Maybe it's a good idea to rename event GONE->GOING_AWAY ?if this event comes in the device is GONE, it's not like 'going away'> >> >>> >>> >>>> Heinz Graalfs (3): >>>> virtio: add notify() callback to virtio_driver >>>> virtio_blk: add virtblk_notify() as virtio_driver's notify() callback >>>> virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification >>>> >>>> drivers/block/virtio_blk.c | 14 ++++++++++++++ >>>> drivers/s390/kvm/virtio_ccw.c | 14 ++++++++++++-- >>>> drivers/virtio/virtio.c | 8 ++++++++ >>>> include/linux/virtio.h | 10 ++++++++++ >>>> 4 files changed, 44 insertions(+), 2 deletions(-) >>>> >>>> -- >>>> 1.8.3.1 >>> >
Possibly Parallel Threads
- [PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver
- [PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver
- [PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver
- [PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver
- [PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver