Michael S. Tsirkin
2013-Nov-27 10:47 UTC
[PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:
> Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
> flag is set due to a disappeared device. It avoids hangs due to incomplete
> requests (e.g. in-flight requests). Such requests must be considered as lost.

Ugh. Can't we complete these immediately using detach_unused_buf? If not, why?

> If the current remove callback was triggered due to an unregister driver,
> and the surprize_removal is not already set (although the actual device
> is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
> resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
> that will never complete. This is a weird situation, and most likely not
> 'serializable'.

Hmm, interesting. Implement some timeout and probe the device to make sure
it's still alive?

> Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com>
> ---
>  drivers/block/virtio_blk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 0f64282..8c05001 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -892,7 +892,8 @@ static void virtblk_remove(struct virtio_device *vdev)
>  	}
>
>  	del_gendisk(vblk->disk);
> -	blk_cleanup_queue(vblk->disk->queue);
> +	if (!vdev->surprize_removal)
> +		blk_cleanup_queue(vblk->disk->queue);
>
>  	/* Stop all the virtqueues. */
>  	vdev->config->reset(vdev);
> --
> 1.8.3.1
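For readers following along, the idea of completing the outstanding requests with an error, rather than skipping blk_cleanup_queue(), can be modelled in user space. This is only a sketch under stated assumptions: every name below (model_queue, model_submit, model_fail_inflight and so on) is hypothetical and none of it is kernel code. In the kernel, the helper Michael refers to is presumably virtqueue_detach_unused_buf(), and the block-layer completion path is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_QUEUE_DEPTH 8	/* hypothetical queue size */

enum model_req_state { REQ_FREE, REQ_INFLIGHT, REQ_DONE };

struct model_req {
	enum model_req_state state;
	int status;		/* 0 on success, negative errno on failure */
};

struct model_queue {
	struct model_req reqs[MODEL_QUEUE_DEPTH];
};

/* Put a request in flight; returns its slot index, or -1 if the queue is full. */
int model_submit(struct model_queue *q)
{
	for (size_t i = 0; i < MODEL_QUEUE_DEPTH; i++) {
		if (q->reqs[i].state == REQ_FREE) {
			q->reqs[i].state = REQ_INFLIGHT;
			q->reqs[i].status = 0;
			return (int)i;
		}
	}
	return -1;
}

/* Count the requests the (vanished) device will never complete. */
size_t model_inflight(const struct model_queue *q)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_QUEUE_DEPTH; i++)
		if (q->reqs[i].state == REQ_INFLIGHT)
			n++;
	return n;
}

/*
 * On surprise removal, complete every in-flight request with -EIO so that
 * a later queue cleanup has nothing left to wait for.
 */
size_t model_fail_inflight(struct model_queue *q)
{
	size_t failed = 0;

	for (size_t i = 0; i < MODEL_QUEUE_DEPTH; i++) {
		if (q->reqs[i].state == REQ_INFLIGHT) {
			q->reqs[i].state = REQ_DONE;
			q->reqs[i].status = -EIO;
			failed++;
		}
	}
	assert(model_inflight(q) == 0);
	return failed;
}
```

With nothing left in flight, the cleanup step in this model no longer depends on the device answering, which is the property the hang report is about.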
Heinz Graalfs
2013-Nov-27 11:37 UTC
[PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
On 27/11/13 11:47, Michael S. Tsirkin wrote:
> On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:
>> Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
>> flag is set due to a disappeared device. It avoids hangs due to incomplete
>> requests (e.g. in-flight requests). Such requests must be considered as lost.
>
> Ugh. Can't we complete these immediately using detach_unused_buf? If not, why?

OK, I will try.

>> If the current remove callback was triggered due to an unregister driver,
>> and the surprize_removal is not already set (although the actual device
>> is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
>> resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
>> that will never complete. This is a weird situation, and most likely not
>> 'serializable'.
>
> Hmm, interesting. Implement some timeout and probe the device to make sure
> it's still alive?

But there is always some race, isn't there?

>> Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com>
>> ---
>>  drivers/block/virtio_blk.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 0f64282..8c05001 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -892,7 +892,8 @@ static void virtblk_remove(struct virtio_device *vdev)
>>  	}
>>
>>  	del_gendisk(vblk->disk);
>> -	blk_cleanup_queue(vblk->disk->queue);
>> +	if (!vdev->surprize_removal)
>> +		blk_cleanup_queue(vblk->disk->queue);
>>
>>  	/* Stop all the virtqueues. */
>>  	vdev->config->reset(vdev);
>> --
>> 1.8.3.1
Michael S. Tsirkin
2013-Nov-27 12:28 UTC
[PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
On Wed, Nov 27, 2013 at 12:37:02PM +0100, Heinz Graalfs wrote:
> On 27/11/13 11:47, Michael S. Tsirkin wrote:
> >On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:
> >>Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
> >>flag is set due to a disappeared device. It avoids hangs due to incomplete
> >>requests (e.g. in-flight requests). Such requests must be considered as lost.
> >
> >Ugh. Can't we complete these immediately using detach_unused_buf? If not, why?
>
> OK, I will try.
>
> >>If the current remove callback was triggered due to an unregister driver,
> >>and the surprize_removal is not already set (although the actual device
> >>is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
> >>resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
> >>that will never complete. This is a weird situation, and most likely not
> >>'serializable'.
> >
> >Hmm, interesting. Implement some timeout and probe the device to make sure
> >it's still alive?
>
> But there is always some race, isn't there?

Then we retry after a second timeout?

> >>Signed-off-by: Heinz Graalfs <graalfs at linux.vnet.ibm.com>
> >>---
> >> drivers/block/virtio_blk.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> >>index 0f64282..8c05001 100644
> >>--- a/drivers/block/virtio_blk.c
> >>+++ b/drivers/block/virtio_blk.c
> >>@@ -892,7 +892,8 @@ static void virtblk_remove(struct virtio_device *vdev)
> >> 	}
> >>
> >> 	del_gendisk(vblk->disk);
> >>-	blk_cleanup_queue(vblk->disk->queue);
> >>+	if (!vdev->surprize_removal)
> >>+		blk_cleanup_queue(vblk->disk->queue);
> >>
> >> 	/* Stop all the virtqueues. */
> >> 	vdev->config->reset(vdev);
> >>--
> >>1.8.3.1
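The timeout-and-probe idea, with a retry when the device still answers, might look roughly like the user-space model below. It is a sketch only: model_dev, model_probe_alive() and model_drain_with_timeout() are hypothetical names, the actual wait between attempts is left out, and how a real virtio transport would probe for a vanished device is not something this thread specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev {
	int inflight;	/* requests submitted but not yet completed */
	bool alive;	/* what the hypothetical probe would report */
	int probes;	/* how many times the device was probed */
};

/* Hypothetical probe: ask the transport whether the device still exists. */
bool model_probe_alive(struct model_dev *dev)
{
	dev->probes++;
	return dev->alive;
}

/*
 * Wait for in-flight requests to drain. After each timeout interval, probe
 * the device. If it is gone, give up on the outstanding requests and report
 * -ENODEV; if it is still alive, retry, up to max_timeouts attempts.
 */
int model_drain_with_timeout(struct model_dev *dev, int max_timeouts)
{
	assert(dev != NULL);

	for (int i = 0; i < max_timeouts; i++) {
		if (dev->inflight == 0)
			return 0;
		/* A real driver would sleep for the timeout interval here. */
		if (!model_probe_alive(dev)) {
			dev->inflight = 0;	/* treat the requests as failed */
			return -ENODEV;
		}
	}
	return dev->inflight == 0 ? 0 : -ETIMEDOUT;
}
```

The race Heinz mentions is still there in this model: the device can disappear right after a probe reports it alive. The retry only bounds how long the loss goes unnoticed.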
Michael S. Tsirkin
2013-Nov-27 12:49 UTC
[PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
On Wed, Nov 27, 2013 at 12:37:02PM +0100, Heinz Graalfs wrote:
> On 27/11/13 11:47, Michael S. Tsirkin wrote:
> >On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:
> >>Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
> >>flag is set due to a disappeared device. It avoids hangs due to incomplete
> >>requests (e.g. in-flight requests). Such requests must be considered as lost.
> >
> >Ugh. Can't we complete these immediately using detach_unused_buf? If not, why?
>
> OK, I will try.
>
> >>If the current remove callback was triggered due to an unregister driver,
> >>and the surprize_removal is not already set (although the actual device
> >>is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
> >>resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
> >>that will never complete. This is a weird situation, and most likely not
> >>'serializable'.
> >
> >Hmm, interesting. Implement some timeout and probe the device to make sure
> >it's still alive?
>
> But there is always some race, isn't there?

To clarify: while this might not be very elegant, a timer-based solution
for surprise removal during driver cleanup might be easier than trying to
build robust interfaces to address this esoteric case.

But what worries me is that it's not clear to me that ccw won't invoke
notify in parallel with the remove callback. If this happens, there will be
a use after free.

--
MST
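The use-after-free concern can be illustrated with a small user-space model in which the notify path and the remove path take the same lock, and remove clears the device pointer before freeing it. This is a sketch under assumptions, not a description of what virtio_ccw does: model_transport, model_notify() and model_remove() are hypothetical names, and a pthread mutex stands in for whatever locking the real transport would need.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_vdev {
	int notifications;	/* how many notifies reached the device state */
};

struct model_transport {
	pthread_mutex_t lock;
	struct model_vdev *vdev;	/* NULL once the device has been removed */
};

/*
 * Notify path: only touch the device state while holding the lock, and only
 * if remove has not already detached it. Returns true if it was delivered.
 */
bool model_notify(struct model_transport *t)
{
	bool handled = false;

	assert(t != NULL);
	pthread_mutex_lock(&t->lock);
	if (t->vdev) {
		t->vdev->notifications++;
		handled = true;
	}
	pthread_mutex_unlock(&t->lock);
	return handled;
}

/*
 * Remove path: detach the device under the lock, then free it. A notify
 * that runs afterwards sees NULL and never dereferences freed memory.
 */
void model_remove(struct model_transport *t)
{
	struct model_vdev *vdev;

	assert(t != NULL);
	pthread_mutex_lock(&t->lock);
	vdev = t->vdev;
	t->vdev = NULL;
	pthread_mutex_unlock(&t->lock);
	free(vdev);
}
```

Without the shared lock (or an equivalent such as a reference count), a notify that has already loaded the pointer could run concurrently with the free, which is the scenario described above.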