thr3ads.net - Virtualization - [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio

If this information is useful, please help other people find it:
Share via:

Heinz Graalfs

2013-Dec-17 14:01 UTC

[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

On 17/12/13 04:42, Rusty Russell wrote:> Heinz Graalfs <graalfs at linux.vnet.ibm.com> writes:
>> Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov 27th.
>>
>> When an active virtio block device is hot-unplugged from a KVM guest,
>> affected guest user applications are not aware of any errors that occur
>> due to the lost device. This patch-set adds code to avoid further
request
>> queueing when a lost block device is detected, resulting in appropriate
>> error info. Additionally a potential hang situation can be avoided by
not
>> waiting for requests (e.g. in-flight requests) in blk_cleanup_queue()
that
>> will never complete.
>>
>> On System z there exists no handshake mechanism between host and guest
>> when a device is hot-unplugged. The device is removed and no further
I/O
>> is possible.
>
>
> Hi Heinz,
>
>          If you simply mark every virtqueue as broken when this
> unexpected unplug happens, does that not Just Work?
>
> I think I've asked this before...
> Rusty.
Hi Rusty,

setting the (one) virtqueue, vblk is currently using, as broken doesn't 
solve the problems.

In that case virtblk_request()s still succeed  - like this one...

([<0000000000112b28>] show_trace+0xf8/0x154)
  [<0000000000112bde>] show_stack+0x5a/0xdc
  [<000000000045eb56>] virtblk_request+0x25a/0x2b8
  [<00000000003e749c>] __blk_run_queue+0x50/0x64
  [<00000000003edb54>] blk_queue_bio+0x358/0x3f0
  [<00000000003eb446>] generic_make_request+0xea/0x130
  [<00000000003eb536>] submit_bio+0xaa/0x1a8
  [<00000000002c95e8>] _submit_bh+0x1c4/0x2f4
  [<00000000003a25e4>] journal_write_superblock+0xa0/0x1fc
  [<00000000003a3ed4>] journal_update_sb_log_tail+0x48/0x7c
  [<000000000039e742>] journal_commit_transaction+0x1586/0x1aa0
  [<00000000003a2a0e>] kjournald+0xfe/0x2a0
  [<00000000001786fc>] kthread+0xd8/0xe0
  [<0000000000698fee>] kernel_thread_starter+0x6/0xc
2 locks held by kjournald/1984:

... and end up in hang situations ...

PID: 13     TASK: 1e3f8000          CPU: 0   COMMAND: "kworker/u128:1"
  #0 [1e2033e0] __schedule at 695ff2
  #1 [1e203530] log_wait_commit at 3a28a6
  #2 [1e2035a0] ext3_sync_fs at 328dea
  #3 [1e2035d8] sync_filesystem at 2c785c
  #4 [1e203600] fsync_bdev at 2d4650
  #5 [1e203628] invalidate_partition at 3f80c8
  #6 [1e203650] del_gendisk at 3f8f5c
  #7 [1e2036c8] virtblk_remove at 45e60e
  #8 [1e203700] virtio_dev_remove at 42d72e
  #9 [1e203738] __device_release_driver at 44f0b0
#10 [1e203760] device_release_driver at 44f16c
#11 [1e203788] bus_remove_device at 44ea92
#12 [1e2037b8] device_del at 44bb40
#13 [1e2037f0] device_unregister at 44bbfa
#14 [1e203810] unregister_virtio_device at 42d9e6
#15 [1e203830] virtio_ccw_remove at 53b834
#16 [1e203850] ccw_device_remove at 4c5bf6
#17 [1e2038d8] __device_release_driver at 44f0b0
#18 [1e203900] device_release_driver at 44f16c
#19 [1e203928] bus_remove_device at 44ea92
#20 [1e203958] device_del at 44bb40
#21 [1e203990] ccw_device_unregister at 4c645c
#22 [1e2039b0] io_subchannel_remove at 4c6b1a
#23 [1e2039e8] css_remove at 4c054e
#24 [1e203a08] __device_release_driver at 44f0b0
#25 [1e203a30] device_release_driver at 44f16c
#26 [1e203a58] bus_remove_device at 44ea92
#27 [1e203a88] device_del at 44bb40
#28 [1e203ac0] device_unregister at 44bbfa
#29 [1e203ae0] css_sch_device_unregister at 4c06cc
#30 [1e203b08] io_subchannel_sch_event at 4c8c3a
#31 [1e203b80] css_evaluate_known_subchannel at 4c09bc
#32 [1e203be0] slow_eval_known_fn at 4c19a6
#33 [1e203c10] bus_for_each_dev at 44d50e
#34 [1e203c50] for_each_subchannel_staged at 4c1066
#35 [1e203c98] css_slow_path_func at 4c1124
#36 [1e203cc0] process_one_work at 16c7f6
#37 [1e203d60] worker_thread at 16dce4
#38 [1e203da8] kthread at 1786fc
#39 [1e203eb0] kernel_thread_starter at 698fee

Heinz
>
>>
>> When an online channel device disappears on System z the kernel's
CIO layer
>> informs the driver (virtio_ccw) about the lost device.
>>
>> Here are some more error details:
>>
>> For a particular block device virtio's request function
virtblk_request()
>> is called by the block layer to queue requests to be handled by the
host.
>> In case of a lost device requests can still be queued, but an
appropriate
>> subsequent host kick usually fails. This leads to situations where no
error
>> feedback is shown.
>>
>> In order to prevent request queueing for lost devices appropriate
settings
>> in the block layer should be made. Exploiting System z's CIO notify
handler
>> callback, and passing on device loss information via the
surprize_removal
>> flag to the remove callback of the backend driver, can solve this task.
>>
>> v3->v4 changes:
>>   - patch 1: solves some vcdev pointer handling issues in the
virtio_ccw driver
>>     (e.g. locked vcdev pointer reset/query; serialize
remove()/set_offline()
>>     callback processing).
>>   - patch 2: introduces 'device_lost' atomic in virtio_device
and use in
>>     backend driver virtio_blk accordingly (original 3 patches merged).
>>   - patch 3: the notify() callback is now serialized with
remove()/set_offline()
>>     callbacks. The notification is ignored if the vcdev pointer has
been cleared
>>     already (by remove() or set_offline()).
>>
>> v2->v3 changes:
>>   - remove virtio_driver's notify callback (and appropriate code)
introduced
>>     in my v1 RFC
>>   - introduce 'surprize_removal' in struct virtio_device
>>   - change virtio_blk's remove callback to perform special actions
when the
>>     surprize_removal flag is set
>>     - avoid final I/O by preventing further request queueing
>>     - avoid hangs in blk_cleanup_queue() due to waits on
'in-flight' requests
>>   - set surprize_removal in virtio_ccw's notify callback when a
device is lost
>>
>> v1->v2 changes:
>>   - add include of linux/notifier.h (I also added it to the 3rd patch)
>>   - get queue lock in order to be able to use safe queue_flag_set()
functions
>>     in virtblk_notify() handler
>>
>>
>> Heinz Graalfs (3):
>>    virtio_ccw: fix vcdev pointer handling issues
>>    virtio: introduce 'device_lost' flag in virtio_device
>>    virtio_ccw: set 'device_lost' on CIO_GONE notification
>>
>>   drivers/block/virtio_blk.c    | 14 ++++++++++-
>>   drivers/s390/kvm/virtio_ccw.c | 58
++++++++++++++++++++++++++++++++++++-------
>>   include/linux/virtio.h        |  2 ++
>>   3 files changed, 64 insertions(+), 10 deletions(-)
>>
>> --
>> 1.8.3.1
>

Rusty Russell

2013-Dec-19 00:19 UTC

head link

[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

Heinz Graalfs <graalfs at linux.vnet.ibm.com>
writes:> On 17/12/13 04:42, Rusty Russell wrote:
>> Heinz Graalfs <graalfs at linux.vnet.ibm.com> writes:
>>> Hi, here is my v4 patch-set update to the v3 RFC submitted on Nov
27th.
>>>
>>> When an active virtio block device is hot-unplugged from a KVM
guest,
>>> affected guest user applications are not aware of any errors that
occur
>>> due to the lost device. This patch-set adds code to avoid further
request
>>> queueing when a lost block device is detected, resulting in
appropriate
>>> error info. Additionally a potential hang situation can be avoided
by not
>>> waiting for requests (e.g. in-flight requests) in
blk_cleanup_queue() that
>>> will never complete.
>>>
>>> On System z there exists no handshake mechanism between host and
guest
>>> when a device is hot-unplugged. The device is removed and no
further I/O
>>> is possible.
>>
>>
>> Hi Heinz,
>>
>>          If you simply mark every virtqueue as broken when this
>> unexpected unplug happens, does that not Just Work?
>>
>> I think I've asked this before...
>> Rusty.
>
> Hi Rusty,
>
> setting the (one) virtqueue, vblk is currently using, as broken doesn't
> solve the problems.
>
> In that case virtblk_request()s still succeed  - like this one...
No, you set *all* virtqueues broken.  Which is accurate, right?

Cheers,
Rusty.
>
> ([<0000000000112b28>] show_trace+0xf8/0x154)
>   [<0000000000112bde>] show_stack+0x5a/0xdc
>   [<000000000045eb56>] virtblk_request+0x25a/0x2b8
>   [<00000000003e749c>] __blk_run_queue+0x50/0x64
>   [<00000000003edb54>] blk_queue_bio+0x358/0x3f0
>   [<00000000003eb446>] generic_make_request+0xea/0x130
>   [<00000000003eb536>] submit_bio+0xaa/0x1a8
>   [<00000000002c95e8>] _submit_bh+0x1c4/0x2f4
>   [<00000000003a25e4>] journal_write_superblock+0xa0/0x1fc
>   [<00000000003a3ed4>] journal_update_sb_log_tail+0x48/0x7c
>   [<000000000039e742>] journal_commit_transaction+0x1586/0x1aa0
>   [<00000000003a2a0e>] kjournald+0xfe/0x2a0
>   [<00000000001786fc>] kthread+0xd8/0xe0
>   [<0000000000698fee>] kernel_thread_starter+0x6/0xc
> 2 locks held by kjournald/1984:
>
> ... and end up in hang situations ...
>
> PID: 13     TASK: 1e3f8000          CPU: 0   COMMAND:
"kworker/u128:1"
>   #0 [1e2033e0] __schedule at 695ff2
>   #1 [1e203530] log_wait_commit at 3a28a6
>   #2 [1e2035a0] ext3_sync_fs at 328dea
>   #3 [1e2035d8] sync_filesystem at 2c785c
>   #4 [1e203600] fsync_bdev at 2d4650
>   #5 [1e203628] invalidate_partition at 3f80c8
>   #6 [1e203650] del_gendisk at 3f8f5c
>   #7 [1e2036c8] virtblk_remove at 45e60e
>   #8 [1e203700] virtio_dev_remove at 42d72e
>   #9 [1e203738] __device_release_driver at 44f0b0
> #10 [1e203760] device_release_driver at 44f16c
> #11 [1e203788] bus_remove_device at 44ea92
> #12 [1e2037b8] device_del at 44bb40
> #13 [1e2037f0] device_unregister at 44bbfa
> #14 [1e203810] unregister_virtio_device at 42d9e6
> #15 [1e203830] virtio_ccw_remove at 53b834
> #16 [1e203850] ccw_device_remove at 4c5bf6
> #17 [1e2038d8] __device_release_driver at 44f0b0
> #18 [1e203900] device_release_driver at 44f16c
> #19 [1e203928] bus_remove_device at 44ea92
> #20 [1e203958] device_del at 44bb40
> #21 [1e203990] ccw_device_unregister at 4c645c
> #22 [1e2039b0] io_subchannel_remove at 4c6b1a
> #23 [1e2039e8] css_remove at 4c054e
> #24 [1e203a08] __device_release_driver at 44f0b0
> #25 [1e203a30] device_release_driver at 44f16c
> #26 [1e203a58] bus_remove_device at 44ea92
> #27 [1e203a88] device_del at 44bb40
> #28 [1e203ac0] device_unregister at 44bbfa
> #29 [1e203ae0] css_sch_device_unregister at 4c06cc
> #30 [1e203b08] io_subchannel_sch_event at 4c8c3a
> #31 [1e203b80] css_evaluate_known_subchannel at 4c09bc
> #32 [1e203be0] slow_eval_known_fn at 4c19a6
> #33 [1e203c10] bus_for_each_dev at 44d50e
> #34 [1e203c50] for_each_subchannel_staged at 4c1066
> #35 [1e203c98] css_slow_path_func at 4c1124
> #36 [1e203cc0] process_one_work at 16c7f6
> #37 [1e203d60] worker_thread at 16dce4
> #38 [1e203da8] kthread at 1786fc
> #39 [1e203eb0] kernel_thread_starter at 698fee
>
> Heinz
>
>>
>>>
>>> When an online channel device disappears on System z the
kernel's CIO layer
>>> informs the driver (virtio_ccw) about the lost device.
>>>
>>> Here are some more error details:
>>>
>>> For a particular block device virtio's request function
virtblk_request()
>>> is called by the block layer to queue requests to be handled by the
host.
>>> In case of a lost device requests can still be queued, but an
appropriate
>>> subsequent host kick usually fails. This leads to situations where
no error
>>> feedback is shown.
>>>
>>> In order to prevent request queueing for lost devices appropriate
settings
>>> in the block layer should be made. Exploiting System z's CIO
notify handler
>>> callback, and passing on device loss information via the
surprize_removal
>>> flag to the remove callback of the backend driver, can solve this
task.
>>>
>>> v3->v4 changes:
>>>   - patch 1: solves some vcdev pointer handling issues in the
virtio_ccw driver
>>>     (e.g. locked vcdev pointer reset/query; serialize
remove()/set_offline()
>>>     callback processing).
>>>   - patch 2: introduces 'device_lost' atomic in
virtio_device and use in
>>>     backend driver virtio_blk accordingly (original 3 patches
merged).
>>>   - patch 3: the notify() callback is now serialized with
remove()/set_offline()
>>>     callbacks. The notification is ignored if the vcdev pointer has
been cleared
>>>     already (by remove() or set_offline()).
>>>
>>> v2->v3 changes:
>>>   - remove virtio_driver's notify callback (and appropriate
code) introduced
>>>     in my v1 RFC
>>>   - introduce 'surprize_removal' in struct virtio_device
>>>   - change virtio_blk's remove callback to perform special
actions when the
>>>     surprize_removal flag is set
>>>     - avoid final I/O by preventing further request queueing
>>>     - avoid hangs in blk_cleanup_queue() due to waits on
'in-flight' requests
>>>   - set surprize_removal in virtio_ccw's notify callback when a
device is lost
>>>
>>> v1->v2 changes:
>>>   - add include of linux/notifier.h (I also added it to the 3rd
patch)
>>>   - get queue lock in order to be able to use safe queue_flag_set()
functions
>>>     in virtblk_notify() handler
>>>
>>>
>>> Heinz Graalfs (3):
>>>    virtio_ccw: fix vcdev pointer handling issues
>>>    virtio: introduce 'device_lost' flag in virtio_device
>>>    virtio_ccw: set 'device_lost' on CIO_GONE notification
>>>
>>>   drivers/block/virtio_blk.c    | 14 ++++++++++-
>>>   drivers/s390/kvm/virtio_ccw.c | 58
++++++++++++++++++++++++++++++++++++-------
>>>   include/linux/virtio.h        |  2 ++
>>>   3 files changed, 64 insertions(+), 10 deletions(-)
>>>
>>> --
>>> 1.8.3.1
>>

Heinz Graalfs

2013-Dec-23 08:39 UTC

head link

[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

On 19/12/13 01:19, Rusty Russell wrote:> Heinz Graalfs <graalfs at linux.vnet.ibm.com> writes:
>> On 17/12/13 04:42, Rusty Russell wrote:
>>> Heinz Graalfs <graalfs at linux.vnet.ibm.com> writes:
>>>> Hi, here is my v4 patch-set update to the v3 RFC submitted on
Nov 27th.
>>>>
>>>> When an active virtio block device is hot-unplugged from a KVM
guest,
>>>> affected guest user applications are not aware of any errors
that occur
>>>> due to the lost device. This patch-set adds code to avoid
further request
>>>> queueing when a lost block device is detected, resulting in
appropriate
>>>> error info. Additionally a potential hang situation can be
avoided by not
>>>> waiting for requests (e.g. in-flight requests) in
blk_cleanup_queue() that
>>>> will never complete.
>>>>
>>>> On System z there exists no handshake mechanism between host
and guest
>>>> when a device is hot-unplugged. The device is removed and no
further I/O
>>>> is possible.
>>>
>>>
>>> Hi Heinz,
>>>
>>>           If you simply mark every virtqueue as broken when this
>>> unexpected unplug happens, does that not Just Work?
>>>
>>> I think I've asked this before...
>>> Rusty.
>>
>> Hi Rusty,
>>
>> setting the (one) virtqueue, vblk is currently using, as broken
doesn't
>> solve the problems.
>>
>> In that case virtblk_request()s still succeed  - like this one...
>
> No, you set *all* virtqueues broken.  Which is accurate, right?
>
> Cheers,
> Rusty.
>
I'm sorry, but I don't get this.
The vblk involved has only 1 virtqueue.
What do you mean by all "*all* virtqueues ?

Heinz
>>
>> ([<0000000000112b28>] show_trace+0xf8/0x154)
>>    [<0000000000112bde>] show_stack+0x5a/0xdc
>>    [<000000000045eb56>] virtblk_request+0x25a/0x2b8
>>    [<00000000003e749c>] __blk_run_queue+0x50/0x64
>>    [<00000000003edb54>] blk_queue_bio+0x358/0x3f0
>>    [<00000000003eb446>] generic_make_request+0xea/0x130
>>    [<00000000003eb536>] submit_bio+0xaa/0x1a8
>>    [<00000000002c95e8>] _submit_bh+0x1c4/0x2f4
>>    [<00000000003a25e4>] journal_write_superblock+0xa0/0x1fc
>>    [<00000000003a3ed4>] journal_update_sb_log_tail+0x48/0x7c
>>    [<000000000039e742>] journal_commit_transaction+0x1586/0x1aa0
>>    [<00000000003a2a0e>] kjournald+0xfe/0x2a0
>>    [<00000000001786fc>] kthread+0xd8/0xe0
>>    [<0000000000698fee>] kernel_thread_starter+0x6/0xc
>> 2 locks held by kjournald/1984:
>>
>> ... and end up in hang situations ...
>>
>> PID: 13     TASK: 1e3f8000          CPU: 0   COMMAND:
"kworker/u128:1"
>>    #0 [1e2033e0] __schedule at 695ff2
>>    #1 [1e203530] log_wait_commit at 3a28a6
>>    #2 [1e2035a0] ext3_sync_fs at 328dea
>>    #3 [1e2035d8] sync_filesystem at 2c785c
>>    #4 [1e203600] fsync_bdev at 2d4650
>>    #5 [1e203628] invalidate_partition at 3f80c8
>>    #6 [1e203650] del_gendisk at 3f8f5c
>>    #7 [1e2036c8] virtblk_remove at 45e60e
>>    #8 [1e203700] virtio_dev_remove at 42d72e
>>    #9 [1e203738] __device_release_driver at 44f0b0
>> #10 [1e203760] device_release_driver at 44f16c
>> #11 [1e203788] bus_remove_device at 44ea92
>> #12 [1e2037b8] device_del at 44bb40
>> #13 [1e2037f0] device_unregister at 44bbfa
>> #14 [1e203810] unregister_virtio_device at 42d9e6
>> #15 [1e203830] virtio_ccw_remove at 53b834
>> #16 [1e203850] ccw_device_remove at 4c5bf6
>> #17 [1e2038d8] __device_release_driver at 44f0b0
>> #18 [1e203900] device_release_driver at 44f16c
>> #19 [1e203928] bus_remove_device at 44ea92
>> #20 [1e203958] device_del at 44bb40
>> #21 [1e203990] ccw_device_unregister at 4c645c
>> #22 [1e2039b0] io_subchannel_remove at 4c6b1a
>> #23 [1e2039e8] css_remove at 4c054e
>> #24 [1e203a08] __device_release_driver at 44f0b0
>> #25 [1e203a30] device_release_driver at 44f16c
>> #26 [1e203a58] bus_remove_device at 44ea92
>> #27 [1e203a88] device_del at 44bb40
>> #28 [1e203ac0] device_unregister at 44bbfa
>> #29 [1e203ae0] css_sch_device_unregister at 4c06cc
>> #30 [1e203b08] io_subchannel_sch_event at 4c8c3a
>> #31 [1e203b80] css_evaluate_known_subchannel at 4c09bc
>> #32 [1e203be0] slow_eval_known_fn at 4c19a6
>> #33 [1e203c10] bus_for_each_dev at 44d50e
>> #34 [1e203c50] for_each_subchannel_staged at 4c1066
>> #35 [1e203c98] css_slow_path_func at 4c1124
>> #36 [1e203cc0] process_one_work at 16c7f6
>> #37 [1e203d60] worker_thread at 16dce4
>> #38 [1e203da8] kthread at 1786fc
>> #39 [1e203eb0] kernel_thread_starter at 698fee
>>
>> Heinz
>>
>>>
>>>>
>>>> When an online channel device disappears on System z the
kernel's CIO layer
>>>> informs the driver (virtio_ccw) about the lost device.
>>>>
>>>> Here are some more error details:
>>>>
>>>> For a particular block device virtio's request function
virtblk_request()
>>>> is called by the block layer to queue requests to be handled by
the host.
>>>> In case of a lost device requests can still be queued, but an
appropriate
>>>> subsequent host kick usually fails. This leads to situations
where no error
>>>> feedback is shown.
>>>>
>>>> In order to prevent request queueing for lost devices
appropriate settings
>>>> in the block layer should be made. Exploiting System z's
CIO notify handler
>>>> callback, and passing on device loss information via the
surprize_removal
>>>> flag to the remove callback of the backend driver, can solve
this task.
>>>>
>>>> v3->v4 changes:
>>>>    - patch 1: solves some vcdev pointer handling issues in the
virtio_ccw driver
>>>>      (e.g. locked vcdev pointer reset/query; serialize
remove()/set_offline()
>>>>      callback processing).
>>>>    - patch 2: introduces 'device_lost' atomic in
virtio_device and use in
>>>>      backend driver virtio_blk accordingly (original 3 patches
merged).
>>>>    - patch 3: the notify() callback is now serialized with
remove()/set_offline()
>>>>      callbacks. The notification is ignored if the vcdev
pointer has been cleared
>>>>      already (by remove() or set_offline()).
>>>>
>>>> v2->v3 changes:
>>>>    - remove virtio_driver's notify callback (and
appropriate code) introduced
>>>>      in my v1 RFC
>>>>    - introduce 'surprize_removal' in struct
virtio_device
>>>>    - change virtio_blk's remove callback to perform special
actions when the
>>>>      surprize_removal flag is set
>>>>      - avoid final I/O by preventing further request queueing
>>>>      - avoid hangs in blk_cleanup_queue() due to waits on
'in-flight' requests
>>>>    - set surprize_removal in virtio_ccw's notify callback
when a device is lost
>>>>
>>>> v1->v2 changes:
>>>>    - add include of linux/notifier.h (I also added it to the
3rd patch)
>>>>    - get queue lock in order to be able to use safe
queue_flag_set() functions
>>>>      in virtblk_notify() handler
>>>>
>>>>
>>>> Heinz Graalfs (3):
>>>>     virtio_ccw: fix vcdev pointer handling issues
>>>>     virtio: introduce 'device_lost' flag in
virtio_device
>>>>     virtio_ccw: set 'device_lost' on CIO_GONE
notification
>>>>
>>>>    drivers/block/virtio_blk.c    | 14 ++++++++++-
>>>>    drivers/s390/kvm/virtio_ccw.c | 58
++++++++++++++++++++++++++++++++++++-------
>>>>    include/linux/virtio.h        |  2 ++
>>>>    3 files changed, 64 insertions(+), 10 deletions(-)
>>>>
>>>> --
>>>> 1.8.3.1
>>>
>

Seemingly Similar Threads

Search for more maybe matching threads

Virtualization - Dec 2013 - [PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

[PATCH v4 RFC 0/3] virtio: add 'device_lost' to virtio_device

Seemingly Similar Threads