On 17.01.22 09:40, Michael S. Tsirkin wrote:> On Mon, Jan 17, 2022 at 09:31:56AM +0100, David Hildenbrand wrote:
>> On 17.01.22 08:55, Michael S. Tsirkin wrote:
>>> On Mon, Jan 17, 2022 at 02:40:11PM +0800, Jason Wang wrote:
>>>>
>>>> ? 2022/1/15 ??5:43, Michael S. Tsirkin ??:
>>>>> A common pattern for device reset is currently:
>>>>> vdev->config->reset(vdev);
>>>>> .. cleanup ..
>>>>>
>>>>> reset prevents new interrupts from arriving and waits for
interrupt
>>>>> handlers to finish.
>>>>>
>>>>> However if - as is common - the handler queues a work
request which is
>>>>> flushed during the cleanup stage, we have code adding
buffers / trying
>>>>> to get buffers while device is reset. Not good.
>>>>>
>>>>> This was reproduced by running
>>>>> modprobe virtio_console
>>>>> modprobe -r virtio_console
>>>>> in a loop, and this reasoning seems to apply to virtio mem
though
>>>>> I could not reproduce it there.
>>>>>
>>>>> Fix this up by calling virtio_break_device + flush before
reset.
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
>>>>> ---
>>>>> drivers/virtio/virtio_mem.c | 2 ++
>>>>> 1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/drivers/virtio/virtio_mem.c
b/drivers/virtio/virtio_mem.c
>>>>> index 38becd8d578c..33b8a118a3ae 100644
>>>>> --- a/drivers/virtio/virtio_mem.c
>>>>> +++ b/drivers/virtio/virtio_mem.c
>>>>> @@ -2888,6 +2888,8 @@ static void virtio_mem_remove(struct
virtio_device *vdev)
>>>>> virtio_mem_deinit_hotplug(vm);
>>>>> /* reset the device and cleanup the queues */
>>>>> + virtio_break_device(vdev);
>>>>> + flush_work(&vm->wq);
>>>>
>>>>
>>>> We set vm->removing to true and call cancel_work_sync() in
>>>> virtio_mem_deinit_hotplug(). Isn't is sufficient?
>>>>
>>>> Thanks
>>>
>>>
>>> Hmm I think you are right. David, I will drop this for now.
>>> Up to you to consider whether some central capability will be
>>> helpful as a replacement for the virtio-mem specific
"removing" flag.
>>
>> It's all a bit tricky because we also have to handle pending timers
and
>> pending memory onlining/offlining operations in a controlled way. Maybe
>> we could convert to virtio_break_device() and use the
>> &dev->vqs_list_lock as a replacement for the removal_lock.
However, I'm
>> not 100% sure if it's nice to use that lock from
>> drivers/virtio/virtio_mem.c directly.
>
> We could add an API if you like. Or maybe it makes sense to add a
> separate one that lets you find out that removal started. Need to figure
> out how to handle suspend too then ...
> Generally we have these checks that device is not going away
> sprinkled over all drivers and I don't like it, but
> it's not easy to build a sane API to handle it, especially
> for high speed things when we can't take locks because performance.
The interesting case might be how to handle virtio_mem_retry(), whereby
we conditionally queue work if !removing.
Having that said, in an ideal world we could deny removal requests for
virtio_mem completely if there is still any memory added to the system ...
--
Thanks,
David / dhildenb