thr3ads.net - Virtualization - [RFC v2 05/13] vhost: Route guest->host notification through shadow virtqueue [Mar 2021]

If this information is useful, please help other people find it:
Share via:
Jason Wang
2021-Mar-18 09:29 UTC
[RFC v2 05/13] vhost: Route guest->host notification through shadow virtqueue

? 2021/3/18 ??5:18, Eugenio Perez Martin ??:> On Thu, Mar 18, 2021 at 4:11 AM Jason Wang <jasowang at redhat.com>
wrote:
>>
>> ? 2021/3/18 ??12:47, Eugenio Perez Martin ??:
>>> On Wed, Mar 17, 2021 at 3:05 AM Jason Wang <jasowang at
redhat.com> wrote:
>>>> ? 2021/3/16 ??6:31, Eugenio Perez Martin ??:
>>>>> On Tue, Mar 16, 2021 at 8:18 AM Jason Wang <jasowang at
redhat.com> wrote:
>>>>>> ? 2021/3/16 ??3:48, Eugenio P?rez ??:
>>>>>>> Shadow virtqueue notifications forwarding is
disabled when vhost_dev
>>>>>>> stops, so code flow follows usual cleanup.
>>>>>>>
>>>>>>> Signed-off-by: Eugenio P?rez <eperezma at
redhat.com>
>>>>>>> ---
>>>>>>>      hw/virtio/vhost-shadow-virtqueue.h |   7 ++
>>>>>>>      include/hw/virtio/vhost.h          |   4 +
>>>>>>>      hw/virtio/vhost-shadow-virtqueue.c | 113
++++++++++++++++++++++-
>>>>>>>      hw/virtio/vhost.c                  | 143
++++++++++++++++++++++++++++-
>>>>>>>      4 files changed, 265 insertions(+), 2
deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h
b/hw/virtio/vhost-shadow-virtqueue.h
>>>>>>> index 6cc18d6acb..c891c6510d 100644
>>>>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>>>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>>>>>> @@ -17,6 +17,13 @@
>>>>>>>
>>>>>>>      typedef struct VhostShadowVirtqueue
VhostShadowVirtqueue;
>>>>>>>
>>>>>>> +bool vhost_shadow_vq_start(struct vhost_dev *dev,
>>>>>>> +                           unsigned idx,
>>>>>>> +                           VhostShadowVirtqueue
*svq);
>>>>>>> +void vhost_shadow_vq_stop(struct vhost_dev *dev,
>>>>>>> +                          unsigned idx,
>>>>>>> +                          VhostShadowVirtqueue
*svq);
>>>>>>> +
>>>>>>>      VhostShadowVirtqueue
*vhost_shadow_vq_new(struct vhost_dev *dev, int idx);
>>>>>>>
>>>>>>>      void vhost_shadow_vq_free(VhostShadowVirtqueue
*vq);
>>>>>>> diff --git a/include/hw/virtio/vhost.h
b/include/hw/virtio/vhost.h
>>>>>>> index ac963bf23d..7ffdf9aea0 100644
>>>>>>> --- a/include/hw/virtio/vhost.h
>>>>>>> +++ b/include/hw/virtio/vhost.h
>>>>>>> @@ -55,6 +55,8 @@ struct vhost_iommu {
>>>>>>>          QLIST_ENTRY(vhost_iommu) iommu_next;
>>>>>>>      };
>>>>>>>
>>>>>>> +typedef struct VhostShadowVirtqueue
VhostShadowVirtqueue;
>>>>>>> +
>>>>>>>      typedef struct VhostDevConfigOps {
>>>>>>>          /* Vhost device config space changed
callback
>>>>>>>           */
>>>>>>> @@ -83,7 +85,9 @@ struct vhost_dev {
>>>>>>>          uint64_t backend_cap;
>>>>>>>          bool started;
>>>>>>>          bool log_enabled;
>>>>>>> +    bool shadow_vqs_enabled;
>>>>>>>          uint64_t log_size;
>>>>>>> +    VhostShadowVirtqueue **shadow_vqs;
>>>>>> Any reason that you don't embed the shadow
virtqueue into
>>>>>> vhost_virtqueue structure?
>>>>>>
>>>>> Not really, it could be relatively big and I would prefer
SVQ
>>>>> members/methods to remain hidden from any other part that
includes
>>>>> vhost.h. But it could be changed, for sure.
>>>>>
>>>>>> (Note that there's a masked_notifier in struct
vhost_virtqueue).
>>>>>>
>>>>> They are used differently: in SVQ the masked notifier is a
pointer,
>>>>> and if it's NULL the SVQ code knows that device is not
masked. The
>>>>> vhost_virtqueue is the real owner.
>>>> Yes, but it's an example for embedding auxciliary data
structures in the
>>>> vhost_virtqueue.
>>>>
>>>>
>>>>> It could be replaced by a boolean in SVQ or something like
that, I
>>>>> experimented with a tri-state too (UNMASKED, MASKED,
MASKED_NOTIFIED)
>>>>> and let vhost.c code to manage all the transitions. But I
find clearer
>>>>> the pointer use, since it's the more natural for the
>>>>> vhost_virtqueue_mask, vhost_virtqueue_pending existing
functions.
>>>>>
>>>>> This masking/unmasking is the part I dislike the most from
this
>>>>> series, so I'm very open to alternatives.
>>>> See below. I think we don't even need to care about that.
>>>>
>>>>
>>>>>>>          Error *migration_blocker;
>>>>>>>          const VhostOps *vhost_ops;
>>>>>>>          void *opaque;
>>>>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c
b/hw/virtio/vhost-shadow-virtqueue.c
>>>>>>> index 4512e5b058..3e43399e9c 100644
>>>>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>>>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>>>>>> @@ -8,9 +8,12 @@
>>>>>>>       */
>>>>>>>
>>>>>>>      #include
"hw/virtio/vhost-shadow-virtqueue.h"
>>>>>>> +#include "hw/virtio/vhost.h"
>>>>>>> +
>>>>>>> +#include
"standard-headers/linux/vhost_types.h"
>>>>>>>
>>>>>>>      #include "qemu/error-report.h"
>>>>>>> -#include "qemu/event_notifier.h"
>>>>>>> +#include "qemu/main-loop.h"
>>>>>>>
>>>>>>>      /* Shadow virtqueue to relay notifications */
>>>>>>>      typedef struct VhostShadowVirtqueue {
>>>>>>> @@ -18,14 +21,121 @@ typedef struct
VhostShadowVirtqueue {
>>>>>>>          EventNotifier kick_notifier;
>>>>>>>          /* Shadow call notifier, sent to vhost */
>>>>>>>          EventNotifier call_notifier;
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * Borrowed virtqueue's guest to host
notifier.
>>>>>>> +     * To borrow it in this event notifier allows
to register on the event
>>>>>>> +     * loop and access the associated shadow
virtqueue easily. If we use the
>>>>>>> +     * VirtQueue, we don't have an easy way to
retrieve it.
>>>>>> So this is something that worries me. It looks like a
layer violation
>>>>>> that makes the codes harder to work correctly.
>>>>>>
>>>>> I don't follow you here.
>>>>>
>>>>> The vhost code already depends on virtqueue in the same
sense:
>>>>> virtio_queue_get_host_notifier is called on
vhost_virtqueue_start. So
>>>>> if this behavior ever changes it is unlikely for vhost to
keep working
>>>>> without changes. vhost_virtqueue has a kick/call int where
I think it
>>>>> should be stored actually, but they are never used as far
as I see.
>>>>>
>>>>> Previous RFC did rely on vhost_dev_disable_notifiers. From
its documentation:
>>>>> /* Stop processing guest IO notifications in vhost.
>>>>>     * Start processing them in qemu.
>>>>>     ...
>>>>> But it was easier for this mode to miss a notification,
since they
>>>>> create a new host_notifier in virtio_bus_set_host_notifier
right away.
>>>>> So I decided to use the file descriptor already sent to
vhost in
>>>>> regular operation mode, so guest-related resources change
less.
>>>>>
>>>>> Having said that, maybe it's useful to assert that
>>>>> vhost_dev_{enable,disable}_notifiers are never called on
shadow
>>>>> virtqueue mode. Also, it could be useful to retrieve it
from
>>>>> virtio_bus, not raw shadow virtqueue, so all get/set are
performed
>>>>> from it. Would that make more sense?
>>>>>
>>>>>> I wonder if it would be simpler to start from a vDPA
dedicated shadow
>>>>>> virtqueue implementation:
>>>>>>
>>>>>> 1) have the above fields embeded in vhost_vdpa
structure
>>>>>> 2) Work at the level of
>>>>>> vhost_vdpa_set_vring_kick()/vhost_vdpa_set_vring_call()
>>>>>>
>>>>> This notifier is never sent to the device in shadow
virtqueue mode.
>>>>> It's for SVQ to react to guest's notifications,
registering it on its
>>>>> main event loop [1]. So if I perform these changes the way
I
>>>>> understand them, SVQ would still rely on this borrowed
EventNotifier,
>>>>> and it would send to the vDPA device the newly created
kick_notifier
>>>>> of VhostShadowVirtqueue.
>>>> The point is that vhost code should be coupled loosely with
virtio. If
>>>> you try to "borrow" EventNotifier from virtio, you
need to deal with a
>>>> lot of synchrization. An exampleis the masking stuffs.
>>>>
>>> I still don't follow this, sorry.
>>>
>>> The svq->host_notifier event notifier is not affected by the
masking
>>> issue, it is completely private to SVQ. This commit creates and
uses
>>> it, and nothing related to masking is touched until the next
commit.
>>>
>>>>>> Then the layer is still isolated and you have a much
simpler context to
>>>>>> work that you don't need to care a lot of
synchornization:
>>>>>>
>>>>>> 1) vq masking
>>>>> This EventNotifier is not used for masking, it does not
change from
>>>>> the start of the shadow virtqueue operation through its
end. Call fd
>>>>> sent to vhost/vdpa device does not change either in shadow
virtqueue
>>>>> mode operation with masking/unmasking. I will try to
document it
>>>>> better.
>>>>>
>>>>> I think that we will need to handle synchronization with
>>>>> masking/unmasking from the guest and dynamically enabling
SVQ
>>>>> operation mode, since they can happen at the same time as
long as we
>>>>> let the guest run. There may be better ways of
synchronizing them of
>>>>> course, but I don't see how moving to the vhost-vdpa
backend helps
>>>>> with this. Please expand if I've missed it.
>>>>>
>>>>> Or do you mean to forbid regular <-> SVQ operation
mode transitions and delay it
>>>>> to future patchsets?
>>>> So my idea is to do all the shadow virtqueue in the vhost-vDPA
codes and
>>>> hide them from the upper layers like virtio. This means it
works at
>>>> vhost level which can see vhost_vring_file only. When enalbed,
what it
>>>> needs is just:
>>>>
>>>> 1) switch to use svq kickfd and relay ioeventfd to svq kickfd
>>>> 2) switch to use svq callfd and relay svq callfd to irqfd
>>>>
>>>> It will still behave like a vhost-backend that the switching is
done
>>>> internally in vhost-vDPA which is totally transparent to the
virtio
>>>> codes of Qemu.
>>>>
>>>> E.g:
>>>>
>>>> 1) in the case of guest notifier masking, we don't need to
do anything
>>>> since virtio codes will replace another irqfd for us.
>>> Assuming that we don't modify vhost masking code, but send
shadow
>>> virtqueue call descriptor to the vhost device:
>>>
>>> If guest virtio code mask the virtqueue and replaces the vhost-vdpa
>>> device call fd (VhostShadowVirtqueue.call_notifier in the next
commit,
>>> or the descriptor in your previous second point, svq callfd) with
the
>>> masked notifier, vhost_shadow_vq_handle_call will not be called
>>> anymore, and no more used descriptors will be forwarded. They will
be
>>> stuck if the shadow virtqueue forever. Guest itself cannot recover
>>> from this situation, since a masking will set irqfd, not SVQ call
fd.
>>
>> Just to make sure we're in the same page. During vq masking, the
virtio
>> codes actually use the masked_notifier as callfd in
vhost_virtqueue_mask():
>>
>>       if (mask) {
>>           assert(vdev->use_guest_notifier_mask);
>>           file.fd =
event_notifier_get_fd(&hdev->vqs[index].masked_notifier);
>>       } else {
>>       file.fd =
event_notifier_get_fd(virtio_queue_get_guest_notifier(vvq));
>>       }
>>
>>       file.index = hdev->vhost_ops->vhost_get_vq_index(hdev, n);
>>       r = hdev->vhost_ops->vhost_set_vring_call(hdev, &file);
>>
>> So consider the shadow virtqueue in done at vhost-vDPA. We just need to
>> make sure
>>
>> 1) update the callfd which passed by virtio layer via set_vring_kick()
>> 2) always write to the callfd during vhost_shadow_vq_handle_call()
>>
>> Then
>>
>> 3) When shadow vq is enabled, we just set the callfd of shadow
virtqueue
>> to vDPA via VHOST_SET_VRING_CALL, and poll the svq callfd
>> 4) When shadow vq is disabled, we just set the callfd that is passed by
>> virtio via VHOST_SET_VRING_CALL, stop poll the svq callfd
>>
>> So you can see in step 2 and 4, we don't need to know whether or
not the
>> vq is masked since we follow the vhost protocol "VhostOps"
and do
>> everyhing transparently in the vhost-(vDPA) layer.
>>
> All of this assumes that we can enable/disable SVQ dynamically while
> the device is running. If it's not the case, there is no need for the
> mutex neither in vhost.c code nor vdpa_backend.
>
> As I see it, the issue is that step (2) and (4) happens in different
> threads: (2) is in vCPU vmexit, and (4) is in main event loop.
> Consider unmasking and disabling SVQ at the same time with no mutex:
>
> vCPU vmexit thread                     aio thread
> (unmask)                               (stops SVQ)
> |                                      |
> |                                      // Last callfd set was
masked_notifier
> |                                      vdpa_backend.callfd = \
> |                                             
atomic_read(masked_notifier).
> |                                      |
> vhost_set_vring_call(vq.guest_notifier)|
> -> vdpa_backend.callfd = \             |
>             vq.guest_notifier           |
> |                                      |
> |                                      ioctl(vdpa,
> VHOST_SET_VRING_CALL, vdpa_backend.callfd)
> |
> // guest expects more interrupts, but
> // device just set masked
>
> And vhost_set_vring_call could happen entirely even while ioctl is
> being executed.
>
> So that is the reason for the mutex: vdpa_backend.call_fd and the
> ioctl VHOST_SET_VRING_CALL must be serialized. I'm ok with moving to
> vdpa backend, but it's the same code, just in vdpa_backend.c instead
> of vhost.c, so it becomes less generic in my opinion.

You are right. But let's consider if we can avoid the dedicated mutex.

E.g can we use the BQL, bascially we need to synchronizae with iothread.

Or is it possible to schedule bh then things are serailzied automatically?

>
>>>> 2) easily to deal with vhost dev start and stop
>>>>
>>>> The advantages are obvious, simple and easy to implement.
>>>>
>>> I still don't see how performing this step from backend code
avoids
>>> the synchronization problem, since they will be done from different
>>> threads anyway. Not sure what piece I'm missing.
>>
>> See my reply in another thread. If you enable the shadow virtqueue via
a
>> OOB monitor that's a real issue.
>>
>> But I don't think we need to do that since
>>
>> 1) SVQ should be transparnet to management
>> 2) unncessary synchronization issue
>>
>> We can enable the shadow virtqueue through cli, new parameter with
>> vhost-vdpa probably. Then we don't need to care about threads. And
in
>> the final version with full live migration support, the shadow
virtqueue
>> should be enabled automatically. E.g for the device without
>> VHOST_F_LOG_ALL or we can have a dedicated capability of vDPA via
>> VHOST_GET_BACKEND_FEATURES.
>>
> It should be enabled automatically in those condition, but it also
> needs to be dynamic, and only be active during migration. Otherwise,
> guest should use regular vdpa operation. The problem with masking is
> the same if we enable with QMP or because live migration event.
>
> So we will have the previous synchronization problem sooner or later.
> If we omit the rollback to regular vdpa operation (in other words,
> disabling SVQ), code can be simplified, but I'm not sure if that is
> desirable.

Rgiht, so I'm ok to have the synchronziation from the start if you wish.

But we need to figure out what to synchronize and how to do synchronize.

THanks

>
> Thanks!
>
>> Thanks
>>
>>
>>> I can see / tested a few solutions but I don't like them a lot:
>>>
>>> * Forbid hot-swapping from/to shadow virtqueue mode, and set it
from
>>> cmdline: We will have to deal with setting the SVQ mode dynamically
>>> sooner or later if we want to use it for live migration.
>>> * Forbid coming back to regular mode after switching to shadow
>>> virtqueue mode: The heavy part of the synchronization comes from
svq
>>> stopping code, since we need to serialize the setting of device
call
>>> fd. This could be acceptable, but I'm not sure about the
implications:
>>> What happens if live migration fails and we need to step back? A
mutex
>>> is not needed in this scenario, it's ok with atomics and RCU
code.
>>>
>>> * Replace KVM_IRQFD instead and let SVQ poll the old one and masked
>>> notifier: I haven't thought a lot of this one, I think it's
better to
>>> not touch guest notifiers.
>>> * Monitor also masked notifier from SVQ: I think this could be
>>> promising, but SVQ needs to be notified about masking/unmasking
>>> anyway, and there is code that depends on checking the masked
notifier
>>> for the pending notification.
>>>
>>>>>> 2) vhost dev start and stop
>>>>>>
>>>>>> ?
>>>>>>
>>>>>>
>>>>>>> +     *
>>>>>>> +     * So shadow virtqueue must not clean it, or
we would lose VirtQueue one.
>>>>>>> +     */
>>>>>>> +    EventNotifier host_notifier;
>>>>>>> +
>>>>>>> +    /* Virtio queue shadowing */
>>>>>>> +    VirtQueue *vq;
>>>>>>>      } VhostShadowVirtqueue;
>>>>>>>
>>>>>>> +/* Forward guest notifications */
>>>>>>> +static void vhost_handle_guest_kick(EventNotifier
*n)
>>>>>>> +{
>>>>>>> +    VhostShadowVirtqueue *svq = container_of(n,
VhostShadowVirtqueue,
>>>>>>> +                                            
host_notifier);
>>>>>>> +
>>>>>>> +    if
(unlikely(!event_notifier_test_and_clear(n))) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +   
event_notifier_set(&svq->kick_notifier);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Restore the vhost guest to host notifier, i.e.,
disables svq effect.
>>>>>>> + */
>>>>>>> +static int
vhost_shadow_vq_restore_vdev_host_notifier(struct vhost_dev *dev,
>>>>>>> +                                                  
unsigned vhost_index,
>>>>>>> +                                                  
VhostShadowVirtqueue *svq)
>>>>>>> +{
>>>>>>> +    EventNotifier *vq_host_notifier =
virtio_queue_get_host_notifier(svq->vq);
>>>>>>> +    struct vhost_vring_file file = {
>>>>>>> +        .index = vhost_index,
>>>>>>> +        .fd =
event_notifier_get_fd(vq_host_notifier),
>>>>>>> +    };
>>>>>>> +    int r;
>>>>>>> +
>>>>>>> +    /* Restore vhost kick */
>>>>>>> +    r =
dev->vhost_ops->vhost_set_vring_kick(dev, &file);
>>>>>>> +    return r ? -errno : 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Start shadow virtqueue operation.
>>>>>>> + * @dev vhost device
>>>>>>> + * @hidx vhost virtqueue index
>>>>>>> + * @svq Shadow Virtqueue
>>>>>>> + */
>>>>>>> +bool vhost_shadow_vq_start(struct vhost_dev *dev,
>>>>>>> +                           unsigned idx,
>>>>>>> +                           VhostShadowVirtqueue
*svq)
>>>>>> It looks to me this assumes the vhost_dev is started
before
>>>>>> vhost_shadow_vq_start()?
>>>>>>
>>>>> Right.
>>>> This might not true. Guest may enable and disable virtio
drivers after
>>>> the shadow virtqueue is started. You need to deal with that.
>>>>
>>> Right, I will test this scenario.
>>>
>>>> Thanks
>>>>
Virtualization - Mar 2021 - [RFC v2 05/13] vhost: Route guest->host notification through shadow virtqueue

[RFC v2 05/13] vhost: Route guest->host notification through shadow virtqueue