On Tue, Feb 22, 2022 at 3:43 PM Eugenio Perez Martin
<eperezma at redhat.com> wrote:
> On Tue, Feb 22, 2022 at 4:16 AM Jason Wang <jasowang at redhat.com> wrote:
> >
> > On Tue, Feb 22, 2022 at 1:23 AM Eugenio Perez Martin
> > <eperezma at redhat.com> wrote:
> > >
> > > On Mon, Feb 21, 2022 at 8:15 AM Jason Wang <jasowang at redhat.com> wrote:
> > > >
> > > >
> > > > On 2022/2/18 1:13, Eugenio Perez Martin wrote:
> > > > > On Tue, Feb 8, 2022 at 4:58 AM Jason Wang <jasowang at redhat.com> wrote:
> > > > >>
> > > > >> On 2022/2/1 2:58, Eugenio Perez Martin wrote:
> > > > >>> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang at redhat.com> wrote:
> > > > >>>> On 2022/1/22 4:27, Eugenio Pérez wrote:
> > > > >>>>> First half of the buffers forwarding part, preparing vhost-vdpa
> > > > >>>>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
> > > > >>>>> this is effectively dead code at the moment, but it helps to reduce
> > > > >>>>> patch size.
> > > > >>>>>
> > > > >>>>> Signed-off-by: Eugenio Pérez <eperezma at redhat.com>
> > > > >>>>> ---
> > > > >>>>>   hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> > > > >>>>>   hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
> > > > >>>>>   hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
> > > > >>>>>   3 files changed, 143 insertions(+), 13 deletions(-)
> > > > >>>>>
> > > > >>>>> diff --git
a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > >>>>> index 035207a469..39aef5ffdf 100644
> > > > >>>>> ---
a/hw/virtio/vhost-shadow-virtqueue.h
> > > > >>>>> +++
b/hw/virtio/vhost-shadow-virtqueue.h
> > > > >>>>> @@ -35,7 +35,7 @@ size_t
vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> > > > >>>>>
> > > > >>>>> void
vhost_svq_stop(VhostShadowVirtqueue *svq);
> > > > >>>>>
> > > > >>>>> -VhostShadowVirtqueue
*vhost_svq_new(void);
> > > > >>>>> +VhostShadowVirtqueue
*vhost_svq_new(uint16_t qsize);
> > > > >>>>>
> > > > >>>>> void
vhost_svq_free(VhostShadowVirtqueue *vq);
> > > > >>>>>
> > > > >>>>> diff --git
a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > >>>>> index f129ec8395..7c168075d7 100644
> > > > >>>>> ---
a/hw/virtio/vhost-shadow-virtqueue.c
> > > > >>>>> +++
b/hw/virtio/vhost-shadow-virtqueue.c
> > > > >>>>> @@ -277,9 +277,17 @@ void
vhost_svq_stop(VhostShadowVirtqueue *svq)
> > > > >>>>> /**
> > > > >>>>> * Creates vhost shadow virtqueue,
and instruct vhost device to use the shadow
> > > > >>>>> * methods and file descriptors.
> > > > >>>>> + *
> > > > >>>>> + * @qsize Shadow VirtQueue size
> > > > >>>>> + *
> > > > >>>>> + * Returns the new virtqueue or NULL.
> > > > >>>>> + *
> > > > >>>>> + * In case of error, reason is
reported through error_report.
> > > > >>>>> */
> > > > >>>>> -VhostShadowVirtqueue
*vhost_svq_new(void)
> > > > >>>>> +VhostShadowVirtqueue
*vhost_svq_new(uint16_t qsize)
> > > > >>>>> {
> > > > >>>>> + size_t desc_size =
sizeof(vring_desc_t) * qsize;
> > > > >>>>> + size_t device_size, driver_size;
> > > > >>>>> g_autofree VhostShadowVirtqueue
*svq = g_new0(VhostShadowVirtqueue, 1);
> > > > >>>>> int r;
> > > > >>>>>
> > > > >>>>> @@ -300,6 +308,15 @@
VhostShadowVirtqueue *vhost_svq_new(void)
> > > > >>>>> /* Placeholder descriptor, it
should be deleted at set_kick_fd */
> > > > >>>>>
event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> > > > >>>>>
> > > > >>>>> + svq->vring.num = qsize;
> > > > >>>> I wonder if this is the best. E.g some hardware can support up to 32K
> > > > >>>> queue size. So this will probably end up with:
> > > > >>>>
> > > > >>>> 1) SVQ use 32K queue size
> > > > >>>> 2) hardware queue uses 256
> > > > >>>>
> > > > >>> In that case SVQ vring queue size will be 32K and guest's vring can
> > > > >>> negotiate any number with SVQ equal or less than 32K,
> > > > >>
> > > > >> Sorry for being unclear; what I meant is actually:
> > > > >>
> > > > >> 1) SVQ uses 32K queue size
> > > > >>
> > > > >> 2) guest vq uses 256
> > > > >>
> > > > >> This looks like a burden that needs extra logic and may damage the
> > > > >> performance.
> > > > >>
> > > > > Still not getting this point.
> > > > >
> > > > > An available guest buffer, although contiguous in GPA/GVA, can expand
> > > > > into multiple buffers if it's not contiguous in qemu's VA (by the while
> > > > > loop in virtqueue_map_desc [1]). In that scenario it is better to have
> > > > > "plenty" of SVQ buffers.
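To illustrate that point with a standalone sketch (all types and names below are invented for the example, this is not qemu code): a buffer that is one contiguous GPA range still costs one SVQ descriptor per host-VA chunk it crosses, in the spirit of the while loop in virtqueue_map_desc:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping record: a run of guest-physical addresses that is
 * contiguous in qemu's virtual address space. */
typedef struct {
    uint64_t gpa;
    size_t len;
} HostChunk;

/* Count how many descriptors one guest buffer needs given the host-side
 * chunking: every change of host mapping costs one more descriptor. */
static int svq_descs_needed(uint64_t gpa, size_t len,
                            const HostChunk *chunks, size_t n_chunks)
{
    int descs = 0;

    while (len > 0) {
        size_t i;

        /* Find the host chunk backing the current guest address. */
        for (i = 0; i < n_chunks; i++) {
            if (gpa >= chunks[i].gpa && gpa < chunks[i].gpa + chunks[i].len) {
                break;
            }
        }
        if (i == n_chunks) {
            return -1; /* address not mapped */
        }

        /* One descriptor covers at most the rest of this chunk. */
        size_t avail = chunks[i].gpa + chunks[i].len - gpa;
        size_t step = len < avail ? len : avail;

        descs++;
        gpa += step;
        len -= step;
    }
    return descs;
}
```

So with three separate host mappings backing one contiguous 12KiB guest buffer, three SVQ entries are consumed for a single guest descriptor, which is why a same-sized SVQ ring can fill up before the guest's does.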
> > > >
> > > >
> > > > Yes, but this case should be rare. So in this case we should deal with
> > > > overrun on SVQ, that is
> > > >
> > > > 1) SVQ is full
> > > > 2) guest VQ isn't
> > > >
> > > > We need to
> > > >
> > > > 1) check the available buffer slots
> > > > 2) disable guest kick and wait for the used buffers
> > > >
> > > > But it looks to me the current code is not ready for dealing with this case?
> > > >
> > >
> > > Yes, it deals with it; that's the meaning of svq->next_guest_avail_elem.
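As a minimal model of that mechanism (names and types are hypothetical; the real code stashes a VirtQueueElement in svq->next_guest_avail_elem rather than an int):

```c
#include <stdbool.h>

/* Simplified shadow-virtqueue state for the overrun case discussed above. */
typedef struct {
    unsigned free_slots;   /* free entries left in the shadow ring */
    int stashed_elem;      /* guest element that did not fit */
    bool has_stashed;
} ShadowVq;

/* Forward one guest element to the device ring; on overrun, stash it
 * instead of dropping it (the next_guest_avail_elem idea). */
static bool svq_forward(ShadowVq *svq, int elem)
{
    if (svq->free_slots == 0) {
        svq->stashed_elem = elem;
        svq->has_stashed = true;
        return false; /* caller stops consuming the guest avail ring */
    }
    svq->free_slots--;
    return true;
}

/* Device returned a used buffer: a slot frees up, so retry the stash. */
static void svq_on_used(ShadowVq *svq)
{
    svq->free_slots++;
    if (svq->has_stashed) {
        svq->has_stashed = false;
        svq_forward(svq, svq->stashed_elem);
    }
}
```

The effect is that a full shadow ring pauses forwarding rather than losing a guest buffer, and the used-buffer path resumes it.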
> >
> > Oh right, I missed that.
> >
> > >
> > > >
> > > > >
> > > > > I'm ok if we decide to put an upper limit though, or if we decide not
> > > > > to handle this situation. But we would leave out valid virtio drivers.
> > > > > Maybe to set a fixed upper limit (1024?)? To add another parameter
> > > > > (x-svq-size-n=N)?
> > > > >
> > > > > If you mean we lose performance because memory gets more sparse I
> > > > > think the only possibility is to limit that way.
> > > >
> > > >
> > > > If guest is not using 32K, having a 32K for svq may give extra stress
> > > > on the cache since we will end up with a pretty large working set.
> > > >
> > >
> > > That might be true. My guess is that it should not matter, since SVQ
> > > and the guest's vring will have the same number of scattered buffers
> > > and the avail / used / packed ring will be consumed more or less
> > > sequentially. But I haven't tested.
> > >
> > > I think it's better to add an upper limit (either fixed or in the
> > > qemu's backend's cmdline) later if we see that this is a problem.
> >
> > I'd suggest using the same size as what the guest saw.
> >
> > > Another solution now would be to get the number from the frontend
> > > device cmdline instead of from the vdpa device. I'm ok with that, but
> > > it doesn't delete the svq->next_guest_avail_elem processing, and it
> > > comes with disadvantages in my opinion. More below.
> >
> > Right, we should keep next_guest_avail_elem. Using the same queue size
> > is a balance between:
> >
> > 1) using next_guest_avail_elem (rare)
> > 2) not give too much stress on the cache
> >
>
> Ok I'll change the SVQ size for the frontend size then.
>
> > >
> > > >
> > > > >
> > > > >> And this can lead to another interesting situation:
> > > > >>
> > > > >> 1) SVQ uses 256
> > > > >>
> > > > >> 2) guest vq uses 1024
> > > > >>
> > > > >> Where a lot of more SVQ logic is needed.
> > > > >>
> > > > > If we agree that a guest descriptor can expand in multiple SVQ
> > > > > descriptors, this should be already handled by the previous logic too.
> > > > >
> > > > > But this should only happen in case that qemu is launched with a "bad"
> > > > > cmdline, isn't it?
> > > >
> > > >
> > > > It seems this can happen when we use -device
> > > > virtio-net-pci,tx_queue_size=1024 with a 256 size vp_vdpa device at least?
> > > >
> > >
> > > I'm going to use the rx queue here since it's more accurate; tx has
> > > its own limit separately.
> > >
> > > If we use rx_queue_size=256 in L0 and rx_queue_size=1024 in L1 with no
> > > SVQ, L0 qemu will happily accept 1024 as size
> >
> > Interesting, looks like a bug (I guess it works since you enable vhost?):
> >
>
> No, emulated interfaces. More below.
>
> > Per virtio-spec:
> >
> > """
> > Queue Size. On reset, specifies the maximum queue size supported by
> > the device. This can be modified by the driver to reduce memory
> > requirements. A 0 means the queue is unavailable.
> > """
> >
>
> Yes but how should it fail? Drivers do not know how to check if the
> value was invalid. DEVICE_NEEDS_RESET?
I think it can be detected by reading the value back to see if it matches.

Thanks
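A driver-side sketch of that read-back check (the two register accessors below simulate a device's queue_size register and are assumptions for the example, not a real virtio API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated device state: a stand-in for the queue_size field of the
 * virtio common configuration. The reset value is the device maximum. */
static uint16_t dev_queue_size = 256;
static const uint16_t dev_queue_max = 256;

static void write_queue_size(uint16_t v)
{
    /* Per the spec, the driver may only reduce the reset value, so the
     * device ignores writes above its maximum (and zero). */
    if (v != 0 && v <= dev_queue_max) {
        dev_queue_size = v;
    }
}

static uint16_t read_queue_size(void)
{
    return dev_queue_size;
}

/* The check suggested above: write, then read back and compare.  A
 * mismatch means the device silently rejected the requested size. */
static bool set_queue_size_checked(uint16_t requested)
{
    write_queue_size(requested);
    return read_queue_size() == requested;
}
```

With this, a driver asking for 1024 on a 256-entry device sees the stale 256 on read-back and can fall back, instead of proceeding with a ring size the device never accepted.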
>
> The L0 emulated device simply receives the write to pci and calls
> virtio_queue_set_num. I can try to add the check "num <
> vdev->vq[n].vring.num_default", but there is no way to notify the
> guest that setting the value failed.
>
> > We can't increase the queue_size from 256 to 1024 actually. (Only
> > decrease is allowed).
> >
> > > when L1 qemu writes that
> > > value at vhost_virtqueue_start. I'm not sure what would happen with a
> > > real device, my guess is that the device will fail somehow. That's
> > > what I meant with a "bad cmdline", I should have been more specific.
> >
> > I should say that it's something that is probably unrelated to this
> > series but needs to be addressed.
> >
>
> I agree, I can start developing the patches for sure.
>
> > >
> > > If we add SVQ to the mix, the guest first negotiates the 1024 with the
> > > qemu device model. After that, vhost.c will try to write 1024 too but
> > > this is totally ignored by this patch's changes at
> > > vhost_vdpa_set_vring_num. Finally, SVQ will set 256 as a ring size to
> > > the device, since it's the read value from the device, leading to your
> > > scenario. So SVQ effectively isolates both sides and makes possible
> > > the communication, even with a device that does not support so many
> > > descriptors.
> > >
> > > But SVQ already handles this case: It's the same as if the buffers are
> > > fragmented in HVA and queue size is equal at both sides. That's why I
> > > think SVQ size should depend on the backend device's size, not
> > > frontend cmdline.
> >
> > Right.
> >
> > Thanks
> >
> > >
> > > Thanks!
> > >
> > > >
> > > > >
> > > > > If I run that example with vp_vdpa, L0 qemu will happily accept 1024
> > > > > as a queue size [2]. But if the vdpa device maximum queue size is
> > > > > effectively 256, this will result in an error: We're not exposing it
> > > > > to the guest at any moment but with qemu's cmdline.
> > > > >
> > > > >>> including 256.
> > > > >>> Is that what you mean?
> > > > >>
> > > > >> I mean, it looks to me the logic will be much more simplified if we just
> > > > >> allocate the shadow virtqueue with the size the guest can see (guest
> > > > >> vring).
> > > > >>
> > > > >> Then we don't need to think if the difference of the queue size can have
> > > > >> any side effects.
> > > > >>
> > > > > I think that we cannot avoid that extra logic unless we force GPA to
> > > > > be contiguous in IOVA. If we are sure the guest's buffers cannot span
> > > > > more than one descriptor in SVQ, then yes, we can simplify things. If
> > > > > not, I think we are forced to carry all of it.
> > > >
> > > >
> > > > Yes, I agree, the code should be robust to handle any case.
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > But if we prove it I'm not opposed to simplifying things and making
> > > > > head at SVQ == head at guest.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > [1] https://gitlab.com/qemu-project/qemu/-/blob/17e31340/hw/virtio/virtio.c#L1297
> > > > > [2] But that's not the whole story: I've been running limited in tx
> > > > > descriptors because of virtio_net_max_tx_queue_size, which predates
> > > > > vdpa. I'll send a patch to also un-limit it.
> > > > >
> > > > >>> If with hardware queues you mean guest's vring, not sure why it is
> > > > >>> "probably 256". I'd say that in that case with the virtio-net kernel
> > > > >>> driver the ring size will be the same as the device exports, for
> > > > >>> example, isn't it?
> > > > >>>
> > > > >>> The implementation should support any combination of sizes, but the
> > > > >>> ring size exposed to the guest is never bigger than the hardware one.
> > > > >>>
> > > > >>>> Or SVQ can stick to 256, but will this cause trouble if we want
> > > > >>>> to add event index support?
> > > > >>>>
> > > > >>> I think we should not have any problem with event idx. If you mean
> > > > >>> that the guest could mark more buffers available than SVQ vring's
> > > > >>> size, that should not happen because there must be fewer entries in the
> > > > >>> guest than SVQ.
> > > > >>>
> > > > >>> But if I understood you correctly, a similar situation could happen if
> > > > >>> a guest's contiguous buffer is scattered across many qemu's VA chunks.
> > > > >>> Even if that would happen, the situation should be ok too: SVQ knows
> > > > >>> the guest's avail idx and, if SVQ is full, it will continue forwarding
> > > > >>> avail buffers when the device uses more buffers.
> > > > >>>
> > > > >>> Does that make sense to you?
> > > > >>
> > > > >> Yes.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > >
> > >
> >
>