On Tue, Mar 8, 2022 at 3:55 PM Michael S. Tsirkin <mst at redhat.com> wrote:
>
> On Tue, Mar 08, 2022 at 03:34:17PM +0800, Jason Wang wrote:
> > On Tue, Mar 8, 2022 at 3:28 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > >
> > > On Tue, Mar 08, 2022 at 03:14:35PM +0800, Jason Wang wrote:
> > > > On Tue, Mar 8, 2022 at 3:11 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > >
> > > > > On Tue, Mar 08, 2022 at 02:03:32PM +0800, Jason Wang wrote:
> > > > > >
> > > > > > On 2022/3/7 at 11:33 PM, Eugenio Pérez wrote:
> > > > > > > This series enables shadow virtqueue (SVQ) for vhost-vdpa devices.
> > > > > > > It is intended as a new method of tracking the memory the devices
> > > > > > > touch during a migration process: instead of relying on the vhost
> > > > > > > device's dirty logging capability, SVQ intercepts the VQ dataplane,
> > > > > > > forwarding the descriptors between VM and device. This way qemu is
> > > > > > > the effective writer of the guest's memory, like in qemu's virtio
> > > > > > > device operation.
> > > > > > >
> > > > > > > When SVQ is enabled, qemu offers a new virtual address space for the
> > > > > > > device to read and write into, and it maps the new vrings and the
> > > > > > > guest memory there. SVQ also intercepts the kicks and calls between
> > > > > > > the device and the guest. Relaying the used buffers is what causes
> > > > > > > the dirty memory to be tracked.
> > > > > > >
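As a rough illustration of the kick/call interception described above, here is
a minimal sketch built on QEMU's EventNotifier API. The struct layout and the
names (svq_kick, hdev_kick, svq_forward_avail, svq_handle_guest_kick) are
illustrative placeholders, not necessarily what the series uses:

    /*
     * Sketch of the guest->device kick relay: qemu owns the eventfd the
     * guest kicks, forwards the available descriptors, and only then
     * kicks the vhost device through a second eventfd, so qemu sits in
     * the notification path.
     */
    #include "qemu/osdep.h"
    #include "qemu/event_notifier.h"

    typedef struct VhostShadowVirtqueue {
        EventNotifier svq_kick;   /* guest -> qemu notification */
        EventNotifier hdev_kick;  /* qemu -> device notification */
    } VhostShadowVirtqueue;

    static void svq_forward_avail(VhostShadowVirtqueue *svq)
    {
        /* Copy the guest's available descriptors into the shadow vring. */
    }

    static void svq_handle_guest_kick(EventNotifier *n)
    {
        VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
                                                 svq_kick);

        event_notifier_test_and_clear(n);
        svq_forward_avail(svq);
        event_notifier_set(&svq->hdev_kick);
    }

The handler would typically be attached with
event_notifier_set_handler(&svq->svq_kick, svq_handle_guest_kick) when SVQ
starts; the call (device->guest) direction mirrors this with the fds swapped.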
> > > > > > > This effectively means that vDPA device passthrough is intercepted
> > > > > > > by qemu. While SVQ should only be enabled at migration time, the
> > > > > > > switching from regular mode to SVQ mode is left for a future series.
> > > > > > >
> > > > > > > It is based on the ideas of DPDK SW assisted LM, in DPDK's series at
> > > > > > > https://patchwork.dpdk.org/cover/48370/ . However, this series does
> > > > > > > not map the shadow vq in the guest's VA, but in qemu's.
> > > > > > >
> > > > > > > For qemu to use shadow virtqueues, the guest virtio driver must not
> > > > > > > use features like event_idx.
> > > > > > >
> > > > > > > SVQ needs to be enabled with cmdline:
> > > > > > >
> > > > > > > -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on
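For reference, a hypothetical complete invocation with this revision's spelling
(the virtio-net-pci front-end pairing is the usual way to consume a netdev; the
vhostdev value is taken verbatim from above):

    qemu-system-x86_64 \
        -netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,svq=on \
        -device virtio-net-pci,netdev=vhost-vdpa0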
> > > > >
> > > > > A stable API for an incomplete feature is a problem imho.
> > > >
> > > > It should be "x-svq".
> > >
> > >
> > > Well look at patch 15.
> >
> > It's a bug that needs to be fixed.
> >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > The first three patches enable notification forwarding with the
> > > > > > > assistance of qemu. It's easy to enable only this part if the
> > > > > > > relevant cmdline piece of the last patch is applied on top of these.
> > > > > > >
> > > > > > > The next four patches implement the actual buffer forwarding.
> > > > > > > However, addresses are not translated from HVA, so they need a host
> > > > > > > device with an iommu allowing them to access all of the HVA range.
> > > > > > >
> > > > > > > The last part of the series uses the host iommu properly, so qemu
> > > > > > > creates a new iova address space in the device's range and
> > > > > > > translates the buffers into it. Finally, it adds the cmdline
> > > > > > > parameter.
> > > > > > >
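Building on the illustrative VhostShadowVirtqueue sketch earlier in this mail, a
minimal sketch of the buffer relay (assuming hw/virtio/virtio.h for VirtQueue,
VirtQueueElement and virtqueue_pop(); svq_add() is a hypothetical helper
standing in for the series' real forwarding code):

    /*
     * Dataplane relay: qemu pops the guest's available buffers, exposes
     * them on the shadow vring that the device actually reads, and kicks
     * the device.  Returning used buffers to the guest (and freeing the
     * elements) follows the reverse path and is elided here; that relay
     * is what makes qemu the effective writer of guest memory.
     */
    static void svq_flush_avail(VhostShadowVirtqueue *svq, VirtQueue *vq)
    {
        VirtQueueElement *elem;

        while ((elem = virtqueue_pop(vq, sizeof(*elem))) != NULL) {
            svq_add(svq, elem);               /* write into the shadow vring */
        }
        event_notifier_set(&svq->hdev_kick);  /* notify the device */
    }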
> > > > > > > Some simple performance tests with netperf were done. They used a
> > > > > > > nested guest with vp_vdpa, with vhost-kernel at the L0 host. Starting
> > > > > > > with no svq and a baseline average of ~9009.96 Mbps:
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    9061.03
> > > > > > > 131072  16384  16384    30.01    8962.94
> > > > > > > 131072  16384  16384    30.01    9005.92
> > > > > > >
> > > > > > > Enabling SVQ buffer forwarding reduces throughput to about:
> > > > > > > Recv   Send    Send
> > > > > > > Socket Socket  Message  Elapsed
> > > > > > > Size   Size    Size     Time     Throughput
> > > > > > > bytes  bytes   bytes    secs.    10^6bits/sec
> > > > > > > 131072  16384  16384    30.01    7689.72
> > > > > > > 131072  16384  16384    30.00    7752.07
> > > > > > > 131072  16384  16384    30.01    7750.30
> > > > > > >
> > > > > > > However, many performance improvements were left out of this series
> > > > > > > for simplicity, so the difference should shrink in the future.
> > > > > > >
> > > > > > > Comments are welcome.
> > > > > >
> > > > > >
> > > > > > Hi Michael:
> > > > > >
> > > > > > What do you think of this series? It looks good to me as a start. The
> > > > > > feature can only be enabled via a dedicated parameter. If you're ok
> > > > > > with it, I'd try to make it for 7.0.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Well, that's cutting it awfully close, and it's not really useful at
> > > > > the current stage, is it?
> > > >
> > > > This allows vDPA to be migrated when using "x-svq=on". But anyhow,
> > > > it's experimental.
> > >
> > > It's less experimental than incomplete. It seems pretty clearly not the
> > > way it will work down the road; we don't want svq involved at all times.
> >
> > Right, but SVQ could be used in other places, e.g. providing migration
> > compatibility when the destination lacks some features.
>
> In its current form? I don't see how. Generally?

Generally, yes.

> Maybe, but I suspect we'll have to rework it completely for that.

Probably not; from what I see, it just needs some extension of the current code.
>
> > >
> > > > >
> > > > > The IOVA trick does not feel complete either.
> > > >
> > > > I don't get it. We don't use any IOVA trick as DPDK did (it reserves
> > > > IOVA for the shadow vq), so we won't suffer from DPDK's issues.
> > > >
> > > > Thanks
> > >
> > > Maybe I misunderstand how this all works.
> > > I'm referring to all the iova_tree_alloc_map things.
> >
> > It's a simple IOVA allocator actually. Anything wrong with that?
>
> Not by itself, but I'm not sure we can guarantee that the guest will not
> attempt to use the IOVA addresses we are reserving down the road.
The IOVA is allocated via the listeners and stored in the iova tree per
GPA range as IOVA->(GPA)->HVA. Guests only see GPAs, and qemu's virtio
core sees the GPA to HVA mapping. We then do a reverse lookup to find
the HVA->IOVA mapping we allocated previously. So we have a double check
here:
1) qemu's memory core makes sure the GPA the guest uses is valid
2) the IOVA tree guarantees that no HVA beyond what the guest can see is used
So technically, there's no way for the guest to use the IOVA addresses
allocated for the shadow virtqueue.
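A minimal sketch of that HVA->IOVA reverse lookup, reusing the DMAMap and
VhostIOVATree pieces this series introduces (the wrapper function itself is
only illustrative, not the series' actual code):

    /*
     * Given an HVA that qemu's memory core already validated (GPA -> HVA),
     * find the IOVA that the listener allocated for that memory.  Only HVA
     * ranges that were mapped for the device resolve, so the guest cannot
     * reach the IOVA ranges reserved for the shadow virtqueue itself.
     */
    #include "qemu/osdep.h"
    #include "hw/virtio/vhost-iova-tree.h"

    static bool svq_hva_to_iova(const VhostIOVATree *tree, hwaddr *addr)
    {
        const DMAMap needle = { .translated_addr = *addr, .size = 0 };
        const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);

        if (!map) {
            return false;                     /* not guest-visible memory */
        }
        *addr = map->iova + (*addr - map->translated_addr);
        return true;
    }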
Thanks
>
> > I'm fine with making it for a future release.
> >
> > Thanks
> >
> > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > TODO for future series:
> > > > > > > * Event, indirect, packed, and other virtio features.
> > > > > > > * Support a different set of features between the device<->SVQ and
> > > > > > >   the SVQ<->guest communication.
> > > > > > > * Support of device host notifier memory regions.
> > > > > > > * Separate buffer forwarding into its own AIO context, so we can
> > > > > > >   throw more threads at that task and don't need to stop the main
> > > > > > >   event loop.
> > > > > > > * Support multiqueue virtio-net vdpa.
> > > > > > > * Proper documentation.
> > > > > > >
> > > > > > > Changes from v4:
> > > > > > > * Iterate the iova->hva tree instead of maintaining our own tree, so
> > > > > > >   we support HVA overlaps.
> > > > > > > * Fix: errno completion at failure.
> > > > > > > * Rename x-svq to svq, so changes to stable do not affect the
> > > > > > >   cmdline parameter.
> > > > > > >
> > > > > > > Changes from v3:
> > > > > > > * Add the @unstable feature to NetdevVhostVDPAOptions.x-svq.
> > > > > > > * Fix incomplete mapping (by 1 byte) of memory regions if svq is
> > > > > > >   enabled.
> > > > > > > v3 link:
> > > > > > > https://lore.kernel.org/qemu-devel/20220302203012.3476835-1-eperezma at redhat.com/
> > > > > > >
> > > > > > > Changes from v2:
> > > > > > > * Fewer assertions and more error handling in the iova tree code.
> > > > > > > * Better documentation, both fixing errors and using the @param:
> > > > > > >   format.
> > > > > > > * Homogenize SVQ avail_idx_shadow and shadow_used_idx to make shadow
> > > > > > >   a prefix in both cases.
> > > > > > > * Fix: do not use the VirtQueueElement->len field, track it
> > > > > > >   separately.
> > > > > > > * Split vhost_svq_{enable,disable}_notification, so the code looks
> > > > > > >   more like the kernel driver code.
> > > > > > > * Small improvements.
> > > > > > > v2 link:
> > > > > > > https://lore.kernel.org/all/CAJaqyWfXHE0C54R_-OiwJzjC0gPpkE3eX0L8BeeZXGm1ERYPtA at mail.gmail.com/
> > > > > > >
> > > > > > > Changes from v1:
> > > > > > > * The feature set at device->SVQ is now the same as SVQ->guest.
> > > > > > > * The size of SVQ is no longer the max available device size, but
> > > > > > >   the guest's negotiated one.
> > > > > > > * Add VHOST_FILE_UNBIND kick and call fd treatment.
> > > > > > > * Make SVQ a public struct.
> > > > > > > * Come back to the previous approach to iova-tree.
> > > > > > > * Some assertions are now fail paths. Some errors are now log_guest.
> > > > > > > * Only mask the _F_LOG feature in the vdpa_set_features svq enable
> > > > > > >   path.
> > > > > > > * Refactor some errors and messages. Add missing error unwinding.
> > > > > > > * Add a memory barrier at _F_NO_NOTIFY set.
> > > > > > > * Stop checking for feature flags out of the transport range.
> > > > > > > v1 link:
> > > > > > > https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3201 at redhat.com/
> > > > > > >
> > > > > > > Changes from v4 RFC:
> > > > > > > * Support allocating / freeing iova ranges in the IOVA tree,
> > > > > > >   extending the already present iova-tree for that.
> > > > > > > * Proper validation of guest features. Now SVQ can negotiate a
> > > > > > >   different set of features with the device when enabled.
> > > > > > > * Support of host notifier memory regions.
> > > > > > > * Handling of an SVQ full queue in case the guest's descriptors span
> > > > > > >   different memory regions (qemu's VA chunks).
> > > > > > > * Flush pending used buffers at the end of SVQ operation.
> > > > > > > * The QMP command now looks up by NetClientState name. Other devices
> > > > > > >   will need to implement their own way to enable vdpa.
> > > > > > > * Rename the QMP command to set, so it looks more like a way of
> > > > > > >   working.
> > > > > > > * Better use of the qemu error system.
> > > > > > > * Make a few assertions proper error-handling paths.
> > > > > > > * Add more documentation.
> > > > > > > * Less coupling of virtio / vhost, which could cause friction on
> > > > > > >   changes.
> > > > > > > * Addressed many other small comments and small fixes.
> > > > > > >
> > > > > > > Changes from v3 RFC:
> > > > > > > * Move everything to the vhost-vdpa backend. A big change; this
> > > > > > >   allowed some cleanup, but more code has been added in other
> > > > > > >   places.
> > > > > > > * More use of glib utilities, especially to manage memory.
> > > > > > > v3 link:
> > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
> > > > > > >
> > > > > > > Changes from v2 RFC:
> > > > > > > * Add vhost-vdpa device support.
> > > > > > > * Fixed some memory leaks pointed out in different comments.
> > > > > > > v2 link:
> > > > > > > https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
> > > > > > >
> > > > > > > Changes from v1 RFC:
> > > > > > > * Use QMP instead of migration to start SVQ mode.
> > > > > > > * Only accept IOMMU devices, for closer behavior to the target
> > > > > > >   devices (vDPA).
> > > > > > > * Fix invalid masking/unmasking of the vhost call fd.
> > > > > > > * Use proper methods for synchronization.
> > > > > > > * No need to modify VirtIO device code; all of the changes are
> > > > > > >   contained in vhost code.
> > > > > > > * Delete superfluous code.
> > > > > > > * An intermediate RFC was sent with only the notification forwarding
> > > > > > >   changes. It can be seen at
> > > > > > > https://patchew.org/QEMU/20210129205415.876290-1-eperezma at redhat.com/
> > > > > > > v1 link:
> > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
> > > > > > >
> > > > > > > Eugenio Pérez (20):
> > > > > > >   virtio: Add VIRTIO_F_QUEUE_STATE
> > > > > > >   virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> > > > > > >   virtio: Add virtio_queue_is_host_notifier_enabled
> > > > > > >   vhost: Make vhost_virtqueue_{start,stop} public
> > > > > > >   vhost: Add x-vhost-enable-shadow-vq qmp
> > > > > > >   vhost: Add VhostShadowVirtqueue
> > > > > > >   vdpa: Register vdpa devices in a list
> > > > > > >   vhost: Route guest->host notification through shadow virtqueue
> > > > > > >   Add vhost_svq_get_svq_call_notifier
> > > > > > >   Add vhost_svq_set_guest_call_notifier
> > > > > > >   vdpa: Save call_fd in vhost-vdpa
> > > > > > >   vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> > > > > > >   vhost: Route host->guest notification through shadow virtqueue
> > > > > > >   virtio: Add vhost_shadow_vq_get_vring_addr
> > > > > > >   vdpa: Save host and guest features
> > > > > > >   vhost: Add vhost_svq_valid_device_features to shadow vq
> > > > > > >   vhost: Shadow virtqueue buffers forwarding
> > > > > > >   vhost: Add VhostIOVATree
> > > > > > >   vhost: Use a tree to store memory mappings
> > > > > > >   vdpa: Add custom IOTLB translations to SVQ
> > > > > > >
> > > > > > > Eugenio Pérez (15):
> > > > > > >   vhost: Add VhostShadowVirtqueue
> > > > > > >   vhost: Add Shadow VirtQueue kick forwarding capabilities
> > > > > > >   vhost: Add Shadow VirtQueue call forwarding capabilities
> > > > > > >   vhost: Add vhost_svq_valid_features to shadow vq
> > > > > > >   virtio: Add vhost_svq_get_vring_addr
> > > > > > >   vdpa: adapt vhost_ops callbacks to svq
> > > > > > >   vhost: Shadow virtqueue buffers forwarding
> > > > > > >   util: Add iova_tree_alloc_map
> > > > > > >   util: add iova_tree_find_iova
> > > > > > >   vhost: Add VhostIOVATree
> > > > > > >   vdpa: Add custom IOTLB translations to SVQ
> > > > > > >   vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
> > > > > > >   vdpa: Never set log_base addr if SVQ is enabled
> > > > > > >   vdpa: Expose VHOST_F_LOG_ALL on SVQ
> > > > > > >   vdpa: Add x-svq to NetdevVhostVDPAOptions
> > > > > > >
> > > > > > >  qapi/net.json                      |   8 +-
> > > > > > >  hw/virtio/vhost-iova-tree.h        |  27 ++
> > > > > > >  hw/virtio/vhost-shadow-virtqueue.h |  87 ++++
> > > > > > >  include/hw/virtio/vhost-vdpa.h     |   8 +
> > > > > > >  include/qemu/iova-tree.h           |  38 +-
> > > > > > >  hw/virtio/vhost-iova-tree.c        | 110 +++++
> > > > > > >  hw/virtio/vhost-shadow-virtqueue.c | 637 +++++++++++++++++++++++++++++
> > > > > > >  hw/virtio/vhost-vdpa.c             | 525 +++++++++++++++++++++++-
> > > > > > >  net/vhost-vdpa.c                   |  48 ++-
> > > > > > >  util/iova-tree.c                   | 169 ++++++++
> > > > > > >  hw/virtio/meson.build              |   2 +-
> > > > > > >  11 files changed, 1633 insertions(+), 26 deletions(-)
> > > > > > >  create mode 100644 hw/virtio/vhost-iova-tree.h
> > > > > > >  create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> > > > > > >  create mode 100644 hw/virtio/vhost-iova-tree.c
> > > > > > >  create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
> > > > > > >
> > > > >
> > >
>