> From: Eugenio Perez Martin <eperezma at redhat.com>
> Sent: Wednesday, May 11, 2022 3:44 PM
>
> This is a proposal to restore the state of the vhost-vdpa device at the
> destination after a live migration. It uses as many features already
> available in the device and in qemu as possible, so the communication
> stays simple and the merging process is faster.
>
> # Initializing a vhost-vdpa device.
>
> Without the context of live migration, the steps to initialize the
> device from vhost-vdpa at qemu startup are:
> 1) [vhost] Open the vdpa device, simply using open().
> 2) [vhost+virtio] Get device features. These are expected not to change
> in the device's lifetime, so we can save them. Qemu issues a
> VHOST_GET_FEATURES ioctl and vdpa forwards it to the backend driver
> using the get_device_features() callback.
> 3) [vhost+virtio] Get its max_queue_pairs if _F_MQ and _F_CTRL_VQ.
This should soon be replaced with a more generic num_vq interface, as
max_queue_pairs doesn't work beyond net.
There is no need to carry an ancient interface forward in a newly built
vdpa stack.
> These are obtained using VHOST_VDPA_GET_CONFIG, and that request is
> forwarded to the device using get_config. QEMU expects the device to not
> change it in its lifetime.
> 4) [vhost] Vdpa set status (_S_ACKNOWLEDGE, _S_DRIVER). Still no
> FEATURES_OK or DRIVER_OK. The ioctl is VHOST_VDPA_SET_STATUS, and
> the vdpa backend driver callback is set_status.
>
> These are the steps used to initialize the device in qemu terminology,
> taking away some redundancies to make it simpler.
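>
> As a rough, self-contained sketch of steps 1 to 4 (the device path,
> error handling and the exact helper shape here are illustrative, not
> the actual qemu code):
>
> #include <fcntl.h>
> #include <stddef.h>
> #include <stdint.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <linux/vhost.h>
> #include <linux/virtio_config.h>
> #include <linux/virtio_net.h>
>
> /* Returns the open vdpa fd, or -1 on error. */
> static int vdpa_device_init(const char *path, uint64_t *features,
>                             uint16_t *max_queue_pairs)
> {
>     int fd = open(path, O_RDWR);                  /* 1) just open() it */
>     if (fd < 0)
>         return -1;
>
>     /* 2) Device features, saved for the whole lifetime of the device */
>     if (ioctl(fd, VHOST_GET_FEATURES, features))
>         return -1;
>
>     /* 3) max_queue_pairs from config space, only if _F_MQ is offered */
>     *max_queue_pairs = 1;
>     if (*features & (1ULL << VIRTIO_NET_F_MQ)) {
>         struct vhost_vdpa_config *cfg =
>             malloc(sizeof(*cfg) + sizeof(uint16_t));
>
>         if (!cfg)
>             return -1;
>         cfg->off = offsetof(struct virtio_net_config, max_virtqueue_pairs);
>         cfg->len = sizeof(uint16_t);
>         if (ioctl(fd, VHOST_VDPA_GET_CONFIG, cfg)) {
>             free(cfg);
>             return -1;
>         }
>         memcpy(max_queue_pairs, cfg->buf, sizeof(uint16_t));
>         free(cfg);
>     }
>
>     /* 4) _S_ACKNOWLEDGE and _S_DRIVER, no FEATURES_OK/DRIVER_OK yet */
>     uint8_t status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
>     if (ioctl(fd, VHOST_VDPA_SET_STATUS, &status))
>         return -1;
>
>     return fd;
> }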
>
> Now the driver sends the FEATURES_OK and the DRIVER_OK, and qemu
> detects it, so it *starts* the device.
>
> # Starting a vhost-vdpa device
>
> At virtio_net_vhost_status we have two important variables here:
> int cvq = _F_CTRL_VQ ? 1 : 0;
> int queue_pairs = _F_CTRL_VQ && _F_MQ ? (max_queue_pairs of step 3) : 0;
>
> Now the identification of the cvq index. Qemu *knows* that the device
> will expose it at the last queue (max_queue_pairs*2) if _F_MQ has been
> acknowledged by the guest's driver, or at 2 if not. It cannot depend on
> any data sent to the device via cvq, because we couldn't get its command
> status on a change.
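>
> As a small sketch of that rule (illustrative naming, reusing the
> headers of the snippet above):
>
> /* Index of the control vq, derived without asking the device: the data
>  * vqs occupy indexes [0, datavqs), and cvq is the one right after. */
> static unsigned net_cvq_index(uint64_t acked_features,
>                               unsigned max_queue_pairs)
> {
>     if (acked_features & (1ULL << VIRTIO_NET_F_MQ))
>         return max_queue_pairs * 2;
>
>     return 2;
> }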
>
> Now we start the vhost device. The workflow is currently:
>
> 5) [virtio+vhost] The first step is to send the acknowledgement of the
> Virtio features and vhost/vdpa backend features to the device, so it
> knows how to configure itself. This is done using the same calls as
> step 4 with these feature bits added.
> 6) [virtio] Set the size, base, addr, kick and call fd for each queue
> (SET_VRING_ADDR, SET_VRING_NUM, ..., forwarded to the backend with
> set_vq_address, set_vq_state, ...).
> 7) [vdpa] Send host notifiers and *send SET_VRING_ENABLE = 1* for each
> queue. This is done using ioctl VHOST_VDPA_SET_VRING_ENABLE, and
> forwarded to the vdpa backend using set_vq_ready callback.
> 8) [virtio + vdpa] Send memory translations & set DRIVER_OK.
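>
> Again as a rough sketch of steps 5 to 8 (ring sizes, fds and addresses
> are placeholders, the status-bit handling is simplified, and the memory
> translations through the vhost-vdpa IOTLB are omitted):
>
> #include <stdint.h>
> #include <sys/ioctl.h>
> #include <linux/vhost.h>
> #include <linux/virtio_config.h>
>
> static int vdpa_device_start(int fd, uint64_t acked_features,
>                              unsigned nvqs, struct vhost_vring_addr *addr,
>                              int *kick_fd, int *call_fd)
> {
>     /* 5) Ack the virtio features (backend features not shown) */
>     if (ioctl(fd, VHOST_SET_FEATURES, &acked_features))
>         return -1;
>
>     for (unsigned i = 0; i < nvqs; i++) {
>         /* 6) Size, base, addresses, kick and call fd of every vq */
>         struct vhost_vring_state num = { .index = i, .num = 256 };
>         struct vhost_vring_state base = { .index = i, .num = 0 };
>         struct vhost_vring_file kick = { .index = i, .fd = kick_fd[i] };
>         struct vhost_vring_file call = { .index = i, .fd = call_fd[i] };
>
>         if (ioctl(fd, VHOST_SET_VRING_NUM, &num) ||
>             ioctl(fd, VHOST_SET_VRING_BASE, &base) ||
>             ioctl(fd, VHOST_SET_VRING_ADDR, &addr[i]) ||
>             ioctl(fd, VHOST_SET_VRING_KICK, &kick) ||
>             ioctl(fd, VHOST_SET_VRING_CALL, &call))
>             return -1;
>
>         /* 7) Enable the vq: this is the step the proposal below changes */
>         struct vhost_vring_state enable = { .index = i, .num = 1 };
>         if (ioctl(fd, VHOST_VDPA_SET_VRING_ENABLE, &enable))
>             return -1;
>     }
>
>     /* 8) (Memory translations omitted.)  Finally, DRIVER_OK. */
>     uint8_t status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER |
>                      VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_DRIVER_OK;
>     return ioctl(fd, VHOST_VDPA_SET_STATUS, &status);
> }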
>
So for MQ, all the VQ setup should be done before step 8.
> If we follow the current workflow, the device is now allowed to start
> receiving, but only on vq pair 0, since we have still not set the
> multiqueue state. This could cause the guest to receive packets on
> unexpected queues, breaking RSS.
>
> # Proposal
>
> Our proposal diverges in step 7: Instead of enabling *all* the
> virtqueues, only enable the CVQ.
Just to double check, VQs 0 and 1 of the net device are also not enabled,
correct?
> After that, send the DRIVER_OK and queue all the control commands to
> restore the device status (MQ, RSS, ...). Once all of them have been
> acknowledged (the "device", or the emulated cvq in the host vdpa backend
> driver, has used all the cvq buffers), enable (SET_VRING_ENABLE,
> set_vq_ready) all the other queues.
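>
> In terms of the calls sketched above, the proposed order would look
> roughly like this (restore_state_via_cvq() is a made-up placeholder for
> the cvq command replay, not an existing function):
>
> #include <stdint.h>
> #include <sys/ioctl.h>
> #include <linux/vhost.h>
> #include <linux/virtio_config.h>
>
> /* Placeholder: queue the MQ/RSS/... commands on the cvq and wait until
>  * the device (or the emulated cvq) has used all the buffers. */
> int restore_state_via_cvq(int fd);
>
> static int vdpa_device_start_proposed(int fd, unsigned cvq_index,
>                                       uint8_t status)
> {
>     /* 7') Enable only the control virtqueue before DRIVER_OK */
>     struct vhost_vring_state cvq = { .index = cvq_index, .num = 1 };
>     if (ioctl(fd, VHOST_VDPA_SET_VRING_ENABLE, &cvq))
>         return -1;
>
>     /* 8) DRIVER_OK: the device can process cvq commands, no rx/tx yet */
>     status |= VIRTIO_CONFIG_S_DRIVER_OK;
>     if (ioctl(fd, VHOST_VDPA_SET_STATUS, &status))
>         return -1;
>
>     /* Replay the device state and wait for all the acks */
>     if (restore_state_via_cvq(fd))
>         return -1;
>
>     /* Only now enable the data virtqueues */
>     for (unsigned i = 0; i < cvq_index; i++) {
>         struct vhost_vring_state vq = { .index = i, .num = 1 };
>         if (ioctl(fd, VHOST_VDPA_SET_VRING_ENABLE, &vq))
>             return -1;
>     }
>     return 0;
> }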
>
What is special about doing DRIVER_OK and then enqueuing the control
commands? Why can't the other configuration be applied before DRIVER_OK?
In other words,
Step 7 already sets up the necessary VQ-related fields.
Before doing DRIVER_OK, what is needed is to set up any other device
fields and features.
For net this includes RSS, VLAN and MAC filters.
So, a new vdpa ioctl() should be able to set these values.
This is the ioctl() between user and kernel.
After this ioctl(), DRIVER_OK should be done, resuming the device.
The device has a full view of the config now.
This node-local device setup change should not require a migration
protocol change.
This scheme will also work for non-net virtio devices too.
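
To make the idea concrete, a rough sketch of what such an ioctl() could
look like; neither the name, the request number nor the struct exist
today:

/* Hypothetical uapi sketch, NOT an existing interface: a device-type
 * specific state blob that the vdpa backend applies before DRIVER_OK. */
#include <linux/types.h>
#include <linux/ioctl.h>
#include <linux/vhost.h>

struct vhost_vdpa_dev_state {
	__u32 device_id;  /* VIRTIO_ID_NET, ... */
	__u32 len;        /* length of the payload that follows */
	__u8  buf[];      /* e.g. MQ/RSS/MAC entries for a net device */
};

/* 0x7f is an arbitrary placeholder request number */
#define VHOST_VDPA_SET_DEV_STATE _IOW(VHOST_VIRTIO, 0x7f, \
                                      struct vhost_vdpa_dev_state)

qemu would fill the buffer from its shadow of the guest-visible state and
could issue it one or more times before DRIVER_OK.
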
> Everything needed for this is already implemented in the kernel as far
> as I can see; only a small modification in qemu is needed. Thus we
> achieve restoring the device state without creating a maintenance
> burden.
>
> A lot of optimizations can be applied on top without the need to add
> anything to the migration protocol or the vDPA uAPI, like pre-warming
> the vdpa queues or adding more capabilities to the emulated CVQ.
The above ioctl() will enable the vdpa subsystem to apply these settings
one or more times in a pre-warming stage before DRIVER_OK.
>
> Other optimizations like applying the state out of band can also be
> added so they can run in parallel with the migration, but that requires
> a bigger change in the qemu migration protocol, making us lose focus on
> achieving at least the basic device migration, in my opinion.
>
Let's strive to apply this in-band as much as possible. Applying out of band
opens issues unrelated to migration (authentication and more).