On 2022/5/17 04:29, Parav Pandit wrote:
>
>> From: Eugenio Perez Martin <eperezma at redhat.com>
>> Sent: Monday, May 16, 2022 4:51 AM
>>
>> On Fri, May 13, 2022 at 8:25 PM Parav Pandit <parav at nvidia.com> wrote:
>>> Hi Gautam,
>>>
>>> Please fix your email client to use the right response format.
>>> Otherwise, it will be confusing for us and the rest to follow the conversation.
>>> More below.
>>>
>>>> From: Gautam Dawar <gdawar at xilinx.com>
>>>> Sent: Friday, May 13, 2022 1:48 PM
>>>>> Our proposal diverges in step 7: instead of enabling *all* the
>>>>> virtqueues, only enable the CVQ.
>>>> Just to double check, VQ 0 and 1 of the net device are also not enabled, correct?
>>>> [GD>>] Yes, that's my understanding as well.
>>>>
>> That's correct. We can say that for a queue to be enabled, three things must happen:
>> * DRIVER_OK (still not sent)
>> * VHOST_VDPA_SET_VRING_ENABLE sent for that queue (only sent for CVQ)
>> * If the queue is not in the first data queue pair and is not the CVQ: send
>>   VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET with a queue pair count that includes it.
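To make the three conditions above concrete, here is a minimal sketch of how I read them (illustrative only; the function and parameter names are mine, not from any existing API):

#include <stdbool.h>

static bool vq_is_usable(unsigned int idx, bool driver_ok,
                         const bool *vring_enabled,
                         unsigned int cur_queue_pairs,   /* 1 until the MQ command */
                         unsigned int max_queue_pairs)
{
        bool is_cvq = (idx == 2 * max_queue_pairs);

        if (!driver_ok || !vring_enabled[idx])
                return false;
        /* The CVQ and the first data pair (vq 0/1) need no MQ command */
        if (is_cvq || idx < 2)
                return true;
        /* Other data queues also need VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET to cover them */
        return idx < 2 * cur_queue_pairs;
}
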
>>
> These if-conditions, especially the last one, are not good as they require
> device type knowledge, which in most cases is not needed.
> Especially for new code.
>
>>>>> After that, send the DRIVER_OK and queue all the control commands
>>>>> to restore the device status (MQ, RSS, ...). Once all of them have
>>>>> been acknowledged ("device", or emulated cvq in host vdpa backend
>>>>> driver, has used all cvq buffers), enable (SET_VRING_ENABLE,
>>>>> set_vq_ready) all other queues.
>>>>>
>>>> What is special about doing DRIVER_OK and enqueuing the control
>>>> commands?
>>>> Why can't other configuration be applied before DRIVER_OK?
>> There is nothing special beyond "they have a method to be set that way, so
>> reusing it avoids having to maintain many ways to set the same things,
>> simplifying implementations".
>>
>> I'm not saying "it has been like that forever so we cannot change it":
>> I'm very open to the change, but priority-wise we should first achieve a
>> working LM with packed, in_order, or even indirect.
>>
>>>> [GD>>] For the device to be live (and any queue to be able to pass
>>>> traffic), DRIVER_OK is a must.
>>> This applies only to a vdpa device implemented over a virtio device.
>>> For such a use case/implementation, a proper virtio spec extension is
>>> needed anyway for it to be efficient.
>>> And when that happens, this scheme will still work.
>>>
>> Although it's a longer route, I'd very much prefer an in-band virtio way to
>> perform it rather than a linux/vdpa-specific one. It's one of the reasons I
>> prefer the CVQ behavior over a vdpa-specific ioctl.
>>
> What is the in-band method to set last_avail_idx?
> An in-band virtio method doesn't exist.
Right, but it's part of the vhost API, which has been there for more than 10
years. This should be supported by all the vDPA vendors.
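For instance, restoring last_avail_idx on the destination already maps onto the existing uAPI. A rough sketch (error handling omitted; vdpa_fd is an already-open vhost-vdpa character device):

#include <linux/vhost.h>
#include <sys/ioctl.h>

static int restore_vring_base(int vdpa_fd, unsigned int idx,
                              unsigned int last_avail_idx)
{
        /* The source reads this value back with VHOST_GET_VRING_BASE */
        struct vhost_vring_state state = {
                .index = idx,
                .num   = last_avail_idx,
        };

        return ioctl(vdpa_fd, VHOST_SET_VRING_BASE, &state);
}
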
>
>>> Other vdpa devices don't have to live with this limitation at this moment.
>>>
>>>> So, any configuration can be passed over the CVQ only after it is
>>>> started (vring configuration + DRIVER_OK). For an emulated queue, if
>>>> the order is reversed, I think the enqueued commands will remain
>>>> buffered and the device should be able to service them when it goes live.
>>> I likely didn't understand what you described above.
>>>
>>> Vq avail index etc is programmed before doing DRIVER_OK anyway.
>>>
>>> The sequence is very straightforward at the destination, from user to kernel:
>>> 1. Set config space fields (such as virtio_net_config/virtio_blk_config).
>>> 2. Set other device attributes (max vqs, current num vqs).
>>> 3. Set net-specific config (RSS, vlan, mac and more filters).
>>> 4. Set VQ fields (enable, msix, addr, avail idx).
>>> 5. Set DRIVER_OK; the device resumes from where it left off.
>>>
>>> Steps #1 to #4 can be done multiple times in a pre-warm device-up case in
>>> the future.
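For reference, steps #4 and #5 of that sequence already map onto today's vhost/vhost-vdpa ioctls. A rough sketch (error handling omitted; steps #1-#3 would need new or extended uAPIs beyond VHOST_VDPA_SET_CONFIG and are not shown):

#include <linux/vhost.h>
#include <linux/virtio_config.h>
#include <sys/ioctl.h>

/* Step #4, per VQ: address, avail index, enable */
static void restore_vq(int vdpa_fd, unsigned int idx,
                       const struct vhost_vring_addr *addr,
                       unsigned int last_avail_idx)
{
        struct vhost_vring_state base   = { .index = idx, .num = last_avail_idx };
        struct vhost_vring_state enable = { .index = idx, .num = 1 };

        ioctl(vdpa_fd, VHOST_SET_VRING_ADDR, addr);
        ioctl(vdpa_fd, VHOST_SET_VRING_BASE, &base);
        ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &enable);
}

/* Step #5, once per device after all VQs are restored */
static void resume_device(int vdpa_fd)
{
        __u8 status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER |
                      VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_DRIVER_OK;

        ioctl(vdpa_fd, VHOST_VDPA_SET_STATUS, &status);
}
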
>>
>> That requires creating a way to set all the parameters enumerated there
>> for vdpa devices. Each time a new feature is added to virtio-net that
>> modifies the guest-visible frontend device, we would need to update it.
> For any guest-visible feature exposed by the vdpa device on the source side, a
> migration agent needs to verify it and make the destination device capable of
> supporting it anyway. Without that, a device may be migrated, but it won't
> perform at the same level as the source.
>
>> And all the
>> layers of the stack need to maintain more state.
> Mostly not. A complete virtio device state received from the source vdpa device
> can be given to the destination vdpa device without anyone else looking in the
> middle, if this format is known/well defined.
That's fine, and it seems the virtio spec is a better place for this, so we
won't duplicate efforts?
>
>> From the guest's point of view, enabling all the queues with
>> VHOST_VDPA_SET_VRING_ENABLE and not sending DRIVER_OK is the same
>> as sending DRIVER_OK and not enabling any data queue with
>> VHOST_VDPA_SET_VRING_ENABLE.
> Doing SET_VRING_ENABLE after DRIVER_OK breaks two basic things.
It looks to me that in the spec:
1) For PCI, it doesn't forbid the driver from setting queue_enable to 1 after
DRIVER_OK.
2) For MMIO, it even allows the driver to disable a queue after DRIVER_OK.
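I.e. something like the following looks legal on the PCI transport as far as I can tell (sketch only; cfg is assumed to point at the mapped common configuration structure, and endianness handling is omitted):

#include <stdint.h>
#include <linux/virtio_pci.h>
#include <linux/virtio_config.h>

static void enable_vq_after_driver_ok(volatile struct virtio_pci_common_cfg *cfg,
                                      uint16_t vq_index)
{
        /* DRIVER_OK first... */
        cfg->device_status |= VIRTIO_CONFIG_S_DRIVER_OK;

        /* ...and queue_enable afterwards; the spec doesn't forbid this order */
        cfg->queue_select = vq_index;
        cfg->queue_enable = 1;
}
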
> 1. The supplied RSS config and VQ config are not honored for several tens to
> hundreds of milliseconds.
> It will be purely dependent on how/when these ioctls are made.
> Due to this behavior, a packet supposed to arrive in VQ X arrives in VQ Y.
I don't get why we end up with this situation.
1) enable cvq
2) set driver_ok
3) set RSS
4) enable TX/RX
vs
1) set RSS
2) enable cvq
3) enable TX/RX
4) set driver_ok
Is the latter faster?
>
> 2. Enabling each VQ one at a time requires a constant steering update for
> that VQ, while this information is something already known. Trying to reuse
> the existing callback results in this inefficiency.
> So it is better to start with more reusable APIs that fit the LM flow.
I agree, but the method proposed in the mail seems to be the only way
that can work with all the major vDPA vendors.
E.g. the new API requires that the device has the ability to receive device
state other than via the control virtqueue, which might not be supported by the
hardware. (The device might expect a trap-and-emulate model rather than
save-and-restore.)
From the qemu point of view, it might need to support both models.
If the device can't do save and restore:
1.1) enable cvq
1.2) set driver_ok
1.3) set device state (MQ, RSS) via control vq
1.4) enable TX/RX
If the device can do save and restore:
2.1) set device state (new API for setting MQ,RSS)
2.2) enable cvq
2.3) enable TX/RX
2.4) set driver_ok
We can start from 1 since it works for all devices and then add support for 2?
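FWIW, a minimal sketch of sequence 1 above (assuming the device can't do save/restore; cvq_send_and_wait() is a hypothetical placeholder for qemu's shadow CVQ machinery, not an existing API, and byte-order handling is omitted):

#include <stddef.h>
#include <stdint.h>
#include <linux/vhost.h>
#include <linux/virtio_config.h>
#include <linux/virtio_net.h>
#include <sys/ioctl.h>

/* Hypothetical helper: posts a command on the (shadow) control VQ and
 * waits until the device has used the buffer. */
extern int cvq_send_and_wait(int vdpa_fd, uint8_t class, uint8_t cmd,
                             const void *data, size_t len);

static void restore_via_cvq(int vdpa_fd, unsigned int num_data_vqs,
                            unsigned int cvq_idx, uint16_t cur_pairs)
{
        struct vhost_vring_state cvq_en = { .index = cvq_idx, .num = 1 };
        __u8 status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER |
                      VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_DRIVER_OK;
        struct virtio_net_ctrl_mq mq = { .virtqueue_pairs = cur_pairs };
        unsigned int i;

        /* 1.1 enable only the CVQ */
        ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &cvq_en);
        /* 1.2 set DRIVER_OK */
        ioctl(vdpa_fd, VHOST_VDPA_SET_STATUS, &status);
        /* 1.3 restore device state through the CVQ (MQ shown; RSS etc. is similar) */
        cvq_send_and_wait(vdpa_fd, VIRTIO_NET_CTRL_MQ,
                          VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &mq, sizeof(mq));
        /* 1.4 enable TX/RX */
        for (i = 0; i < num_data_vqs; i++) {
                struct vhost_vring_state en = { .index = i, .num = 1 };

                ioctl(vdpa_fd, VHOST_VDPA_SET_VRING_ENABLE, &en);
        }
}
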
Thanks