On Tue, Feb 22, 2022 at 4:56 PM Eugenio Perez Martin <eperezma at redhat.com> wrote:
>
> On Tue, Feb 22, 2022 at 8:26 AM Jason Wang <jasowang at redhat.com> wrote:
> >
> >
> > On 2022/2/21 4:15, Eugenio Perez Martin wrote:
> > > On Mon, Feb 21, 2022 at 8:44 AM Jason Wang <jasowang at redhat.com> wrote:
> > >>
> > >> On 2022/2/17 8:48, Eugenio Perez Martin wrote:
> > >>> On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasowang at redhat.com> wrote:
> > >>>> On 2022/2/1 7:25, Eugenio Perez Martin wrote:
> > >>>>> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang at redhat.com> wrote:
> > >>>>>> On 2022/1/22 4:27, Eugenio Pérez wrote:
> > >>>>>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > >>>>>>>  void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >>>>>>>  {
> > >>>>>>>      event_notifier_set_handler(&svq->svq_kick, NULL);
> > >>>>>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> > >>>>>>> +
> > >>>>>>> +    if (!svq->vq) {
> > >>>>>>> +        return;
> > >>>>>>> +    }
> > >>>>>>> +
> > >>>>>>> +    /* Send all pending used descriptors to guest */
> > >>>>>>> +    vhost_svq_flush(svq, false);
> > >>>>>> Do we need to wait for all the pending descriptors to be
> > >>>>>> completed here?
> > >>>>>>
> > >>>>> No, this function does not wait, it only completes the
> > >>>>> forwarding of the *used* descriptors.
> > >>>>>
> > >>>>> The best example is the net rx queue in my opinion. This call
> > >>>>> will check SVQ's vring used_idx and will forward the last used
> > >>>>> descriptors if any, but all available descriptors will remain
> > >>>>> as available for qemu's VQ code.
> > >>>>>
> > >>>>> To skip it would miss those last rx descriptors in migration.
> > >>>>>
> > >>>>> Thanks!
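
(Side note so we don't talk past each other: my mental model of that
flush is roughly the sketch below. It is simplified from the patch, and
vhost_svq_get_buf() is only a stand-in name for however SVQ pops a used
element from its vring; virtqueue_fill()/virtqueue_flush() are the
existing VirtQueue helpers.)

    static void vhost_svq_flush(VhostShadowVirtqueue *svq,
                                bool check_for_avail_queue)
    {
        /* Sketch: check_for_avail_queue handling is omitted here. */
        VirtQueue *vq = svq->vq;
        VirtQueueElement *elem;
        uint32_t len;
        unsigned i = 0;

        /*
         * Only descriptors the device already marked as used are
         * returned to the guest; everything still available stays
         * available for qemu's VQ code.
         */
        while ((elem = vhost_svq_get_buf(svq, &len))) { /* stand-in */
            virtqueue_fill(vq, elem, len, i++);
            g_free(elem);
        }

        if (i > 0) {
            virtqueue_flush(vq, i);
        }
    }
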
> > >>>> So it's probably not the best place to ask. It's more about the
> > >>>> inflight descriptors, so it should be TX instead of RX.
> > >>>>
> > >>>> I can imagine that in the last phase of migration we should stop
> > >>>> the vhost-vDPA before calling vhost_svq_stop(). Then we should be
> > >>>> fine regardless of inflight descriptors.
> > >>>>
> > >>> I think I'm still missing something here.
> > >>>
> > >>> To be on the same page: regarding tx, this could cause repeated
> > >>> tx frames (one at the source and another at the destination), but
> > >>> never a buffer that is missed and not transmitted. The "stop
> > >>> before" could be interpreted as "SVQ is not forwarding available
> > >>> buffers anymore". Would that work?
> > >>
> > >> Right, but this only works if:
> > >>
> > >> 1) a flush is done to make sure the TX DMAs for inflight
> > >> descriptors are all completed
> > >>
> > >> 2) all inflight descriptors are just marked as used
> > >>
> > > It currently relies on the reverse: buffers not marked as used (by
> > > the device) will be made available again at the destination, so
> > > expect retransmissions.
> >
> >
> > I may be missing something, but I think we do migrate
> > last_avail_idx. So there won't be a re-transmission, since we depend
> > on the qemu virtqueue code to deal with the vring base?
> >
>
> On stop, vhost_virtqueue_stop calls vhost_vdpa_get_vring_base. In SVQ
> mode, it returns last_used_idx. After that, the vhost.c code sets the
> VirtQueue's last_avail_idx == last_used_idx, and it's migrated after
> that, if I'm not wrong.
Ok, I missed these details in the review. I suggest mentioning this in
the changelog and adding a comment in vhost_vdpa_get_vring_base().
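
Something along these lines is what I have in mind (just a sketch to
illustrate the comment, not a request for this exact code: I'm writing
the surrounding function from memory, and vhost_svq_last_used_idx() is
a made-up helper for however SVQ exposes its used index):

    static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                         struct vhost_vring_state *ring)
    {
        struct vhost_vdpa *v = dev->opaque;

        if (v->shadow_vqs_enabled) {
            /*
             * In SVQ mode, report SVQ's last used index as the vring
             * base instead of the device's one: vhost.c stores it as
             * the VirtQueue's last_avail_idx, so every descriptor SVQ
             * has not marked as used yet is exposed as available again
             * after stop (or at the migration destination) and no
             * buffer is lost, at the cost of possible TX
             * retransmissions.
             */
            ring->num = vhost_svq_last_used_idx(dev, ring->index);
            return 0;
        }

        return vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
    }
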
>
> The vhost kernel backend migrates last_avail_idx, but it makes rx
> buffers available on demand, unlike SVQ. So it does not need to unwind
> buffers or anything like that. Because of how SVQ works with the rx
> queue, that is not possible here, since the destination would find no
> available buffers for rx. And for tx you have already described the
> scenario.
>
> In other words, we cannot see SVQ as a vhost device in that regard:
> SVQ aims for a total drain ("make all of the guest's buffers available
> to the device ASAP"), whereas the vhost device can live with a lot of
> available buffers and will use them on demand. It is the same problem
> as with masking. So the difference in behavior is justified in my
> opinion, and it can be improved in the future with vdpa in-flight
> descriptor support.
>
> If we restore the state that way in a virtio-net device, it will see
> the available ones as expected, not as in-flight.
>
> Another possibility is to transform all of these into in-flight ones,
> but I feel it would create problems. Can we migrate all rx queues as
> in-flight, with 0 bytes written? Is it worth it?
To clarify, by inflight I meant from the device's point of view, that
is, the range [last_used_idx, last_avail_idx).
So for RX with SVQ, it should be as simple as stopping the forwarding
of buffers, since last_used_idx should be the same as last_avail_idx in
this case. (Though technically the rx buffer might be modified by the
NIC.)
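
In code terms, what I mean by inflight is just the following count (the
helper name is only illustrative):

    /*
     * Descriptors the driver has made available but the device has not
     * marked as used yet: the half-open range
     * [last_used_idx, last_avail_idx). Unsigned 16-bit arithmetic
     * handles the index wrap-around.
     */
    static inline uint16_t vring_num_inflight(uint16_t last_avail_idx,
                                              uint16_t last_used_idx)
    {
        return last_avail_idx - last_used_idx;
    }
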
> I didn't investigate that path too much, but I think the virtio-net
> emulated device does not support that at the moment. If I'm not wrong,
> we should copy something like the body of virtio_blk_load_device if we
> want to go that route.
>
> The current approach might be too net-centric, so let me know if this
> behavior is unexpected or if we can otherwise do better.
It should be fine to start from a networking device. We can add more
in the future if it is needed.
Thanks
>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > Thanks!
> > >
> > >> Otherwise there could be buffers that are inflight forever.
> > >>
> > >> Thanks
> > >>
> > >>
> > >>> Thanks!
> > >>>
> > >>>> Thanks
> > >>>>
> > >>>>
> > >>>>>> Thanks
> > >>>>>>
> > >>>>>>
> > >>>>>>> +
> > >>>>>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> > >>>>>>> +        g_autofree VirtQueueElement *elem = NULL;
> > >>>>>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > >>>>>>> +        if (elem) {
> > >>>>>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> > >>>>>>> +        }
> > >>>>>>> +    }
> > >>>>>>> +
> > >>>>>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >>>>>>> +    if (next_avail_elem) {
> > >>>>>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> > >>>>>>> +                                 next_avail_elem->len);
> > >>>>>>> +    }
> > >>>>>>>  }
> >
>