On 2019/10/24 11:26, Liu, Yong wrote:
>
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang at redhat.com]
>> Sent: Tuesday, October 22, 2019 9:06 PM
>> To: Liu, Yong <yong.liu at intel.com>; mst at redhat.com; Bie, Tiwei
>> <tiwei.bie at intel.com>
>> Cc: virtualization at lists.linux-foundation.org
>> Subject: Re: [PATCH] virtio_ring: fix packed ring event may missing
>>
>> On 2019/10/22 2:48, Liu, Yong wrote:
>>> Hi Jason,
>>> My answers are inline.
>>>
>>>> -----Original Message-----
>>>> From: Jason Wang [mailto:jasowang at redhat.com]
>>>> Sent: Tuesday, October 22, 2019 10:45 AM
>>>> To: Liu, Yong <yong.liu at intel.com>; mst at redhat.com; Bie, Tiwei
>>>> <tiwei.bie at intel.com>
>>>> Cc: virtualization at lists.linux-foundation.org
>>>> Subject: Re: [PATCH] virtio_ring: fix packed ring event may missing
>>>>
>>>> On 2019/10/22 1:10, Marvin Liu wrote:
>>>>> When the callback is delayed, virtio expects that vhost will kick
>>>>> when rolling over the event offset. A recheck is needed because the
>>>>> used index may pass the event offset between the status check and
>>>>> the driver's event update.
>>>>>
>>>>> However, the flags may not have been modified if descriptors are
>>>>> chained or the in_order feature was negotiated, so the flags at the
>>>>> event offset may not be valid for checking the descriptor's status.
>>>>> Fix it by using the last used index instead. The Tx queue will be
>>>>> stopped if there are not enough freed buffers after the recheck.
>>>>>
>>>>> Signed-off-by: Marvin Liu <yong.liu at intel.com>
>>>>>
>>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>>>> index bdc08244a648..a8041e451e9e 100644
>>>>> --- a/drivers/virtio/virtio_ring.c
>>>>> +++ b/drivers/virtio/virtio_ring.c
>>>>> @@ -1499,9 +1499,6 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>>>>>  		 * counter first before updating event flags.
>>>>>  		 */
>>>>>  		virtio_wmb(vq->weak_barriers);
>>>>> -	} else {
>>>>> -		used_idx = vq->last_used_idx;
>>>>> -		wrap_counter = vq->packed.used_wrap_counter;
>>>>>  	}
>>>>>
>>>>>  	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE) {
>>>>> @@ -1518,7 +1515,9 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>>>>>  	 */
>>>>>  	virtio_mb(vq->weak_barriers);
>>>>>
>>>>> -	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
>>>>> +	if (is_used_desc_packed(vq,
>>>>> +				vq->last_used_idx,
>>>>> +				vq->packed.used_wrap_counter)) {
>>>>>  		END_USE(vq);
>>>>>  		return false;
>>>>>  	}
>>>> Hi Marvin:
>>>>
>>>> Two questions:
>>>>
>>>> 1) Do we support IN_ORDER in the kernel driver?
>>>>
>>> Not supported for now. But the issue is still possible if indirect
>>> descriptors are disabled and descriptors are chained. Since packed
>>> ring descriptor status has to be checked one by one, picking an
>>> arbitrary position can cause the issue.
>>
>> Right, then it's better to mention IN_ORDER as a future feature.
>>
>>>> 2) Should we check IN_ORDER in this case? Otherwise we may end up
>>>> with an interrupt storm when IN_ORDER is not negotiated.
>>> The interrupt count will not increase here; the event offset value is
>>> calculated as before. This just rechecks whether the newly used
>>> descriptors are enough for the next round of xmit. If the backend is
>>> slow, most likely the Tx queue will sleep for a while until the used
>>> index goes past the event offset.
>>
>> Ok, but what if the backend is almost as fast as the guest driver?
>> E.g. in virtio-net we had:
>>
>> 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
>> 		netif_stop_subqueue(dev, qnum);
>> 		if (!use_napi &&
>> 		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>> 			/* More just got used, free them then recheck. */
>> 			free_old_xmit_skbs(sq, false);
>> 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>> 				netif_start_subqueue(dev, qnum);
>> 				virtqueue_disable_cb(sq->vq);
>> 			}
>> 		}
>> 	}
>>
>> I worry that we may end up toggling the queue state in the case where
>> sq->vq->num_free is near 2 + MAX_SKB_FRAGS.
>>
> Yes, in this worst case each packet adds two extra event flags writes.
> Since the backend only reads this value, the cost won't be too much.

For the driver, it means extra overhead: atomics, less batching, stats
updating, etc. For the backend, the cacheline will bounce between two
CPUs.

> Even if we track down chained descriptors and figure out whether the
> descriptor at the event offset is used, there is still a possibility
> that the flags are invalid. One case is that the backend can buffer
> multiple descriptors by not updating the first one. We cannot guarantee
> that a later descriptor's flags are usable until we check from the
> first one.

In this case, since we've stopped the tx queue, there are no new buffers
being added. It doesn't matter whether we get notified when 3/4 or all
of the descriptors have been used.

Thanks

> Regards,
> Marvin
>
>> It looks to me the correct thing to implement is to calculate the head
>> descriptor of a chain that sits at 3/4.
>>
>> Thanks
>>
>>> Thanks,
>>> Marvin
>>>
>>>> Thanks
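For context, the recheck that the patch changes is built on the packed-ring
status test: a descriptor counts as used when its AVAIL and USED flag bits
are equal and match the driver's used wrap counter. With chained
descriptors the device may not write flags for every descriptor in a
buffer, which is why testing an arbitrary ring position (the old
event-offset check) can read stale flags. A sketch of the helper, modeled
on drivers/virtio/virtio_ring.c around the time of this thread (not a
verbatim copy):

/*
 * Sketch of the packed-ring status test, modeled on
 * drivers/virtio/virtio_ring.c circa v5.4. A descriptor is used when
 * its AVAIL and USED bits agree and match the driver's wrap counter.
 */
static inline bool is_used_desc_packed(const struct vring_virtqueue *vq,
				       u16 idx, bool used_wrap_counter)
{
	bool avail, used;
	u16 flags;

	flags = le16_to_cpu(vq->packed.vring.desc[idx].flags);
	avail = !!(flags & (1 << VRING_PACKED_DESC_F_AVAIL));
	used = !!(flags & (1 << VRING_PACKED_DESC_F_USED));

	return avail == used && used == used_wrap_counter;
}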
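Putting the patch in context: virtqueue_enable_cb_delayed_packed() still
advertises an event offset 3/4 of the way through the in-flight buffers,
but after the patch the final recheck always tests last_used_idx, whose
flags the device is guaranteed to write. A condensed sketch of the
resulting function, modeled on the kernel source (START_USE/END_USE and
some details omitted):

/*
 * Condensed sketch of virtqueue_enable_cb_delayed_packed() after the
 * patch, modeled on drivers/virtio/virtio_ring.c; not a verbatim copy.
 */
static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);
	u16 used_idx, wrap_counter, bufs;

	if (vq->event) {
		/* Ask for an event once ~3/4 of in-flight buffers are used. */
		bufs = (vq->packed.vring.num - vq->vq.num_free) * 3 / 4;
		wrap_counter = vq->packed.used_wrap_counter;

		used_idx = vq->last_used_idx + bufs;
		if (used_idx >= vq->packed.vring.num) {
			used_idx -= vq->packed.vring.num;
			wrap_counter ^= 1;
		}

		vq->packed.vring.driver->off_wrap = cpu_to_le16(used_idx |
			(wrap_counter << VRING_PACKED_EVENT_F_WRAP_CTR));

		/* Publish offset/wrap before flipping the event flags. */
		virtio_wmb(vq->weak_barriers);
	}

	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE) {
		vq->packed.event_flags_shadow = vq->event ?
				VRING_PACKED_EVENT_FLAG_DESC :
				VRING_PACKED_EVENT_FLAG_ENABLE;
		vq->packed.vring.driver->flags =
				cpu_to_le16(vq->packed.event_flags_shadow);
	}

	/* Update the event suppression structure before rechecking. */
	virtio_mb(vq->weak_barriers);

	/* Recheck at last_used_idx: a descriptor the device always writes. */
	if (is_used_desc_packed(vq, vq->last_used_idx,
				vq->packed.used_wrap_counter))
		return false;	/* more used buffers pending; caller repolls */

	return true;
}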
Michael S. Tsirkin
2019-Oct-27 09:54 UTC
[PATCH] virtio_ring: fix packed ring event may missing
On Thu, Oct 24, 2019 at 11:50:51AM +0800, Jason Wang wrote:
> On 2019/10/24 11:26, Liu, Yong wrote:
> [... full quote of the exchange above trimmed ...]
>
> > > I worry that we may end up toggling the queue state in the case
> > > where sq->vq->num_free is near 2 + MAX_SKB_FRAGS.
> > >
> > Yes, in this worst case each packet adds two extra event flags
> > writes. Since the backend only reads this value, the cost won't be
> > too much.
>
> For the driver, it means extra overhead: atomics, less batching, stats
> updating, etc. For the backend, the cacheline will bounce between two
> CPUs.
>
> > Even if we track down chained descriptors and figure out whether the
> > descriptor at the event offset is used, there is still a possibility
> > that the flags are invalid. One case is that the backend can buffer
> > multiple descriptors by not updating the first one. We cannot
> > guarantee that a later descriptor's flags are usable until we check
> > from the first one.
>
> In this case, since we've stopped the tx queue, there are no new
> buffers being added. It doesn't matter whether we get notified when
> 3/4 or all of the descriptors have been used.
>
> Thanks

Well - checking the next descriptor will likely result in moving the
event index forward, which will conceivably reduce the number of
interrupts. So it's hard to predict which is better. I'll apply the
patch for now as it's simple and safe. If someone has the time to work
on tuning all this, that would be great.

> > Regards,
> > Marvin
> >
> > > It looks to me the correct thing to implement is to calculate the
> > > head descriptor of a chain that sits at 3/4.
> > >
> > > Thanks
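Jason's alternative (left unimplemented in this thread) would place the
event offset on the head descriptor of whichever chain crosses the 3/4
mark, so the flags at the event offset are always ones the device writes.
A purely hypothetical sketch, not from the kernel tree; the helper name
and the per-buffer chain-length bookkeeping are invented for illustration
(the kernel driver does keep comparable per-buffer state):

/*
 * Hypothetical sketch only. Assumes the driver records, in completion
 * order, the descriptor count of each outstanding buffer. Returns the
 * ring index of the head of the chain that crosses the threshold, so
 * the event offset always points at a descriptor whose flags the
 * device will write.
 */
static u16 event_off_at_chain_head(u16 last_used_idx, u16 ring_size,
				   const u16 *chain_len, unsigned int nbufs,
				   u16 threshold)
{
	u16 dist = 0;
	unsigned int i;

	/* Accumulate whole chains that fit below the 3/4 threshold. */
	for (i = 0; i < nbufs && dist + chain_len[i] <= threshold; i++)
		dist += chain_len[i];

	/* Head of the chain that crosses (or sits at) the threshold. */
	return (last_used_idx + dist) % ring_size;
}

Whether that extra bookkeeping would beat the simple last_used_idx
recheck is exactly the tuning question Michael leaves open above.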