Willem de Bruijn
2021-Apr-13 21:44 UTC
[PATCH RFC v2 1/4] virtio: fix up virtio_disable_cb
On Tue, Apr 13, 2021 at 3:54 PM Michael S. Tsirkin <mst at redhat.com> wrote:
>
> On Tue, Apr 13, 2021 at 10:01:11AM -0400, Willem de Bruijn wrote:
> > On Tue, Apr 13, 2021 at 1:47 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> > >
> > > virtio_disable_cb is currently a nop for split ring with event index.
> > > This is because it used to be always called from a callback when we know
> > > device won't trigger more events until we update the index. However,
> > > now that we run with interrupts enabled a lot we also poll without a
> > > callback so that is different: disabling callbacks will help reduce the
> > > number of spurious interrupts.
> >
> > The device may poll for transmit completions as a result of an interrupt
> > from virtnet_poll_tx.
> >
> > As well as asynchronously to this transmit interrupt, from start_xmit or
> > from virtnet_poll_cleantx as a result of a receive interrupt.
> >
> > As of napi-tx, transmit interrupts are left enabled to operate in standard
> > napi mode. While previously they would be left disabled for most of the
> > time, enabling only when the queue was low on descriptors.
> >
> > (in practice, for the at the time common case of split ring with event index,
> > little changed, as that mode does not actually enable/disable the interrupt,
> > but looks at the consumer index in the ring to decide whether to interrupt)
> >
> > Combined, this may cause the following:
> >
> > 1. device sends a packet and fires transmit interrupt
> > 2. driver cleans interrupts using virtnet_poll_cleantx
> > 3. driver handles transmit interrupt using vring_interrupt,
> >    detects that the vring is empty: !more_used(vq),
> >    and records a spurious interrupt.
> >
> > I don't quite follow how suppressing interrupt suppression, i.e.,
> > skipping disable_cb, helps avoid this.
> > I'm probably missing something. Is this solving a subtly different
> > problem from the one as I understand it?
>
> I was thinking of this one:
>
> 1. device is sending packets
> 2. driver cleans them at the same time using virtnet_poll_cleantx
> 3. device fires transmit interrupts
> 4. driver handles transmit interrupts using vring_interrupt,
>    detects that the vring is empty: !more_used(vq),
>    and records spurious interrupts.

I think that's the same scenario

>
>
> but even yours is also fixed I think.
>
> The common point is that a single spurious interrupt is not a problem.
> The problem only exists if there are tons of spurious interrupts with no
> real ones. For this to trigger, we keep polling the ring and while we do
> device keeps firing interrupts. So just disable interrupts while we
> poll.

But the main change in this patch is to turn some virtqueue_disable_cb
calls into no-ops. I don't understand how that helps reduce spurious
interrupts, as if anything, it keeps interrupts enabled for longer.

Another patch in the series disables callbacks* before starting to
clean the descriptors from the rx interrupt. That I do understand will
suppress additional tx interrupts that might see no work to be done. I
just don't entirely follow this patch on its own.

*(I use interrupt and callback as a synonym in this context, correct
me if I'm glossing over something essential)

Michael S. Tsirkin
2021-Apr-13 22:11 UTC
[PATCH RFC v2 1/4] virtio: fix up virtio_disable_cb
On Tue, Apr 13, 2021 at 05:44:42PM -0400, Willem de Bruijn wrote:
> On Tue, Apr 13, 2021 at 3:54 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> >
> > On Tue, Apr 13, 2021 at 10:01:11AM -0400, Willem de Bruijn wrote:
> > > On Tue, Apr 13, 2021 at 1:47 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > >
> > > > virtio_disable_cb is currently a nop for split ring with event index.
> > > > This is because it used to be always called from a callback when we know
> > > > device won't trigger more events until we update the index. However,
> > > > now that we run with interrupts enabled a lot we also poll without a
> > > > callback so that is different: disabling callbacks will help reduce the
> > > > number of spurious interrupts.
> > >
> > > The device may poll for transmit completions as a result of an interrupt
> > > from virtnet_poll_tx.
> > >
> > > As well as asynchronously to this transmit interrupt, from start_xmit or
> > > from virtnet_poll_cleantx as a result of a receive interrupt.
> > >
> > > As of napi-tx, transmit interrupts are left enabled to operate in standard
> > > napi mode. While previously they would be left disabled for most of the
> > > time, enabling only when the queue was low on descriptors.
> > >
> > > (in practice, for the at the time common case of split ring with event index,
> > > little changed, as that mode does not actually enable/disable the interrupt,
> > > but looks at the consumer index in the ring to decide whether to interrupt)
> > >
> > > Combined, this may cause the following:
> > >
> > > 1. device sends a packet and fires transmit interrupt
> > > 2. driver cleans interrupts using virtnet_poll_cleantx
> > > 3. driver handles transmit interrupt using vring_interrupt,
> > >    detects that the vring is empty: !more_used(vq),
> > >    and records a spurious interrupt.
> > >
> > > I don't quite follow how suppressing interrupt suppression, i.e.,
> > > skipping disable_cb, helps avoid this.
> > > I'm probably missing something. Is this solving a subtly different
> > > problem from the one as I understand it?
> >
> > I was thinking of this one:
> >
> > 1. device is sending packets
> > 2. driver cleans them at the same time using virtnet_poll_cleantx
> > 3. device fires transmit interrupts
> > 4. driver handles transmit interrupts using vring_interrupt,
> >    detects that the vring is empty: !more_used(vq),
> >    and records spurious interrupts.
>
> I think that's the same scenario

Not a big difference I agree.

> >
> >
> > but even yours is also fixed I think.
> >
> > The common point is that a single spurious interrupt is not a problem.
> > The problem only exists if there are tons of spurious interrupts with no
> > real ones. For this to trigger, we keep polling the ring and while we do
> > device keeps firing interrupts. So just disable interrupts while we
> > poll.
>
> But the main change in this patch is to turn some virtqueue_disable_cb
> calls into no-ops.

Well this was not the design. This is the main change:

@@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
 	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
 		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
+		if (vq->event)
+			/* TODO: this is a hack. Figure out a cleaner value to write. */
+			vring_used_event(&vq->split.vring) = 0x0;
+		else
 			vq->split.vring.avail->flags =
 				cpu_to_virtio16(_vq->vdev, vq->split.avail_flags_shadow);

IIUC previously when event index was enabled (vq->event)
virtqueue_disable_cb_split was a nop. Now it sets index to 0x0 (which
is a hack, but good enough for testing I think).

> I don't understand how that helps reduce spurious
> interrupts, as if anything, it keeps interrupts enabled for longer.
>
> Another patch in the series disables callbacks* before starting to
> clean the descriptors from the rx interrupt. That I do understand will
> suppress additional tx interrupts that might see no work to be done. I
> just don't entirely follow this patch on its own.
>
> *(I use interrupt and callback as a synonym in this context, correct
> me if I'm glossing over something essential)

It's the same for the pci transport.

-- 
MST
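
Illustration (not part of the original messages): a minimal sketch of why writing 0x0 to the used-event field suppresses interrupts on a split ring with event index. The need_event check follows the vring_need_event helper from the virtio ring header. Everything else (struct toy_vq and the toy_* functions) is hypothetical, made up for this example, and is not the kernel's data structure or API. It assumes the driver re-arms by publishing the index it has consumed so far.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Event-index rule: the device should interrupt only if event_idx lies in
 * [old_idx, new_idx), with all arithmetic modulo 2^16. */
static inline bool need_event(uint16_t event_idx, uint16_t new_idx,
			      uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Hypothetical toy model of the two fields that matter here. */
struct toy_vq {
	uint16_t used_idx;   /* device: next used slot it will fill */
	uint16_t used_event; /* driver: interrupt once used_idx passes this */
};

/* Device side: complete n buffers; returns true if it would interrupt. */
static bool toy_device_complete(struct toy_vq *vq, uint16_t n)
{
	uint16_t old_idx = vq->used_idx;

	vq->used_idx = (uint16_t)(old_idx + n);
	return need_event(vq->used_event, vq->used_idx, old_idx);
}

/* Driver side: re-arm by publishing the index consumed so far. */
static void toy_enable_cb(struct toy_vq *vq, uint16_t last_used_idx)
{
	vq->used_event = last_used_idx;
}

/* Driver side: the placeholder value from the patch hunk. */
static void toy_disable_cb(struct toy_vq *vq)
{
	vq->used_event = 0x0;
}

/* Usage example: armed, disabled, and the wrap-around corner case. */
static void toy_demo(void)
{
	struct toy_vq vq = { .used_idx = 100, .used_event = 0 };

	toy_enable_cb(&vq, 100);
	assert(toy_device_complete(&vq, 1));  /* 100 -> 101 passes 100 */

	toy_disable_cb(&vq);
	assert(!toy_device_complete(&vq, 5)); /* 101 -> 106 never passes 0 */

	vq.used_idx = 65535;
	assert(toy_device_complete(&vq, 2));  /* 65535 -> 1 wraps past 0 */
}
```

In this model the 0x0 write keeps the device quiet until the used index wraps past zero, which may be why the patch comment calls the value a hack and asks for a cleaner one.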