thr3ads.net - Linux Virtualization - [RFC PATCH 0/1] virtio: false unhandled irqs from vring

If this information is useful, please help other people find it:
Share via:

Stefan Hajnoczi

2021-Aug-24 10:59 UTC

[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

While investigating an unhandled irq from vring_interrupt() with virtiofs I
stumbled onto a possible race that also affects virtio_gpu. This theory is
based on code inspection and hopefully you can point out something that makes
this a non-issue in practice :).

The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is
taken with no used (completed) buffers in the virtqueue. The kernel disables
the irq and the driver is no longer receives irqs when this happens:

  irq 77: nobody cared (try booting with the "irqpoll" option)
  ...
  handlers:
  [<00000000a40a49bb>] vring_interrupt
  Disabling IRQ #77

Consider the following:

1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work
   function is scheduled to dequeue used buffers:
   vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()

2. The device adds more used requests and just before...

3. ...virtio_fs_requests_done_work() empties the virtqueue with
   virtqueue_get_buf().

4. The device raises the irq and vring_interrupt() is called after
   virtio_fs_requests_done_work emptied the virtqueue:

   irqreturn_t vring_interrupt(int irq, void *_vq)
   {
       struct vring_virtqueue *vq = to_vvq(_vq);

       if (!more_used(vq)) {
           pr_debug("virtqueue interrupt with no work for %p\n", vq);
           return IRQ_NONE;
           ^^^^^^^^^^^^^^^^

I have included a patch that switches virtiofs from spin_lock() to
spin_lock_irqsave() to prevent vring_interrupt() from running while the
virtqueue is processed from a work function.

virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
I think this can result in the same false unhandled irq problem as virtiofs.

This race condition could in theory affect all drivers. The VIRTIO
specification says:

  Neither of these notification suppression methods are reliable, as they are
  not synchronized with the device, but they serve as useful optimizations.

If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs
then virtio_net and other drivers have a problem because because an irq could
be raised while the driver is dequeuing used buffers. I think we haven't
seen
this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware
devices might cache the value and not disable notifications for some time...

Have I missed something?

The virtiofs patch I attached is being stress tested to see if the unhandled
irqs still occur.

Stefan Hajnoczi (1):
  fuse: disable local irqs when processing vq completions

 fs/fuse/virtio_fs.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

-- 
2.31.1

Stefan Hajnoczi

2021-Aug-24 10:59 UTC

head link

[RFC PATCH 1/1] fuse: disable local irqs when processing vq completions

The virtqueue completion handler function runs on a work queue and local
irqs are still enabled. There is a race where the completion handler
function grabs the next completed request just before vring_interrupt()
runs. vring_interrupt() sees an empty virtqueue and returns IRQ_NONE,
falsely declaring this interrupt unhandled.

The unhandled irq causes the kernel to disable the irq:

  irq 77: nobody cared (try booting with the "irqpoll" option)
  ...
  handlers:
  [<00000000d33eeed7>] vring_interrupt
  Disabling IRQ #77

The driver hangs afterwards since virtqueue irqs are now ignored.

Disable local irqs before calling virtqueue_get_buf() and re-enable them
afterwards so that vring_interrupt() doesn't run during the race window.

Reported-by: Xiaoling Gao <xiagao at redhat.com>
Cc: Michael Tsirkin <mst at redhat.com>
Cc: Jason Wang <jasowang at redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha at redhat.com>
---
I'm not 100% convinced this fixes everything because vring_interrupt()
can still run after our critical section and find the virtqueue empty.
virtqueue_disable_cb() should minimize that but it's only a hint and
there is a small window when the race condition can happen before it's
called.
---
 fs/fuse/virtio_fs.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 8f52cdaa8445..57e1f264b0a8 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -319,9 +319,10 @@ static void virtio_fs_hiprio_done_work(struct work_struct
*work)
 	struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
 						 done_work);
 	struct virtqueue *vq = fsvq->vq;
+	unsigned long flags;
 
 	/* Free completed FUSE_FORGET requests */
-	spin_lock(&fsvq->lock);
+	spin_lock_irqsave(&fsvq->lock, flags);
 	do {
 		unsigned int len;
 		void *req;
@@ -333,7 +334,7 @@ static void virtio_fs_hiprio_done_work(struct work_struct
*work)
 			dec_in_flight_req(fsvq);
 		}
 	} while (!virtqueue_enable_cb(vq) &&
likely(!virtqueue_is_broken(vq)));
-	spin_unlock(&fsvq->lock);
+	spin_unlock_irqrestore(&fsvq->lock, flags);
 }
 
 static void virtio_fs_request_dispatch_work(struct work_struct *work)
@@ -601,11 +602,15 @@ static void virtio_fs_requests_done_work(struct
work_struct *work)
 	struct virtqueue *vq = fsvq->vq;
 	struct fuse_req *req;
 	struct fuse_req *next;
+	unsigned long flags;
 	unsigned int len;
 	LIST_HEAD(reqs);
 
-	/* Collect completed requests off the virtqueue */
-	spin_lock(&fsvq->lock);
+	/*
+	 * Collect completed requests off the virtqueue with irqs disabled to
+	 * prevent races with vring_interrupt().
+	 */
+	spin_lock_irqsave(&fsvq->lock, flags);
 	do {
 		virtqueue_disable_cb(vq);
 
@@ -615,7 +620,7 @@ static void virtio_fs_requests_done_work(struct work_struct
*work)
 			spin_unlock(&fpq->lock);
 		}
 	} while (!virtqueue_enable_cb(vq) &&
likely(!virtqueue_is_broken(vq)));
-	spin_unlock(&fsvq->lock);
+	spin_unlock_irqrestore(&fsvq->lock, flags);
 
 	/* End requests */
 	list_for_each_entry_safe(req, next, &reqs, list) {
-- 
2.31.1

Michael S. Tsirkin

2021-Aug-24 11:31 UTC

head link

[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi
wrote:> While investigating an unhandled irq from vring_interrupt() with virtiofs I
> stumbled onto a possible race that also affects virtio_gpu. This theory is
> based on code inspection and hopefully you can point out something that
makes
> this a non-issue in practice :).
> 
> The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is
> taken with no used (completed) buffers in the virtqueue. The kernel
disables
> the irq and the driver is no longer receives irqs when this happens:
> 
>   irq 77: nobody cared (try booting with the "irqpoll" option)
>   ...
>   handlers:
>   [<00000000a40a49bb>] vring_interrupt
>   Disabling IRQ #77
> 
> Consider the following:
> 
> 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work
>    function is scheduled to dequeue used buffers:
>    vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()
> 
> 2. The device adds more used requests and just before...
> 
> 3. ...virtio_fs_requests_done_work() empties the virtqueue with
>    virtqueue_get_buf().
> 
> 4. The device raises the irq and vring_interrupt() is called after
>    virtio_fs_requests_done_work emptied the virtqueue:
> 
>    irqreturn_t vring_interrupt(int irq, void *_vq)
>    {
>        struct vring_virtqueue *vq = to_vvq(_vq);
> 
>        if (!more_used(vq)) {
>            pr_debug("virtqueue interrupt with no work for %p\n",
vq);
>            return IRQ_NONE;
>            ^^^^^^^^^^^^^^^^
> 
> I have included a patch that switches virtiofs from spin_lock() to
> spin_lock_irqsave() to prevent vring_interrupt() from running while the
> virtqueue is processed from a work function.
> 
> virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
> virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
> I think this can result in the same false unhandled irq problem as
virtiofs.
> 
> This race condition could in theory affect all drivers. The VIRTIO
> specification says:
> 
>   Neither of these notification suppression methods are reliable, as they
are
>   not synchronized with the device, but they serve as useful optimizations.
> 
> If virtqueue_disable_cb() is just a hint and might not disable virtqueue
irqs
> then virtio_net and other drivers have a problem because because an irq
could
> be raised while the driver is dequeuing used buffers. I think we
haven't seen
> this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware
> devices might cache the value and not disable notifications for some
time...
> 
> Have I missed something?
> 
> The virtiofs patch I attached is being stress tested to see if the
unhandled
> irqs still occur.
> 
> Stefan Hajnoczi (1):
>   fuse: disable local irqs when processing vq completions
> 
>  fs/fuse/virtio_fs.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
Fundamentally it is not a problem to have an unhandled IRQ
once in a while. It's only a problem if this happens time
after time.


Does the kernel you are testing include
commit 8d622d21d24803408b256d96463eac4574dcf067
Author: Michael S. Tsirkin <mst at redhat.com>
Date:   Tue Apr 13 01:19:16 2021 -0400

    virtio: fix up virtio_disable_cb

?

If not it's worth checking whether the latest kernel still
has the issue.

Note cherry picking that commit isn't trivial there are
a bunch of dependencies.

If you want to try on an old kernel, disable event index.
> -- 
> 2.31.1
>

Stefan Hajnoczi

2021-Sep-06 15:18 UTC

head link

[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

On Tue, Aug 24, 2021 at 07:31:29AM -0400, Michael S. Tsirkin
wrote:> On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote:
> > While investigating an unhandled irq from vring_interrupt() with
virtiofs I
> > stumbled onto a possible race that also affects virtio_gpu. This
theory is
> > based on code inspection and hopefully you can point out something
that makes
> > this a non-issue in practice :).
> > 
> > The vring_interrupt() function returns IRQ_NONE when an MSI-X
interrupt is
> > taken with no used (completed) buffers in the virtqueue. The kernel
disables
> > the irq and the driver is no longer receives irqs when this happens:
> > 
> >   irq 77: nobody cared (try booting with the "irqpoll"
option)
> >   ...
> >   handlers:
> >   [<00000000a40a49bb>] vring_interrupt
> >   Disabling IRQ #77
> > 
> > Consider the following:
> > 
> > 1. An virtiofs irq is handled and the virtio_fs_requests_done_work()
work
> >    function is scheduled to dequeue used buffers:
> >    vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()
> > 
> > 2. The device adds more used requests and just before...
> > 
> > 3. ...virtio_fs_requests_done_work() empties the virtqueue with
> >    virtqueue_get_buf().
> > 
> > 4. The device raises the irq and vring_interrupt() is called after
> >    virtio_fs_requests_done_work emptied the virtqueue:
> > 
> >    irqreturn_t vring_interrupt(int irq, void *_vq)
> >    {
> >        struct vring_virtqueue *vq = to_vvq(_vq);
> > 
> >        if (!more_used(vq)) {
> >            pr_debug("virtqueue interrupt with no work for
%p\n", vq);
> >            return IRQ_NONE;
> >            ^^^^^^^^^^^^^^^^
> > 
> > I have included a patch that switches virtiofs from spin_lock() to
> > spin_lock_irqsave() to prevent vring_interrupt() from running while
the
> > virtqueue is processed from a work function.
> > 
> > virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
> > virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
> > I think this can result in the same false unhandled irq problem as
virtiofs.
> > 
> > This race condition could in theory affect all drivers. The VIRTIO
> > specification says:
> > 
> >   Neither of these notification suppression methods are reliable, as
they are
> >   not synchronized with the device, but they serve as useful
optimizations.
> > 
> > If virtqueue_disable_cb() is just a hint and might not disable
virtqueue irqs
> > then virtio_net and other drivers have a problem because because an
irq could
> > be raised while the driver is dequeuing used buffers. I think we
haven't seen
> > this because software VIRTIO devices honor virtqueue_disable_cb().
Hardware
> > devices might cache the value and not disable notifications for some
time...
> > 
> > Have I missed something?
> > 
> > The virtiofs patch I attached is being stress tested to see if the
unhandled
> > irqs still occur.
> > 
> > Stefan Hajnoczi (1):
> >   fuse: disable local irqs when processing vq completions
> > 
> >  fs/fuse/virtio_fs.c | 15 ++++++++++-----
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> Fundamentally it is not a problem to have an unhandled IRQ
> once in a while. It's only a problem if this happens time
> after time.
> 
> 
> Does the kernel you are testing include
> commit 8d622d21d24803408b256d96463eac4574dcf067
> Author: Michael S. Tsirkin <mst at redhat.com>
> Date:   Tue Apr 13 01:19:16 2021 -0400
> 
>     virtio: fix up virtio_disable_cb
> 
> ?
> 
> If not it's worth checking whether the latest kernel still
> has the issue.
A new kernel with your patch doesn't have this issue.

Please disregard the patch I posted, your patch seems to be enough.

Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL:
<http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20210906/036596ce/attachment.sig>

Linux Virtualization - Sep 2021 - [RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

[RFC PATCH 1/1] fuse: disable local irqs when processing vq completions

[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()

[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()