Stefan Hajnoczi
2021-Aug-24 10:59 UTC
[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()
While investigating an unhandled irq from vring_interrupt() with virtiofs I
stumbled onto a possible race that also affects virtio_gpu. This theory is
based on code inspection and hopefully you can point out something that makes
this a non-issue in practice :).
The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is
taken with no used (completed) buffers in the virtqueue. The kernel disables
the irq and the driver is no longer receives irqs when this happens:
irq 77: nobody cared (try booting with the "irqpoll" option)
...
handlers:
[<00000000a40a49bb>] vring_interrupt
Disabling IRQ #77
Consider the following:
1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work
function is scheduled to dequeue used buffers:
vring_interrupt() -> virtio_fs_vq_done() -> schedule_work()
2. The device adds more used requests and just before...
3. ...virtio_fs_requests_done_work() empties the virtqueue with
virtqueue_get_buf().
4. The device raises the irq and vring_interrupt() is called after
virtio_fs_requests_done_work emptied the virtqueue:
irqreturn_t vring_interrupt(int irq, void *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
if (!more_used(vq)) {
pr_debug("virtqueue interrupt with no work for %p\n", vq);
return IRQ_NONE;
^^^^^^^^^^^^^^^^
I have included a patch that switches virtiofs from spin_lock() to
spin_lock_irqsave() to prevent vring_interrupt() from running while the
virtqueue is processed from a work function.
virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and
virtio_gpu_dequeue_cursor_func() work functions only use spin_lock().
I think this can result in the same false unhandled irq problem as virtiofs.
This race condition could in theory affect all drivers. The VIRTIO
specification says:
Neither of these notification suppression methods are reliable, as they are
not synchronized with the device, but they serve as useful optimizations.
If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs
then virtio_net and other drivers have a problem because because an irq could
be raised while the driver is dequeuing used buffers. I think we haven't
seen
this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware
devices might cache the value and not disable notifications for some time...
Have I missed something?
The virtiofs patch I attached is being stress tested to see if the unhandled
irqs still occur.
Stefan Hajnoczi (1):
fuse: disable local irqs when processing vq completions
fs/fuse/virtio_fs.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
--
2.31.1
Stefan Hajnoczi
2021-Aug-24 10:59 UTC
[RFC PATCH 1/1] fuse: disable local irqs when processing vq completions
The virtqueue completion handler function runs on a work queue and local
irqs are still enabled. There is a race where the completion handler
function grabs the next completed request just before vring_interrupt()
runs. vring_interrupt() sees an empty virtqueue and returns IRQ_NONE,
falsely declaring this interrupt unhandled.
The unhandled irq causes the kernel to disable the irq:
irq 77: nobody cared (try booting with the "irqpoll" option)
...
handlers:
[<00000000d33eeed7>] vring_interrupt
Disabling IRQ #77
The driver hangs afterwards since virtqueue irqs are now ignored.
Disable local irqs before calling virtqueue_get_buf() and re-enable them
afterwards so that vring_interrupt() doesn't run during the race window.
Reported-by: Xiaoling Gao <xiagao at redhat.com>
Cc: Michael Tsirkin <mst at redhat.com>
Cc: Jason Wang <jasowang at redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha at redhat.com>
---
I'm not 100% convinced this fixes everything because vring_interrupt()
can still run after our critical section and find the virtqueue empty.
virtqueue_disable_cb() should minimize that but it's only a hint and
there is a small window when the race condition can happen before it's
called.
---
fs/fuse/virtio_fs.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 8f52cdaa8445..57e1f264b0a8 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -319,9 +319,10 @@ static void virtio_fs_hiprio_done_work(struct work_struct
*work)
struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq,
done_work);
struct virtqueue *vq = fsvq->vq;
+ unsigned long flags;
/* Free completed FUSE_FORGET requests */
- spin_lock(&fsvq->lock);
+ spin_lock_irqsave(&fsvq->lock, flags);
do {
unsigned int len;
void *req;
@@ -333,7 +334,7 @@ static void virtio_fs_hiprio_done_work(struct work_struct
*work)
dec_in_flight_req(fsvq);
}
} while (!virtqueue_enable_cb(vq) &&
likely(!virtqueue_is_broken(vq)));
- spin_unlock(&fsvq->lock);
+ spin_unlock_irqrestore(&fsvq->lock, flags);
}
static void virtio_fs_request_dispatch_work(struct work_struct *work)
@@ -601,11 +602,15 @@ static void virtio_fs_requests_done_work(struct
work_struct *work)
struct virtqueue *vq = fsvq->vq;
struct fuse_req *req;
struct fuse_req *next;
+ unsigned long flags;
unsigned int len;
LIST_HEAD(reqs);
- /* Collect completed requests off the virtqueue */
- spin_lock(&fsvq->lock);
+ /*
+ * Collect completed requests off the virtqueue with irqs disabled to
+ * prevent races with vring_interrupt().
+ */
+ spin_lock_irqsave(&fsvq->lock, flags);
do {
virtqueue_disable_cb(vq);
@@ -615,7 +620,7 @@ static void virtio_fs_requests_done_work(struct work_struct
*work)
spin_unlock(&fpq->lock);
}
} while (!virtqueue_enable_cb(vq) &&
likely(!virtqueue_is_broken(vq)));
- spin_unlock(&fsvq->lock);
+ spin_unlock_irqrestore(&fsvq->lock, flags);
/* End requests */
list_for_each_entry_safe(req, next, &reqs, list) {
--
2.31.1
Michael S. Tsirkin
2021-Aug-24 11:31 UTC
[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()
On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote:> While investigating an unhandled irq from vring_interrupt() with virtiofs I > stumbled onto a possible race that also affects virtio_gpu. This theory is > based on code inspection and hopefully you can point out something that makes > this a non-issue in practice :). > > The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is > taken with no used (completed) buffers in the virtqueue. The kernel disables > the irq and the driver is no longer receives irqs when this happens: > > irq 77: nobody cared (try booting with the "irqpoll" option) > ... > handlers: > [<00000000a40a49bb>] vring_interrupt > Disabling IRQ #77 > > Consider the following: > > 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work > function is scheduled to dequeue used buffers: > vring_interrupt() -> virtio_fs_vq_done() -> schedule_work() > > 2. The device adds more used requests and just before... > > 3. ...virtio_fs_requests_done_work() empties the virtqueue with > virtqueue_get_buf(). > > 4. The device raises the irq and vring_interrupt() is called after > virtio_fs_requests_done_work emptied the virtqueue: > > irqreturn_t vring_interrupt(int irq, void *_vq) > { > struct vring_virtqueue *vq = to_vvq(_vq); > > if (!more_used(vq)) { > pr_debug("virtqueue interrupt with no work for %p\n", vq); > return IRQ_NONE; > ^^^^^^^^^^^^^^^^ > > I have included a patch that switches virtiofs from spin_lock() to > spin_lock_irqsave() to prevent vring_interrupt() from running while the > virtqueue is processed from a work function. > > virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and > virtio_gpu_dequeue_cursor_func() work functions only use spin_lock(). > I think this can result in the same false unhandled irq problem as virtiofs. > > This race condition could in theory affect all drivers. The VIRTIO > specification says: > > Neither of these notification suppression methods are reliable, as they are > not synchronized with the device, but they serve as useful optimizations. > > If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs > then virtio_net and other drivers have a problem because because an irq could > be raised while the driver is dequeuing used buffers. I think we haven't seen > this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware > devices might cache the value and not disable notifications for some time... > > Have I missed something? > > The virtiofs patch I attached is being stress tested to see if the unhandled > irqs still occur. > > Stefan Hajnoczi (1): > fuse: disable local irqs when processing vq completions > > fs/fuse/virtio_fs.c | 15 ++++++++++----- > 1 file changed, 10 insertions(+), 5 deletions(-)Fundamentally it is not a problem to have an unhandled IRQ once in a while. It's only a problem if this happens time after time. Does the kernel you are testing include commit 8d622d21d24803408b256d96463eac4574dcf067 Author: Michael S. Tsirkin <mst at redhat.com> Date: Tue Apr 13 01:19:16 2021 -0400 virtio: fix up virtio_disable_cb ? If not it's worth checking whether the latest kernel still has the issue. Note cherry picking that commit isn't trivial there are a bunch of dependencies. If you want to try on an old kernel, disable event index.> -- > 2.31.1 >
Stefan Hajnoczi
2021-Sep-06 15:18 UTC
[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()
On Tue, Aug 24, 2021 at 07:31:29AM -0400, Michael S. Tsirkin wrote:> On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote: > > While investigating an unhandled irq from vring_interrupt() with virtiofs I > > stumbled onto a possible race that also affects virtio_gpu. This theory is > > based on code inspection and hopefully you can point out something that makes > > this a non-issue in practice :). > > > > The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is > > taken with no used (completed) buffers in the virtqueue. The kernel disables > > the irq and the driver is no longer receives irqs when this happens: > > > > irq 77: nobody cared (try booting with the "irqpoll" option) > > ... > > handlers: > > [<00000000a40a49bb>] vring_interrupt > > Disabling IRQ #77 > > > > Consider the following: > > > > 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work > > function is scheduled to dequeue used buffers: > > vring_interrupt() -> virtio_fs_vq_done() -> schedule_work() > > > > 2. The device adds more used requests and just before... > > > > 3. ...virtio_fs_requests_done_work() empties the virtqueue with > > virtqueue_get_buf(). > > > > 4. The device raises the irq and vring_interrupt() is called after > > virtio_fs_requests_done_work emptied the virtqueue: > > > > irqreturn_t vring_interrupt(int irq, void *_vq) > > { > > struct vring_virtqueue *vq = to_vvq(_vq); > > > > if (!more_used(vq)) { > > pr_debug("virtqueue interrupt with no work for %p\n", vq); > > return IRQ_NONE; > > ^^^^^^^^^^^^^^^^ > > > > I have included a patch that switches virtiofs from spin_lock() to > > spin_lock_irqsave() to prevent vring_interrupt() from running while the > > virtqueue is processed from a work function. > > > > virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and > > virtio_gpu_dequeue_cursor_func() work functions only use spin_lock(). > > I think this can result in the same false unhandled irq problem as virtiofs. > > > > This race condition could in theory affect all drivers. The VIRTIO > > specification says: > > > > Neither of these notification suppression methods are reliable, as they are > > not synchronized with the device, but they serve as useful optimizations. > > > > If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs > > then virtio_net and other drivers have a problem because because an irq could > > be raised while the driver is dequeuing used buffers. I think we haven't seen > > this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware > > devices might cache the value and not disable notifications for some time... > > > > Have I missed something? > > > > The virtiofs patch I attached is being stress tested to see if the unhandled > > irqs still occur. > > > > Stefan Hajnoczi (1): > > fuse: disable local irqs when processing vq completions > > > > fs/fuse/virtio_fs.c | 15 ++++++++++----- > > 1 file changed, 10 insertions(+), 5 deletions(-) > > Fundamentally it is not a problem to have an unhandled IRQ > once in a while. It's only a problem if this happens time > after time. > > > Does the kernel you are testing include > commit 8d622d21d24803408b256d96463eac4574dcf067 > Author: Michael S. Tsirkin <mst at redhat.com> > Date: Tue Apr 13 01:19:16 2021 -0400 > > virtio: fix up virtio_disable_cb > > ? > > If not it's worth checking whether the latest kernel still > has the issue.A new kernel with your patch doesn't have this issue. Please disregard the patch I posted, your patch seems to be enough. Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20210906/036596ce/attachment.sig>