Stefan Hajnoczi
2021-Aug-24 10:59 UTC
[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()
While investigating an unhandled irq from vring_interrupt() with virtiofs I stumbled onto a possible race that also affects virtio_gpu. This theory is based on code inspection and hopefully you can point out something that makes this a non-issue in practice :). The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is taken with no used (completed) buffers in the virtqueue. The kernel disables the irq and the driver is no longer receives irqs when this happens: irq 77: nobody cared (try booting with the "irqpoll" option) ... handlers: [<00000000a40a49bb>] vring_interrupt Disabling IRQ #77 Consider the following: 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work function is scheduled to dequeue used buffers: vring_interrupt() -> virtio_fs_vq_done() -> schedule_work() 2. The device adds more used requests and just before... 3. ...virtio_fs_requests_done_work() empties the virtqueue with virtqueue_get_buf(). 4. The device raises the irq and vring_interrupt() is called after virtio_fs_requests_done_work emptied the virtqueue: irqreturn_t vring_interrupt(int irq, void *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); if (!more_used(vq)) { pr_debug("virtqueue interrupt with no work for %p\n", vq); return IRQ_NONE; ^^^^^^^^^^^^^^^^ I have included a patch that switches virtiofs from spin_lock() to spin_lock_irqsave() to prevent vring_interrupt() from running while the virtqueue is processed from a work function. virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and virtio_gpu_dequeue_cursor_func() work functions only use spin_lock(). I think this can result in the same false unhandled irq problem as virtiofs. This race condition could in theory affect all drivers. The VIRTIO specification says: Neither of these notification suppression methods are reliable, as they are not synchronized with the device, but they serve as useful optimizations. If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs then virtio_net and other drivers have a problem because because an irq could be raised while the driver is dequeuing used buffers. I think we haven't seen this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware devices might cache the value and not disable notifications for some time... Have I missed something? The virtiofs patch I attached is being stress tested to see if the unhandled irqs still occur. Stefan Hajnoczi (1): fuse: disable local irqs when processing vq completions fs/fuse/virtio_fs.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) -- 2.31.1
Stefan Hajnoczi
2021-Aug-24 10:59 UTC
[RFC PATCH 1/1] fuse: disable local irqs when processing vq completions
The virtqueue completion handler function runs on a work queue and local irqs are still enabled. There is a race where the completion handler function grabs the next completed request just before vring_interrupt() runs. vring_interrupt() sees an empty virtqueue and returns IRQ_NONE, falsely declaring this interrupt unhandled. The unhandled irq causes the kernel to disable the irq: irq 77: nobody cared (try booting with the "irqpoll" option) ... handlers: [<00000000d33eeed7>] vring_interrupt Disabling IRQ #77 The driver hangs afterwards since virtqueue irqs are now ignored. Disable local irqs before calling virtqueue_get_buf() and re-enable them afterwards so that vring_interrupt() doesn't run during the race window. Reported-by: Xiaoling Gao <xiagao at redhat.com> Cc: Michael Tsirkin <mst at redhat.com> Cc: Jason Wang <jasowang at redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha at redhat.com> --- I'm not 100% convinced this fixes everything because vring_interrupt() can still run after our critical section and find the virtqueue empty. virtqueue_disable_cb() should minimize that but it's only a hint and there is a small window when the race condition can happen before it's called. --- fs/fuse/virtio_fs.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 8f52cdaa8445..57e1f264b0a8 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -319,9 +319,10 @@ static void virtio_fs_hiprio_done_work(struct work_struct *work) struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq, done_work); struct virtqueue *vq = fsvq->vq; + unsigned long flags; /* Free completed FUSE_FORGET requests */ - spin_lock(&fsvq->lock); + spin_lock_irqsave(&fsvq->lock, flags); do { unsigned int len; void *req; @@ -333,7 +334,7 @@ static void virtio_fs_hiprio_done_work(struct work_struct *work) dec_in_flight_req(fsvq); } } while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq))); - spin_unlock(&fsvq->lock); + spin_unlock_irqrestore(&fsvq->lock, flags); } static void virtio_fs_request_dispatch_work(struct work_struct *work) @@ -601,11 +602,15 @@ static void virtio_fs_requests_done_work(struct work_struct *work) struct virtqueue *vq = fsvq->vq; struct fuse_req *req; struct fuse_req *next; + unsigned long flags; unsigned int len; LIST_HEAD(reqs); - /* Collect completed requests off the virtqueue */ - spin_lock(&fsvq->lock); + /* + * Collect completed requests off the virtqueue with irqs disabled to + * prevent races with vring_interrupt(). + */ + spin_lock_irqsave(&fsvq->lock, flags); do { virtqueue_disable_cb(vq); @@ -615,7 +620,7 @@ static void virtio_fs_requests_done_work(struct work_struct *work) spin_unlock(&fpq->lock); } } while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq))); - spin_unlock(&fsvq->lock); + spin_unlock_irqrestore(&fsvq->lock, flags); /* End requests */ list_for_each_entry_safe(req, next, &reqs, list) { -- 2.31.1
Michael S. Tsirkin
2021-Aug-24 11:31 UTC
[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()
On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote:> While investigating an unhandled irq from vring_interrupt() with virtiofs I > stumbled onto a possible race that also affects virtio_gpu. This theory is > based on code inspection and hopefully you can point out something that makes > this a non-issue in practice :). > > The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is > taken with no used (completed) buffers in the virtqueue. The kernel disables > the irq and the driver is no longer receives irqs when this happens: > > irq 77: nobody cared (try booting with the "irqpoll" option) > ... > handlers: > [<00000000a40a49bb>] vring_interrupt > Disabling IRQ #77 > > Consider the following: > > 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work > function is scheduled to dequeue used buffers: > vring_interrupt() -> virtio_fs_vq_done() -> schedule_work() > > 2. The device adds more used requests and just before... > > 3. ...virtio_fs_requests_done_work() empties the virtqueue with > virtqueue_get_buf(). > > 4. The device raises the irq and vring_interrupt() is called after > virtio_fs_requests_done_work emptied the virtqueue: > > irqreturn_t vring_interrupt(int irq, void *_vq) > { > struct vring_virtqueue *vq = to_vvq(_vq); > > if (!more_used(vq)) { > pr_debug("virtqueue interrupt with no work for %p\n", vq); > return IRQ_NONE; > ^^^^^^^^^^^^^^^^ > > I have included a patch that switches virtiofs from spin_lock() to > spin_lock_irqsave() to prevent vring_interrupt() from running while the > virtqueue is processed from a work function. > > virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and > virtio_gpu_dequeue_cursor_func() work functions only use spin_lock(). > I think this can result in the same false unhandled irq problem as virtiofs. > > This race condition could in theory affect all drivers. The VIRTIO > specification says: > > Neither of these notification suppression methods are reliable, as they are > not synchronized with the device, but they serve as useful optimizations. > > If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs > then virtio_net and other drivers have a problem because because an irq could > be raised while the driver is dequeuing used buffers. I think we haven't seen > this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware > devices might cache the value and not disable notifications for some time... > > Have I missed something? > > The virtiofs patch I attached is being stress tested to see if the unhandled > irqs still occur. > > Stefan Hajnoczi (1): > fuse: disable local irqs when processing vq completions > > fs/fuse/virtio_fs.c | 15 ++++++++++----- > 1 file changed, 10 insertions(+), 5 deletions(-)Fundamentally it is not a problem to have an unhandled IRQ once in a while. It's only a problem if this happens time after time. Does the kernel you are testing include commit 8d622d21d24803408b256d96463eac4574dcf067 Author: Michael S. Tsirkin <mst at redhat.com> Date: Tue Apr 13 01:19:16 2021 -0400 virtio: fix up virtio_disable_cb ? If not it's worth checking whether the latest kernel still has the issue. Note cherry picking that commit isn't trivial there are a bunch of dependencies. If you want to try on an old kernel, disable event index.> -- > 2.31.1 >
Stefan Hajnoczi
2021-Sep-06 15:18 UTC
[RFC PATCH 0/1] virtio: false unhandled irqs from vring_interrupt()
On Tue, Aug 24, 2021 at 07:31:29AM -0400, Michael S. Tsirkin wrote:> On Tue, Aug 24, 2021 at 11:59:43AM +0100, Stefan Hajnoczi wrote: > > While investigating an unhandled irq from vring_interrupt() with virtiofs I > > stumbled onto a possible race that also affects virtio_gpu. This theory is > > based on code inspection and hopefully you can point out something that makes > > this a non-issue in practice :). > > > > The vring_interrupt() function returns IRQ_NONE when an MSI-X interrupt is > > taken with no used (completed) buffers in the virtqueue. The kernel disables > > the irq and the driver is no longer receives irqs when this happens: > > > > irq 77: nobody cared (try booting with the "irqpoll" option) > > ... > > handlers: > > [<00000000a40a49bb>] vring_interrupt > > Disabling IRQ #77 > > > > Consider the following: > > > > 1. An virtiofs irq is handled and the virtio_fs_requests_done_work() work > > function is scheduled to dequeue used buffers: > > vring_interrupt() -> virtio_fs_vq_done() -> schedule_work() > > > > 2. The device adds more used requests and just before... > > > > 3. ...virtio_fs_requests_done_work() empties the virtqueue with > > virtqueue_get_buf(). > > > > 4. The device raises the irq and vring_interrupt() is called after > > virtio_fs_requests_done_work emptied the virtqueue: > > > > irqreturn_t vring_interrupt(int irq, void *_vq) > > { > > struct vring_virtqueue *vq = to_vvq(_vq); > > > > if (!more_used(vq)) { > > pr_debug("virtqueue interrupt with no work for %p\n", vq); > > return IRQ_NONE; > > ^^^^^^^^^^^^^^^^ > > > > I have included a patch that switches virtiofs from spin_lock() to > > spin_lock_irqsave() to prevent vring_interrupt() from running while the > > virtqueue is processed from a work function. > > > > virtio_gpu has a similar case where virtio_gpu_dequeue_ctrl_func() and > > virtio_gpu_dequeue_cursor_func() work functions only use spin_lock(). > > I think this can result in the same false unhandled irq problem as virtiofs. > > > > This race condition could in theory affect all drivers. The VIRTIO > > specification says: > > > > Neither of these notification suppression methods are reliable, as they are > > not synchronized with the device, but they serve as useful optimizations. > > > > If virtqueue_disable_cb() is just a hint and might not disable virtqueue irqs > > then virtio_net and other drivers have a problem because because an irq could > > be raised while the driver is dequeuing used buffers. I think we haven't seen > > this because software VIRTIO devices honor virtqueue_disable_cb(). Hardware > > devices might cache the value and not disable notifications for some time... > > > > Have I missed something? > > > > The virtiofs patch I attached is being stress tested to see if the unhandled > > irqs still occur. > > > > Stefan Hajnoczi (1): > > fuse: disable local irqs when processing vq completions > > > > fs/fuse/virtio_fs.c | 15 ++++++++++----- > > 1 file changed, 10 insertions(+), 5 deletions(-) > > Fundamentally it is not a problem to have an unhandled IRQ > once in a while. It's only a problem if this happens time > after time. > > > Does the kernel you are testing include > commit 8d622d21d24803408b256d96463eac4574dcf067 > Author: Michael S. Tsirkin <mst at redhat.com> > Date: Tue Apr 13 01:19:16 2021 -0400 > > virtio: fix up virtio_disable_cb > > ? > > If not it's worth checking whether the latest kernel still > has the issue.A new kernel with your patch doesn't have this issue. Please disregard the patch I posted, your patch seems to be enough. Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20210906/036596ce/attachment.sig>