Greg KH
2022-Mar-08 09:57 UTC
[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use
On Tue, Mar 08, 2022 at 09:15:27AM +0000, Lee Jones wrote:> On Tue, 08 Mar 2022, Greg KH wrote: > > > On Tue, Mar 08, 2022 at 08:10:06AM +0000, Lee Jones wrote: > > > On Mon, 07 Mar 2022, Greg KH wrote: > > > > > > > On Mon, Mar 07, 2022 at 07:17:57PM +0000, Lee Jones wrote: > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call > > > > > to vhost_get_vq_desc(). All we have to do here is take the same lock > > > > > during virtqueue clean-up and we mitigate the reported issues. > > > > > > > > > > Also WARN() as a precautionary measure. The purpose of this is to > > > > > capture possible future race conditions which may pop up over time. > > > > > > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00 > > > > > > > > > > Cc: <stable at vger.kernel.org> > > > > > Reported-by: syzbot+adc3cb32385586bec859 at syzkaller.appspotmail.com > > > > > Signed-off-by: Lee Jones <lee.jones at linaro.org> > > > > > --- > > > > > drivers/vhost/vhost.c | 10 ++++++++++ > > > > > 1 file changed, 10 insertions(+) > > > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > > > > index 59edb5a1ffe28..ef7e371e3e649 100644 > > > > > --- a/drivers/vhost/vhost.c > > > > > +++ b/drivers/vhost/vhost.c > > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev) > > > > > int i; > > > > > > > > > > for (i = 0; i < dev->nvqs; ++i) { > > > > > + /* No workers should run here by design. However, races have > > > > > + * previously occurred where drivers have been unable to flush > > > > > + * all work properly prior to clean-up. Without a successful > > > > > + * flush the guest will malfunction, but avoiding host memory > > > > > + * corruption in those cases does seem preferable. > > > > > + */ > > > > > + WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex)); > > > > > > > > So you are trading one syzbot triggered issue for another one in the > > > > future? :) > > > > > > > > If this ever can happen, handle it, but don't log it with a WARN_ON() as > > > > that will trigger the panic-on-warn boxes, as well as syzbot. Unless > > > > you want that to happen? > > > > > > No, Syzbot doesn't report warnings, only BUGs and memory corruption. > > > > Has it changed? Last I looked, it did trigger on WARN_* calls, which > > has resulted in a huge number of kernel fixes because of that. > > Everything is customisable in syzkaller, so maybe there are specific > builds which panic_on_warn enabled, but none that I'm involved with > do.Many systems run with panic-on-warn (i.e. the cloud), as they want to drop a box and restart it if anything goes wrong. That's why syzbot reports on WARN_* calls. They should never be reachable by userspace actions.> Here follows a topical example. The report above in the Link: tag > comes with a crashlog [0]. In there you can see the WARN() at the > bottom of vhost_dev_cleanup() trigger many times due to a populated > (non-flushed) worker list, before finally tripping the BUG() which > triggers the report: > > [0] https://syzkaller.appspot.com/text?tag=CrashLog&x=16a61fce700000Ok, so both happens here. But don't add a warning for something that can't happen. Just handle it and move on. It looks like you are handling it in this code, so please drop the WARN_ON(). thanks, greg k-h
Lee Jones
2022-Mar-08 10:08 UTC
[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use
On Tue, 08 Mar 2022, Greg KH wrote:> On Tue, Mar 08, 2022 at 09:15:27AM +0000, Lee Jones wrote: > > On Tue, 08 Mar 2022, Greg KH wrote: > > > > > On Tue, Mar 08, 2022 at 08:10:06AM +0000, Lee Jones wrote: > > > > On Mon, 07 Mar 2022, Greg KH wrote: > > > > > > > > > On Mon, Mar 07, 2022 at 07:17:57PM +0000, Lee Jones wrote: > > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call > > > > > > to vhost_get_vq_desc(). All we have to do here is take the same lock > > > > > > during virtqueue clean-up and we mitigate the reported issues. > > > > > > > > > > > > Also WARN() as a precautionary measure. The purpose of this is to > > > > > > capture possible future race conditions which may pop up over time. > > > > > > > > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00 > > > > > > > > > > > > Cc: <stable at vger.kernel.org> > > > > > > Reported-by: syzbot+adc3cb32385586bec859 at syzkaller.appspotmail.com > > > > > > Signed-off-by: Lee Jones <lee.jones at linaro.org> > > > > > > --- > > > > > > drivers/vhost/vhost.c | 10 ++++++++++ > > > > > > 1 file changed, 10 insertions(+) > > > > > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > > > > > index 59edb5a1ffe28..ef7e371e3e649 100644 > > > > > > --- a/drivers/vhost/vhost.c > > > > > > +++ b/drivers/vhost/vhost.c > > > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev) > > > > > > int i; > > > > > > > > > > > > for (i = 0; i < dev->nvqs; ++i) { > > > > > > + /* No workers should run here by design. However, races have > > > > > > + * previously occurred where drivers have been unable to flush > > > > > > + * all work properly prior to clean-up. Without a successful > > > > > > + * flush the guest will malfunction, but avoiding host memory > > > > > > + * corruption in those cases does seem preferable. > > > > > > + */ > > > > > > + WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex)); > > > > > > > > > > So you are trading one syzbot triggered issue for another one in the > > > > > future? :) > > > > > > > > > > If this ever can happen, handle it, but don't log it with a WARN_ON() as > > > > > that will trigger the panic-on-warn boxes, as well as syzbot. Unless > > > > > you want that to happen? > > > > > > > > No, Syzbot doesn't report warnings, only BUGs and memory corruption. > > > > > > Has it changed? Last I looked, it did trigger on WARN_* calls, which > > > has resulted in a huge number of kernel fixes because of that. > > > > Everything is customisable in syzkaller, so maybe there are specific > > builds which panic_on_warn enabled, but none that I'm involved with > > do. > > Many systems run with panic-on-warn (i.e. the cloud), as they want to > drop a box and restart it if anything goes wrong. > > That's why syzbot reports on WARN_* calls. They should never be > reachable by userspace actions. > > > Here follows a topical example. The report above in the Link: tag > > comes with a crashlog [0]. In there you can see the WARN() at the > > bottom of vhost_dev_cleanup() trigger many times due to a populated > > (non-flushed) worker list, before finally tripping the BUG() which > > triggers the report: > > > > [0] https://syzkaller.appspot.com/text?tag=CrashLog&x=16a61fce700000 > > Ok, so both happens here. But don't add a warning for something that > can't happen. Just handle it and move on. It looks like you are > handling it in this code, so please drop the WARN_ON().Happy to oblige. Let's give Micheal a chance to speak, then I'll fix-up if he agrees. -- Lee Jones [???] Principal Technical Lead - Developer Services Linaro.org ? Open source software for Arm SoCs Follow Linaro: Facebook | Twitter | Blog
Michael S. Tsirkin
2022-Mar-08 10:55 UTC
[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use
On Tue, Mar 08, 2022 at 10:57:42AM +0100, Greg KH wrote:> On Tue, Mar 08, 2022 at 09:15:27AM +0000, Lee Jones wrote: > > On Tue, 08 Mar 2022, Greg KH wrote: > > > > > On Tue, Mar 08, 2022 at 08:10:06AM +0000, Lee Jones wrote: > > > > On Mon, 07 Mar 2022, Greg KH wrote: > > > > > > > > > On Mon, Mar 07, 2022 at 07:17:57PM +0000, Lee Jones wrote: > > > > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call > > > > > > to vhost_get_vq_desc(). All we have to do here is take the same lock > > > > > > during virtqueue clean-up and we mitigate the reported issues. > > > > > > > > > > > > Also WARN() as a precautionary measure. The purpose of this is to > > > > > > capture possible future race conditions which may pop up over time. > > > > > > > > > > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00 > > > > > > > > > > > > Cc: <stable at vger.kernel.org> > > > > > > Reported-by: syzbot+adc3cb32385586bec859 at syzkaller.appspotmail.com > > > > > > Signed-off-by: Lee Jones <lee.jones at linaro.org> > > > > > > --- > > > > > > drivers/vhost/vhost.c | 10 ++++++++++ > > > > > > 1 file changed, 10 insertions(+) > > > > > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > > > > > index 59edb5a1ffe28..ef7e371e3e649 100644 > > > > > > --- a/drivers/vhost/vhost.c > > > > > > +++ b/drivers/vhost/vhost.c > > > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev) > > > > > > int i; > > > > > > > > > > > > for (i = 0; i < dev->nvqs; ++i) { > > > > > > + /* No workers should run here by design. However, races have > > > > > > + * previously occurred where drivers have been unable to flush > > > > > > + * all work properly prior to clean-up. Without a successful > > > > > > + * flush the guest will malfunction, but avoiding host memory > > > > > > + * corruption in those cases does seem preferable. > > > > > > + */ > > > > > > + WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex)); > > > > > > > > > > So you are trading one syzbot triggered issue for another one in the > > > > > future? :) > > > > > > > > > > If this ever can happen, handle it, but don't log it with a WARN_ON() as > > > > > that will trigger the panic-on-warn boxes, as well as syzbot. Unless > > > > > you want that to happen? > > > > > > > > No, Syzbot doesn't report warnings, only BUGs and memory corruption. > > > > > > Has it changed? Last I looked, it did trigger on WARN_* calls, which > > > has resulted in a huge number of kernel fixes because of that. > > > > Everything is customisable in syzkaller, so maybe there are specific > > builds which panic_on_warn enabled, but none that I'm involved with > > do. > > Many systems run with panic-on-warn (i.e. the cloud), as they want to > drop a box and restart it if anything goes wrong. > > That's why syzbot reports on WARN_* calls. They should never be > reachable by userspace actions. > > > Here follows a topical example. The report above in the Link: tag > > comes with a crashlog [0]. In there you can see the WARN() at the > > bottom of vhost_dev_cleanup() trigger many times due to a populated > > (non-flushed) worker list, before finally tripping the BUG() which > > triggers the report: > > > > [0] https://syzkaller.appspot.com/text?tag=CrashLog&x=16a61fce700000 > > Ok, so both happens here. But don't add a warning for something that > can't happen. Just handle it and move on. It looks like you are > handling it in this code, so please drop the WARN_ON(). > > thanks, > > greg k-hHmm. Well this will mean if we ever reintroduce the bug then syzkaller will not catch it for us :( And the bug is there, it just results in a hard to reproduce error for userspace. Not sure what to do here. Export panic_on_warn flag to modules and check it here? -- MST