thr3ads.net - Virtualization - [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use [Mar 2022]

If this information is useful, please help other people find it:
Share via:

Greg KH

2022-Mar-08 09:57 UTC

[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

On Tue, Mar 08, 2022 at 09:15:27AM +0000, Lee Jones
wrote:> On Tue, 08 Mar 2022, Greg KH wrote:
> 
> > On Tue, Mar 08, 2022 at 08:10:06AM +0000, Lee Jones wrote:
> > > On Mon, 07 Mar 2022, Greg KH wrote:
> > > 
> > > > On Mon, Mar 07, 2022 at 07:17:57PM +0000, Lee Jones wrote:
> > > > > vhost_vsock_handle_tx_kick() already holds the mutex
during its call
> > > > > to vhost_get_vq_desc().  All we have to do here is take
the same lock
> > > > > during virtqueue clean-up and we mitigate the reported
issues.
> > > > > 
> > > > > Also WARN() as a precautionary measure.  The purpose of
this is to
> > > > > capture possible future race conditions which may pop
up over time.
> > > > > 
> > > > > Link:
https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > 
> > > > > Cc: <stable at vger.kernel.org>
> > > > > Reported-by: syzbot+adc3cb32385586bec859 at
syzkaller.appspotmail.com
> > > > > Signed-off-by: Lee Jones <lee.jones at
linaro.org>
> > > > > ---
> > > > >  drivers/vhost/vhost.c | 10 ++++++++++
> > > > >  1 file changed, 10 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/vhost/vhost.c
b/drivers/vhost/vhost.c
> > > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > > --- a/drivers/vhost/vhost.c
> > > > > +++ b/drivers/vhost/vhost.c
> > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct
vhost_dev *dev)
> > > > >  	int i;
> > > > >  
> > > > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > > > +		/* No workers should run here by design. However,
races have
> > > > > +		 * previously occurred where drivers have been
unable to flush
> > > > > +		 * all work properly prior to clean-up.  Without a
successful
> > > > > +		 * flush the guest will malfunction, but avoiding
host memory
> > > > > +		 * corruption in those cases does seem preferable.
> > > > > +		 */
> > > > > +	
WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > 
> > > > So you are trading one syzbot triggered issue for another
one in the
> > > > future?  :)
> > > > 
> > > > If this ever can happen, handle it, but don't log it
with a WARN_ON() as
> > > > that will trigger the panic-on-warn boxes, as well as
syzbot.  Unless
> > > > you want that to happen?
> > > 
> > > No, Syzbot doesn't report warnings, only BUGs and memory
corruption.
> > 
> > Has it changed?  Last I looked, it did trigger on WARN_* calls, which
> > has resulted in a huge number of kernel fixes because of that.
> 
> Everything is customisable in syzkaller, so maybe there are specific
> builds which panic_on_warn enabled, but none that I'm involved with
> do.
Many systems run with panic-on-warn (i.e. the cloud), as they want to
drop a box and restart it if anything goes wrong.

That's why syzbot reports on WARN_* calls.  They should never be
reachable by userspace actions.
> Here follows a topical example.  The report above in the Link: tag
> comes with a crashlog [0].  In there you can see the WARN() at the
> bottom of vhost_dev_cleanup() trigger many times due to a populated
> (non-flushed) worker list, before finally tripping the BUG() which
> triggers the report:
> 
> [0] https://syzkaller.appspot.com/text?tag=CrashLog&x=16a61fce700000
Ok, so both happens here.  But don't add a warning for something that
can't happen.  Just handle it and move on.  It looks like you are
handling it in this code, so please drop the WARN_ON().

thanks,

greg k-h

Lee Jones

2022-Mar-08 10:08 UTC

head link

[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

On Tue, 08 Mar 2022, Greg KH wrote:
> On Tue, Mar 08, 2022 at 09:15:27AM +0000, Lee Jones wrote:
> > On Tue, 08 Mar 2022, Greg KH wrote:
> > 
> > > On Tue, Mar 08, 2022 at 08:10:06AM +0000, Lee Jones wrote:
> > > > On Mon, 07 Mar 2022, Greg KH wrote:
> > > > 
> > > > > On Mon, Mar 07, 2022 at 07:17:57PM +0000, Lee Jones
wrote:
> > > > > > vhost_vsock_handle_tx_kick() already holds the
mutex during its call
> > > > > > to vhost_get_vq_desc().  All we have to do here is
take the same lock
> > > > > > during virtqueue clean-up and we mitigate the
reported issues.
> > > > > > 
> > > > > > Also WARN() as a precautionary measure.  The
purpose of this is to
> > > > > > capture possible future race conditions which may
pop up over time.
> > > > > > 
> > > > > > Link:
https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > 
> > > > > > Cc: <stable at vger.kernel.org>
> > > > > > Reported-by: syzbot+adc3cb32385586bec859 at
syzkaller.appspotmail.com
> > > > > > Signed-off-by: Lee Jones <lee.jones at
linaro.org>
> > > > > > ---
> > > > > >  drivers/vhost/vhost.c | 10 ++++++++++
> > > > > >  1 file changed, 10 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/vhost/vhost.c
b/drivers/vhost/vhost.c
> > > > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > > > --- a/drivers/vhost/vhost.c
> > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct
vhost_dev *dev)
> > > > > >  	int i;
> > > > > >  
> > > > > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > > > > +		/* No workers should run here by design.
However, races have
> > > > > > +		 * previously occurred where drivers have been
unable to flush
> > > > > > +		 * all work properly prior to clean-up. 
Without a successful
> > > > > > +		 * flush the guest will malfunction, but
avoiding host memory
> > > > > > +		 * corruption in those cases does seem
preferable.
> > > > > > +		 */
> > > > > > +	
WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > > 
> > > > > So you are trading one syzbot triggered issue for
another one in the
> > > > > future?  :)
> > > > > 
> > > > > If this ever can happen, handle it, but don't log
it with a WARN_ON() as
> > > > > that will trigger the panic-on-warn boxes, as well as
syzbot.  Unless
> > > > > you want that to happen?
> > > > 
> > > > No, Syzbot doesn't report warnings, only BUGs and memory
corruption.
> > > 
> > > Has it changed?  Last I looked, it did trigger on WARN_* calls,
which
> > > has resulted in a huge number of kernel fixes because of that.
> > 
> > Everything is customisable in syzkaller, so maybe there are specific
> > builds which panic_on_warn enabled, but none that I'm involved
with
> > do.
> 
> Many systems run with panic-on-warn (i.e. the cloud), as they want to
> drop a box and restart it if anything goes wrong.
> 
> That's why syzbot reports on WARN_* calls.  They should never be
> reachable by userspace actions.
> 
> > Here follows a topical example.  The report above in the Link: tag
> > comes with a crashlog [0].  In there you can see the WARN() at the
> > bottom of vhost_dev_cleanup() trigger many times due to a populated
> > (non-flushed) worker list, before finally tripping the BUG() which
> > triggers the report:
> > 
> > [0]
https://syzkaller.appspot.com/text?tag=CrashLog&x=16a61fce700000
> 
> Ok, so both happens here.  But don't add a warning for something that
> can't happen.  Just handle it and move on.  It looks like you are
> handling it in this code, so please drop the WARN_ON().
Happy to oblige.

Let's give Micheal a chance to speak, then I'll fix-up if he agrees.

-- 
Lee Jones [???]
Principal Technical Lead - Developer Services
Linaro.org ? Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

Michael S. Tsirkin

2022-Mar-08 10:55 UTC

head link

[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

On Tue, Mar 08, 2022 at 10:57:42AM +0100, Greg KH wrote:> On Tue, Mar 08, 2022 at 09:15:27AM +0000, Lee Jones wrote:
> > On Tue, 08 Mar 2022, Greg KH wrote:
> > 
> > > On Tue, Mar 08, 2022 at 08:10:06AM +0000, Lee Jones wrote:
> > > > On Mon, 07 Mar 2022, Greg KH wrote:
> > > > 
> > > > > On Mon, Mar 07, 2022 at 07:17:57PM +0000, Lee Jones
wrote:
> > > > > > vhost_vsock_handle_tx_kick() already holds the
mutex during its call
> > > > > > to vhost_get_vq_desc().  All we have to do here is
take the same lock
> > > > > > during virtqueue clean-up and we mitigate the
reported issues.
> > > > > > 
> > > > > > Also WARN() as a precautionary measure.  The
purpose of this is to
> > > > > > capture possible future race conditions which may
pop up over time.
> > > > > > 
> > > > > > Link:
https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > > > > 
> > > > > > Cc: <stable at vger.kernel.org>
> > > > > > Reported-by: syzbot+adc3cb32385586bec859 at
syzkaller.appspotmail.com
> > > > > > Signed-off-by: Lee Jones <lee.jones at
linaro.org>
> > > > > > ---
> > > > > >  drivers/vhost/vhost.c | 10 ++++++++++
> > > > > >  1 file changed, 10 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/vhost/vhost.c
b/drivers/vhost/vhost.c
> > > > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > > > --- a/drivers/vhost/vhost.c
> > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct
vhost_dev *dev)
> > > > > >  	int i;
> > > > > >  
> > > > > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > > > > +		/* No workers should run here by design.
However, races have
> > > > > > +		 * previously occurred where drivers have been
unable to flush
> > > > > > +		 * all work properly prior to clean-up. 
Without a successful
> > > > > > +		 * flush the guest will malfunction, but
avoiding host memory
> > > > > > +		 * corruption in those cases does seem
preferable.
> > > > > > +		 */
> > > > > > +	
WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > > 
> > > > > So you are trading one syzbot triggered issue for
another one in the
> > > > > future?  :)
> > > > > 
> > > > > If this ever can happen, handle it, but don't log
it with a WARN_ON() as
> > > > > that will trigger the panic-on-warn boxes, as well as
syzbot.  Unless
> > > > > you want that to happen?
> > > > 
> > > > No, Syzbot doesn't report warnings, only BUGs and memory
corruption.
> > > 
> > > Has it changed?  Last I looked, it did trigger on WARN_* calls,
which
> > > has resulted in a huge number of kernel fixes because of that.
> > 
> > Everything is customisable in syzkaller, so maybe there are specific
> > builds which panic_on_warn enabled, but none that I'm involved
with
> > do.
> 
> Many systems run with panic-on-warn (i.e. the cloud), as they want to
> drop a box and restart it if anything goes wrong.
> 
> That's why syzbot reports on WARN_* calls.  They should never be
> reachable by userspace actions.
> 
> > Here follows a topical example.  The report above in the Link: tag
> > comes with a crashlog [0].  In there you can see the WARN() at the
> > bottom of vhost_dev_cleanup() trigger many times due to a populated
> > (non-flushed) worker list, before finally tripping the BUG() which
> > triggers the report:
> > 
> > [0]
https://syzkaller.appspot.com/text?tag=CrashLog&x=16a61fce700000
> 
> Ok, so both happens here.  But don't add a warning for something that
> can't happen.  Just handle it and move on.  It looks like you are
> handling it in this code, so please drop the WARN_ON().
> 
> thanks,
> 
> greg k-h
Hmm. Well this will mean if we ever reintroduce the bug then
syzkaller will not catch it for us :( And the bug is there,
it just results in a hard to reproduce error for userspace.

Not sure what to do here. Export panic_on_warn flag to modules
and check it here?


-- 
MST

Virtualization - Mar 2022 - [PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use

[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use