Greg KH
2022-Mar-08 08:57 UTC
[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use
On Tue, Mar 08, 2022 at 08:10:06AM +0000, Lee Jones wrote:
> On Mon, 07 Mar 2022, Greg KH wrote:
>
> > On Mon, Mar 07, 2022 at 07:17:57PM +0000, Lee Jones wrote:
> > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > to vhost_get_vq_desc(). All we have to do here is take the same lock
> > > during virtqueue clean-up and we mitigate the reported issues.
> > >
> > > Also WARN() as a precautionary measure. The purpose of this is to
> > > capture possible future race conditions which may pop up over time.
> > >
> > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > >
> > > Cc: <stable at vger.kernel.org>
> > > Reported-by: syzbot+adc3cb32385586bec859 at syzkaller.appspotmail.com
> > > Signed-off-by: Lee Jones <lee.jones at linaro.org>
> > > ---
> > > drivers/vhost/vhost.c | 10 ++++++++++
> > > 1 file changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >          int i;
> > >
> > >          for (i = 0; i < dev->nvqs; ++i) {
> > > +                /* No workers should run here by design. However, races have
> > > +                 * previously occurred where drivers have been unable to flush
> > > +                 * all work properly prior to clean-up. Without a successful
> > > +                 * flush the guest will malfunction, but avoiding host memory
> > > +                 * corruption in those cases does seem preferable.
> > > +                 */
> > > +                WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> >
> > So you are trading one syzbot triggered issue for another one in the
> > future? :)
> >
> > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > that will trigger the panic-on-warn boxes, as well as syzbot. Unless
> > you want that to happen?
>
> No, Syzbot doesn't report warnings, only BUGs and memory corruption.

Has it changed? Last I looked, it did trigger on WARN_* calls, which
has resulted in a huge number of kernel fixes because of that.

> > And what happens if the mutex is locked _RIGHT_ after you checked it?
> > You still have a race...
>
> No, we miss a warning that one time. Memory is still protected.

Then don't warn on something that doesn't matter. This line can be
dropped as there's nothing anyone can do about it, right?

thanks,

greg k-h
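
The check-then-act race raised above can be illustrated outside the kernel. What follows is a minimal userspace sketch using POSIX threads, not the vhost code: `struct demo_vq`, `demo_vq_looks_busy()` and `demo_vq_cleanup()` are hypothetical names made up for this illustration. It assumes a default (non-recursive) mutex, for which `pthread_mutex_trylock()` fails when the mutex is already held. The snapshot helper plays the role of the `mutex_is_locked()` check, while the clean-up helper takes the same lock as any user of the queue, which is the part that actually protects the memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue: a mutex and the state it guards. */
struct demo_vq {
        pthread_mutex_t mutex;
        int *private_data;
};

/*
 * Snapshot check, comparable in spirit to mutex_is_locked().
 * The answer can be stale the moment it is returned: another thread
 * may take the mutex right after this function drops it.
 */
static bool demo_vq_looks_busy(struct demo_vq *vq)
{
        if (pthread_mutex_trylock(&vq->mutex) == 0) {
                pthread_mutex_unlock(&vq->mutex);
                return false;
        }
        return true;
}

/*
 * Clean-up that serialises against users by taking the same lock they
 * hold. This is what keeps the state consistent; the snapshot above
 * only reports, it does not protect anything.
 */
static void demo_vq_cleanup(struct demo_vq *vq)
{
        pthread_mutex_lock(&vq->mutex);
        vq->private_data = NULL;
        pthread_mutex_unlock(&vq->mutex);
}
```

In this model a caller that sees `demo_vq_looks_busy()` return false learns nothing durable, whereas `demo_vq_cleanup()` cannot clear the state while another thread holds the mutex.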
Lee Jones
2022-Mar-08 09:15 UTC
[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use
On Tue, 08 Mar 2022, Greg KH wrote:
> On Tue, Mar 08, 2022 at 08:10:06AM +0000, Lee Jones wrote:
> > On Mon, 07 Mar 2022, Greg KH wrote:
> >
> > > On Mon, Mar 07, 2022 at 07:17:57PM +0000, Lee Jones wrote:
> > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > to vhost_get_vq_desc(). All we have to do here is take the same lock
> > > > during virtqueue clean-up and we mitigate the reported issues.
> > > >
> > > > Also WARN() as a precautionary measure. The purpose of this is to
> > > > capture possible future race conditions which may pop up over time.
> > > >
> > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > >
> > > > Cc: <stable at vger.kernel.org>
> > > > Reported-by: syzbot+adc3cb32385586bec859 at syzkaller.appspotmail.com
> > > > Signed-off-by: Lee Jones <lee.jones at linaro.org>
> > > > ---
> > > > drivers/vhost/vhost.c | 10 ++++++++++
> > > > 1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > >          int i;
> > > >
> > > >          for (i = 0; i < dev->nvqs; ++i) {
> > > > +                /* No workers should run here by design. However, races have
> > > > +                 * previously occurred where drivers have been unable to flush
> > > > +                 * all work properly prior to clean-up. Without a successful
> > > > +                 * flush the guest will malfunction, but avoiding host memory
> > > > +                 * corruption in those cases does seem preferable.
> > > > +                 */
> > > > +                WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > >
> > > So you are trading one syzbot triggered issue for another one in the
> > > future? :)
> > >
> > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > that will trigger the panic-on-warn boxes, as well as syzbot. Unless
> > > you want that to happen?
> >
> > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
>
> Has it changed? Last I looked, it did trigger on WARN_* calls, which
> has resulted in a huge number of kernel fixes because of that.

Everything is customisable in syzkaller, so maybe there are specific
builds with panic_on_warn enabled, but none that I'm involved with do.

Here follows a topical example. The report above in the Link: tag
comes with a crashlog [0]. In there you can see the WARN() at the
bottom of vhost_dev_cleanup() trigger many times due to a populated
(non-flushed) worker list, before finally tripping the BUG() which
triggers the report:

[0] https://syzkaller.appspot.com/text?tag=CrashLog&x=16a61fce700000

> > > And what happens if the mutex is locked _RIGHT_ after you checked it?
> > > You still have a race...
> >
> > No, we miss a warning that one time. Memory is still protected.
>
> Then don't warn on something that doesn't matter. This line can be
> dropped as there's nothing anyone can do about it, right?

You'll have to take that point up with Michael.

-- 
Lee Jones
Principal Technical Lead - Developer Services
Linaro.org | Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog
Michael S. Tsirkin
2022-Mar-08 11:05 UTC
[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use
On Tue, Mar 08, 2022 at 09:57:57AM +0100, Greg KH wrote:
> > > And what happens if the mutex is locked _RIGHT_ after you checked it?
> > > You still have a race...
> >
> > No, we miss a warning that one time. Memory is still protected.
>
> Then don't warn on something that doesn't matter. This line can be
> dropped as there's nothing anyone can do about it, right?

I mean, the reason I wanted the warning is because there's a kernel
bug, and it will break userspace. The warning is just telling us this.
Is the bug reachable from userspace? If we knew that we won't need
the lock ...

-- 
MST
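
The disagreement over WARN_ON() comes down to what a warning costs on a given system. As a rough userspace model only (not kernel code), the sketch below mimics two properties of the real macro: it evaluates to the condition it was given, and on a system with panic_on_warn set a hit is fatal rather than informational. The names `demo_warn_on()`, `demo_panic_on_warn`, `demo_warn_count` and `demo_panicked` are hypothetical and exist only for this illustration; the model records a "panic" in a flag instead of stopping the program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the panic_on_warn setting. */
static bool demo_panic_on_warn;

/* Bookkeeping so the effect of each warning can be observed. */
static unsigned int demo_warn_count;
static bool demo_panicked;

/*
 * Model of WARN_ON(): report when the condition is true and hand the
 * condition back to the caller so it can still be handled. With the
 * panic_on_warn model enabled, the first hit is recorded as fatal.
 */
static bool demo_warn_on(bool cond)
{
        if (cond) {
                demo_warn_count++;
                fprintf(stderr, "WARNING: condition hit (%u so far)\n",
                        demo_warn_count);
                if (demo_panic_on_warn)
                        demo_panicked = true;
        }
        return cond;
}
```

Under this model, a check that can fire during normal operation is just log noise with the flag clear, but takes the machine down with the flag set, which is why reviewers ask for such conditions to be handled rather than warned about.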
Leon Romanovsky
2022-Mar-09 18:52 UTC
[PATCH 1/1] vhost: Protect the virtqueue from being cleared whilst still in use
On Tue, Mar 08, 2022 at 09:57:57AM +0100, Greg KH wrote:
> On Tue, Mar 08, 2022 at 08:10:06AM +0000, Lee Jones wrote:
> > On Mon, 07 Mar 2022, Greg KH wrote:
> >
> > > On Mon, Mar 07, 2022 at 07:17:57PM +0000, Lee Jones wrote:
> > > > vhost_vsock_handle_tx_kick() already holds the mutex during its call
> > > > to vhost_get_vq_desc(). All we have to do here is take the same lock
> > > > during virtqueue clean-up and we mitigate the reported issues.
> > > >
> > > > Also WARN() as a precautionary measure. The purpose of this is to
> > > > capture possible future race conditions which may pop up over time.
> > > >
> > > > Link: https://syzkaller.appspot.com/bug?extid=279432d30d825e63ba00
> > > >
> > > > Cc: <stable at vger.kernel.org>
> > > > Reported-by: syzbot+adc3cb32385586bec859 at syzkaller.appspotmail.com
> > > > Signed-off-by: Lee Jones <lee.jones at linaro.org>
> > > > ---
> > > > drivers/vhost/vhost.c | 10 ++++++++++
> > > > 1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index 59edb5a1ffe28..ef7e371e3e649 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -693,6 +693,15 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > >          int i;
> > > >
> > > >          for (i = 0; i < dev->nvqs; ++i) {
> > > > +                /* No workers should run here by design. However, races have
> > > > +                 * previously occurred where drivers have been unable to flush
> > > > +                 * all work properly prior to clean-up. Without a successful
> > > > +                 * flush the guest will malfunction, but avoiding host memory
> > > > +                 * corruption in those cases does seem preferable.
> > > > +                 */
> > > > +                WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > >
> > > So you are trading one syzbot triggered issue for another one in the
> > > future? :)
> > >
> > > If this ever can happen, handle it, but don't log it with a WARN_ON() as
> > > that will trigger the panic-on-warn boxes, as well as syzbot. Unless
> > > you want that to happen?
> >
> > No, Syzbot doesn't report warnings, only BUGs and memory corruption.
>
> Has it changed? Last I looked, it did trigger on WARN_* calls, which
> has resulted in a huge number of kernel fixes because of that.
>
> > > And what happens if the mutex is locked _RIGHT_ after you checked it?
> > > You still have a race...
> >
> > No, we miss a warning that one time. Memory is still protected.
>
> Then don't warn on something that doesn't matter. This line can be
> dropped as there's nothing anyone can do about it, right?

Greg, at least two other reviewers said that this line shouldn't be
there at all.

https://lore.kernel.org/all/CACGkMEsjmCNQPjxPjXL0WUfbMg8ARnumEp4yjUxqznMKR1nKSQ@mail.gmail.com/
https://lore.kernel.org/all/YiG61RqXFvq%2Ft0fB@unreal/
https://lore.kernel.org/all/YiETnIcfZCLb63oB@unreal/

Thanks

>
> thanks,
>
> greg k-h