Stefano Garzarella
2022-Feb-22 09:05 UTC
[PATCH] vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
On Tue, Feb 22, 2022 at 01:06:12AM +0530, Anirudh Rayabharam wrote:
>On Mon, Feb 21, 2022 at 07:26:28PM +0100, Stefano Garzarella wrote:
>> On Mon, Feb 21, 2022 at 11:33:11PM +0530, Anirudh Rayabharam wrote:
>> > On Mon, Feb 21, 2022 at 05:44:20PM +0100, Stefano Garzarella wrote:
>> > > On Mon, Feb 21, 2022 at 09:44:39PM +0530, Anirudh Rayabharam wrote:
>> > > > On Mon, Feb 21, 2022 at 02:59:30PM +0100, Stefano Garzarella wrote:
>> > > > > On Mon, Feb 21, 2022 at 12:49 PM Stefano Garzarella <sgarzare at redhat.com> wrote:
>> > > > > >
>> > > > > > vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
>> > > > > > ownership. It expects current->mm to be valid.
>> > > > > >
>> > > > > > vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
>> > > > > > the user has not done close(), so when we are in do_exit(). In this
>> > > > > > case current->mm is invalid and we're releasing the device, so we
>> > > > > > should clean it anyway.
>> > > > > >
>> > > > > > Let's check the owner only when vhost_vsock_stop() is called
>> > > > > > by an ioctl.
>> > > > > >
>> > > > > > Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
>> > > > > > Cc: stable at vger.kernel.org
>> > > > > > Reported-by: syzbot+1e3ea63db39f2b4440e0 at syzkaller.appspotmail.com
>> > > > > > Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
>> > > > > > ---
>> > > > > > drivers/vhost/vsock.c | 14 ++++++++------
>> > > > > > 1 file changed, 8 insertions(+), 6 deletions(-)
>> > > > >
>> > > > > Reported-and-tested-by: syzbot+0abd373e2e50d704db87 at syzkaller.appspotmail.com
>> > > >
>> > > > I don't think this patch fixes "INFO: task hung in vhost_work_dev_flush"
>> > > > even though syzbot says so. I am able to reproduce the issue locally
>> > > > even with this patch applied.
>> > >
>> > > Are you using the sysbot reproducer or another test?
>> > > In that case, can you share it?
>> >
>> > I am using the syzbot reproducer.
>> >
>> > >
>> > > From the stack trace it seemed to me that the worker accesses a zone that
>> > > has been cleaned (iotlb), so it is invalid and fails.
>> >
>> > Would the thread hang in that case? How?
>>
>> Looking at this log [1] it seems that the process is blocked on the
>> wait_for_completion() in vhost_work_dev_flush().
>>
>> Since we're not setting the backend to NULL to stop the worker, it's likely
>> that the worker will keep running, preventing the flush work from
>> completing.
>
>The log shows that the worker thread is stuck in iotlb_access_ok(). How
>will setting the backend to NULL stop it? During my debugging I found
>that the worker is stuck in this while loop:

Okay, looking at your new patch, now I see. If we enter in this loop
before setting the backend to NULL and we have start = 0 and
end = (u64)-1, we should be there forever.

I'll remove that tag in v2, but the test might fail without this patch
applied, because for now we don't stop workers correctly.

Thanks,
Stefano
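
For readers following the thread, here is a minimal user-space sketch of why
a range with start = 0 and end = (u64)-1 can keep a loop spinning. It is not
the kernel's iotlb_access_ok(); the names toy_map, toy_map_size and
toy_walk_finishes are hypothetical. It assumes the loop advances by a size
computed as last - start + 1, which wraps to 0 in 64-bit arithmetic for this
range. An iteration cap is added only so the sketch terminates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapping covering [start, last]. */
struct toy_map {
    uint64_t start;
    uint64_t last;
};

/* 64-bit size of the mapping; wraps to 0 for [0, UINT64_MAX]. */
static uint64_t toy_map_size(const struct toy_map *m)
{
    return m->last - m->start + 1;
}

/*
 * Walk len bytes starting at addr through a single mapping, advancing by
 * the number of bytes the mapping covers from addr onward.
 * Returns true only if the walk completes; returns false if addr is not
 * covered or if max_iters iterations pass without finishing (an uncapped
 * loop would spin forever in that case).
 */
static bool toy_walk_finishes(const struct toy_map *m, uint64_t addr,
                              uint64_t len, unsigned int max_iters)
{
    uint64_t done = 0;
    unsigned int iters = 0;

    while (len > done) {
        if (iters++ == max_iters)
            return false; /* no forward progress */
        if (addr < m->start || addr > m->last)
            return false; /* address not covered by the mapping */

        uint64_t size = toy_map_size(m) - (addr - m->start);

        done += size; /* size == 0 means done never advances */
        addr += size;
    }
    return true;
}
```

With a mapping of {0, UINT64_MAX} and addr = 0, toy_map_size() is 0, so the
walk never advances and only the cap ends it. With a mapping such as
{0, 0xFFFF} the walk completes in a single step.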