Michael S. Tsirkin
2022-Mar-03 21:01 UTC
[PATCH 1/1] vhost: Provide a kernel warning if mutex is held whilst clean-up in progress
On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > All workers/users should be halted before any clean-up should take place.
> >
> > Suggested-by: Michael S. Tsirkin <mst at redhat.com>
> > Signed-off-by: Lee Jones <lee.jones at linaro.org>
> > ---
> >  drivers/vhost/vhost.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index bbaff6a5e21b8..d935d2506963f 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> >  	int i;
> >
> >  	for (i = 0; i < dev->nvqs; ++i) {
> > +		/* Ideally all workers should be stopped prior to clean-up */
> > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > +
> >  		mutex_lock(&dev->vqs[i]->mutex);
>
> I know nothing about vhost, but this construction and patch looks
> strange to me.
>
> If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> here suggests to me that workers can still run here.
>
> Thanks

"Ideally" here is misleading, we need a bigger detailed comment
along the lines of:

/*
 * By design, no workers can run here. But if there's a bug and the
 * driver did not flush all work properly then they might, and we
 * encountered such bugs in the past. With no proper flush guest won't
 * work correctly but avoiding host memory corruption in this case
 * sounds like a good idea.
 */

> >  		if (dev->vqs[i]->error_ctx)
> >  			eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > --
> > 2.35.1.574.g5d30c73bfb-goog
> >
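Put together, an untested sketch of how the loop in vhost_dev_cleanup() would
read with the expanded comment in place of the one-liner (based only on the
diff context quoted above; the rest of the per-vq teardown is unchanged):

	for (i = 0; i < dev->nvqs; ++i) {
		/*
		 * By design, no workers can run here. But if there's a bug and
		 * the driver did not flush all work properly then they might,
		 * and we encountered such bugs in the past. With no proper
		 * flush guest won't work correctly but avoiding host memory
		 * corruption in this case sounds like a good idea.
		 */
		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));

		mutex_lock(&dev->vqs[i]->mutex);
		if (dev->vqs[i]->error_ctx)
			eventfd_ctx_put(dev->vqs[i]->error_ctx);
		/* ... remainder of the existing per-vq clean-up ... */
	}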
Leon Romanovsky
2022-Mar-04 07:08 UTC
[PATCH 1/1] vhost: Provide a kernel warning if mutex is held whilst clean-up in progress
On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
> On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> > On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > > All workers/users should be halted before any clean-up should take place.
> > >
> > > Suggested-by: Michael S. Tsirkin <mst at redhat.com>
> > > Signed-off-by: Lee Jones <lee.jones at linaro.org>
> > > ---
> > >  drivers/vhost/vhost.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index bbaff6a5e21b8..d935d2506963f 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >  	int i;
> > >
> > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > +		/* Ideally all workers should be stopped prior to clean-up */
> > > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > +
> > >  		mutex_lock(&dev->vqs[i]->mutex);
> >
> > I know nothing about vhost, but this construction and patch looks
> > strange to me.
> >
> > If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> > here suggests to me that workers can still run here.
> >
> > Thanks
>
> "Ideally" here is misleading, we need a bigger detailed comment
> along the lines of:
>
> /*
>  * By design, no workers can run here. But if there's a bug and the
>  * driver did not flush all work properly then they might, and we
>  * encountered such bugs in the past. With no proper flush guest won't
>  * work correctly but avoiding host memory corruption in this case
>  * sounds like a good idea.
>  */

This description looks better, but the check is inherently racy. Why
don't you add a comment and mutex_lock()? The WARN_ON here is more
distraction than actual help.

Thanks

> > >  		if (dev->vqs[i]->error_ctx)
> > >  			eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > --
> > > 2.35.1.574.g5d30c73bfb-goog
> > >
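The variant Leon is asking about would drop the unlocked check entirely and
rely on the comment plus the existing lock, roughly (untested sketch, same
diff context as above):

	for (i = 0; i < dev->nvqs; ++i) {
		/* Workers must have been flushed before clean-up; see comment above. */
		mutex_lock(&dev->vqs[i]->mutex);
		if (dev->vqs[i]->error_ctx)
			eventfd_ctx_put(dev->vqs[i]->error_ctx);
		/* ... remainder of the existing per-vq clean-up ... */
	}

The race he points at is that mutex_is_locked() is only a snapshot: a worker
could take the mutex just after the check and before mutex_lock(), so a clean
WARN_ON does not prove that no worker is running.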
Stefano Garzarella
2022-Mar-04 07:50 UTC
[PATCH 1/1] vhost: Provide a kernel warning if mutex is held whilst clean-up in progress
On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
>On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
>> On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
>> > All workers/users should be halted before any clean-up should take place.
>> >
>> > Suggested-by: Michael S. Tsirkin <mst at redhat.com>
>> > Signed-off-by: Lee Jones <lee.jones at linaro.org>
>> > ---
>> >  drivers/vhost/vhost.c | 3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> > index bbaff6a5e21b8..d935d2506963f 100644
>> > --- a/drivers/vhost/vhost.c
>> > +++ b/drivers/vhost/vhost.c
>> > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>> >  	int i;
>> >
>> >  	for (i = 0; i < dev->nvqs; ++i) {
>> > +		/* Ideally all workers should be stopped prior to clean-up */
>> > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
>> > +
>> >  		mutex_lock(&dev->vqs[i]->mutex);
>>
>> I know nothing about vhost, but this construction and patch looks
>> strange to me.
>>
>> If all workers were stopped, you won't need mutex_lock(). The mutex_lock
>> here suggests to me that workers can still run here.
>>
>> Thanks
>
>
>"Ideally" here is misleading, we need a bigger detailed comment
>along the lines of:
>
>/*
> * By design, no workers can run here. But if there's a bug and the
> * driver did not flush all work properly then they might, and we
> * encountered such bugs in the past. With no proper flush guest won't
> * work correctly but avoiding host memory corruption in this case
> * sounds like a good idea.
> */

Can we use vhost_vq_get_backend() to check this situation?

IIUC all the vhost devices clear the backend to stop the workers.
This is not racy (if we do after the mutex_lock) and should cover all cases.

Thanks,
Stefano
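As an untested sketch, Stefano's suggestion would move the check under the
lock and key it off the backend pointer rather than the mutex state
(vhost_vq_get_backend() is the existing helper that returns the vq's private
backend pointer; the surrounding lines are the diff context quoted above):

	for (i = 0; i < dev->nvqs; ++i) {
		mutex_lock(&dev->vqs[i]->mutex);
		/*
		 * Backends are cleared when a device is stopped, so a non-NULL
		 * backend here means some worker may still be able to run.
		 */
		WARN_ON(vhost_vq_get_backend(dev->vqs[i]));
		if (dev->vqs[i]->error_ctx)
			eventfd_ctx_put(dev->vqs[i]->error_ctx);
		/* ... remainder of the existing per-vq clean-up ... */
	}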