Michael S. Tsirkin
2021-Oct-13 12:34 UTC
[PATCH v2] vduse: Fix race condition between resetting and irq injecting
On Wed, Oct 13, 2021 at 08:30:40PM +0800, Yongji Xie wrote:
> On Wed, Oct 13, 2021 at 7:10 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> >
> > On Wed, Sep 29, 2021 at 04:30:50PM +0800, Xie Yongji wrote:
> > > The interrupt might be triggered after a reset since there is
> > > no synchronization between resetting and irq injecting.
> >
> > In fact, irq_lock is already used to synchronize with
> > irqs. Why isn't taking and releasing it enough?
> >
>
> For example:
>
> CPU 0                                     CPU1
> ---------                                 --------
> vduse_dev_ioctl()
>   check DRIVER_OK
>                                           vduse_dev_reset()
>                                             flush_work(&vq->inject);
> queue_work(vduse_irq_wq, &vq->inject);
>                                           virtio_vdpa_probe()
>                                             virtio_vdpa_find_vqs()
> vduse_vq_irq_inject()
>   vq->cb.callback(vq->cb.private);
>                                           set DRIVER_OK
>
> In the above case, the irq callback is still triggered before DRIVER_OK is set.
>
> But now I found it seems to be better to just check DRIVER_OK again in
> vduse_vq_irq_inject().

And then presumably make sure each time we set status it's done under the
irq lock?

> > > And it
> > > might break something if the interrupt is delayed until a new
> > > round of device initialization.
> > >
> > > Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
> > > Signed-off-by: Xie Yongji <xieyongji at bytedance.com>
> > > ---
> > >  drivers/vdpa/vdpa_user/vduse_dev.c | 37 +++++++++++++++++++++++++------------
> > >  1 file changed, 25 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > index cefb301b2ee4..841667a896dd 100644
> > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > @@ -80,6 +80,7 @@ struct vduse_dev {
> > >          struct vdpa_callback config_cb;
> > >          struct work_struct inject;
> > >          spinlock_t irq_lock;
> > > +        struct rw_semaphore rwsem;
> > >          int minor;
> > >          bool broken;
> > >          bool connected;
> >
> > What does this lock protect? Use a more descriptive name pls,
> > and maybe add a comment.
> >
>
> This lock is used to ensure there is no more inflight irq kwork after reset.
>
> >
> > > @@ -410,6 +411,8 @@ static void vduse_dev_reset(struct vduse_dev *dev)
> > >          if (domain->bounce_map)
> > >                  vduse_domain_reset_bounce_map(domain);
> > >
> > > +        down_write(&dev->rwsem);
> > > +
> > >          dev->status = 0;
> > >          dev->driver_features = 0;
> > >          dev->generation++;
> > > @@ -443,6 +446,8 @@ static void vduse_dev_reset(struct vduse_dev *dev)
> > >                  flush_work(&vq->inject);
> > >                  flush_work(&vq->kick);
> > >          }
> > > +
> > > +        up_write(&dev->rwsem);
> > >  }
> > >
> > >  static int vduse_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > @@ -885,6 +890,23 @@ static void vduse_vq_irq_inject(struct work_struct *work)
> > >          spin_unlock_irq(&vq->irq_lock);
> > >  }
> > >
> > > +static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> > > +                                    struct work_struct *irq_work)
> > > +{
> > > +        int ret = -EINVAL;
> > > +
> > > +        down_read(&dev->rwsem);
> > > +        if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
> > > +                goto unlock;
> > > +
> > > +        ret = 0;
> > > +        queue_work(vduse_irq_wq, irq_work);
> > > +unlock:
> > > +        up_read(&dev->rwsem);
> > > +
> > > +        return ret;
> > > +}
> > > +
> > >  static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> > >                              unsigned long arg)
> > >  {
> >
> >
> > so that's a lot of overhead for an irq.
> > Normally the way to address races like this is to add
> > flushing to the reset path, not locking to irq path.
> >
>
> Yes, we already call flush_work() in the reset path.
>
> Thanks,
> Yongji
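
For illustration only, here is a rough userspace model of the alternative
discussed above: re-check DRIVER_OK inside the inject path, and only change
the status while holding the irq lock. This is not the vduse code and not a
proposed patch. Every model_* name is hypothetical, a C11 atomic_flag
spinlock stands in for vq->irq_lock, and status is kept per queue here for
brevity (the driver keeps it in struct vduse_dev).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bit value as VIRTIO_CONFIG_S_DRIVER_OK in the virtio spec. */
#define MODEL_S_DRIVER_OK 0x4u

/* Hypothetical stand-in for the parts of a vduse virtqueue used here. */
struct model_vq {
        atomic_flag irq_lock;           /* models vq->irq_lock */
        unsigned int status;            /* models dev->status */
        void (*callback)(void *);       /* models vq->cb.callback */
        void *private;                  /* models vq->cb.private */
};

static void model_lock(struct model_vq *vq)
{
        /* Spin until the flag was previously clear. */
        while (atomic_flag_test_and_set_explicit(&vq->irq_lock,
                                                 memory_order_acquire))
                ;
}

static void model_unlock(struct model_vq *vq)
{
        atomic_flag_clear_explicit(&vq->irq_lock, memory_order_release);
}

static void model_vq_init(struct model_vq *vq, void (*cb)(void *), void *priv)
{
        atomic_flag_clear(&vq->irq_lock);
        vq->status = 0;
        vq->callback = cb;
        vq->private = priv;
}

/*
 * Every status update (reset writes 0, the driver later sets DRIVER_OK)
 * goes through the lock, so it cannot interleave with a running inject.
 */
static void model_set_status(struct model_vq *vq, unsigned int status)
{
        model_lock(vq);
        vq->status = status;
        model_unlock(vq);
}

/*
 * Body of the queued work item: DRIVER_OK is checked again under the lock,
 * so work queued just before a reset does not reach the callback.
 * Returns true if the callback was invoked.
 */
static bool model_irq_inject(struct model_vq *vq)
{
        bool fired = false;

        assert(vq != NULL);
        model_lock(vq);
        if ((vq->status & MODEL_S_DRIVER_OK) && vq->callback) {
                vq->callback(vq->private);
                fired = true;
        }
        model_unlock(vq);
        return fired;
}

/* Test helper: counts how many times the callback ran. */
static void model_count_cb(void *priv)
{
        ++*(int *)priv;
}
```

In this model, an inject that runs after reset but before DRIVER_OK is set
again simply drops the interrupt, which matches the CPU 0 / CPU1 sequence
quoted above. The model is only exercised single-threaded; it shows the
intended check, not a proof that the real driver is race-free.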