On Fri, Mar 25, 2022 at 5:10 PM Michael S. Tsirkin <mst at redhat.com>
wrote:>
> On Fri, Mar 25, 2022 at 03:52:00PM +0800, Jason Wang wrote:
> > On Fri, Mar 25, 2022 at 2:31 PM Michael S. Tsirkin <mst at
redhat.com> wrote:
> > >
> > > Bcc:
> > > Subject: Re: [PATCH 3/3] virtio: harden vring IRQ
> > > Message-ID: <20220325021422-mutt-send-email-mst at
kernel.org>
> > > Reply-To:
> > > In-Reply-To: <f7046303-7d7d-e39f-3c71-3688126cc812 at
redhat.com>
> > >
> > > On Fri, Mar 25, 2022 at 11:04:08AM +0800, Jason Wang wrote:
> > > >
> > > > ? 2022/3/24 ??7:03, Michael S. Tsirkin ??:
> > > > > On Thu, Mar 24, 2022 at 04:40:04PM +0800, Jason Wang
wrote:
> > > > > > This is a rework on the previous IRQ hardening
that is done for
> > > > > > virtio-pci where several drawbacks were found and
were reverted:
> > > > > >
> > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly
to affinity managed IRQ
> > > > > > that is used by some device such as virtio-blk
> > > > > > 2) done only for PCI transport
> > > > > >
> > > > > > In this patch, we tries to borrow the idea from
the INTX IRQ hardening
> > > > > > in the reverted commit 080cd7c3ac87
("virtio-pci: harden INTX interrupts")
> > > > > > by introducing a global irq_soft_enabled variable
for each
> > > > > > virtio_device. Then we can to toggle it during
> > > > > > virtio_reset_device()/virtio_device_ready(). A
synchornize_rcu() is
> > > > > > used in virtio_reset_device() to synchronize with
the IRQ handlers. In
> > > > > > the future, we may provide config_ops for the
transport that doesn't
> > > > > > use IRQ. With this, vring_interrupt() can return
check and early if
> > > > > > irq_soft_enabled is false. This lead to
smp_load_acquire() to be used
> > > > > > but the cost should be acceptable.
> > > > > Maybe it should be but is it? Can't we use
synchronize_irq instead?
> > > >
> > > >
> > > > Even if we allow the transport driver to synchornize through
> > > > synchronize_irq() we still need a check in the
vring_interrupt().
> > > >
> > > > We do something like the following previously:
> > > >
> > > > if (!READ_ONCE(vp_dev->intx_soft_enabled))
> > > > return IRQ_NONE;
> > > >
> > > > But it looks like a bug since speculative read can be done
before the check
> > > > where the interrupt handler can't see the uncommitted
setup which is done by
> > > > the driver.
> > >
> > > I don't think so - if you sync after setting the value then
> > > you are guaranteed that any handler running afterwards
> > > will see the new value.
> >
> > The problem is not disabled but the enable.
>
> So a misbehaving device can lose interrupts? That's not a problem at
all
> imo.
It's the interrupt raised before setting irq_soft_enabled to true:
CPU 0 probe) driver specific setup (not commited)
CPU 1 IRQ handler) read the uninitialized variable
CPU 0 probe) set irq_soft_enabled to true
CPU 1 IRQ handler) read irq_soft_enable as true
CPU 1 IRQ handler) use the uninitialized variable
Thanks
>
> > We use smp_store_relase()
> > to make sure the driver commits the setup before enabling the irq. It
> > means the read needs to be ordered as well in vring_interrupt().
> >
> > >
> > > Although I couldn't find anything about this in
memory-barriers.txt
> > > which surprises me.
> > >
> > > CC Paul to help make sure I'm right.
> > >
> > >
> > > >
> > > > >
> > > > > > To avoid breaking legacy device which can send IRQ
before DRIVER_OK, a
> > > > > > module parameter is introduced to enable the
hardening so function
> > > > > > hardening is disabled by default.
> > > > > Which devices are these? How come they send an
interrupt before there
> > > > > are any buffers in any queues?
> > > >
> > > >
> > > > I copied this from the commit log for 22b7050a024d7
> > > >
> > > > "
> > > >
> > > > This change will also benefit old hypervisors (before
2009)
> > > > that send interrupts without checking DRIVER_OK:
previously,
> > > > the callback could race with driver-specific
initialization.
> > > > "
> > > >
> > > > If this is only for config interrupt, I can remove the above
log.
> > >
> > >
> > > This is only for config interrupt.
> >
> > Ok.
> >
> > >
> > > >
> > > > >
> > > > > > Note that the hardening is only done for vring
interrupt since the
> > > > > > config interrupt hardening is already done in
commit 22b7050a024d7
> > > > > > ("virtio: defer config changed
notifications"). But the method that is
> > > > > > used by config interrupt can't be reused by
the vring interrupt
> > > > > > handler because it uses spinlock to do the
synchronization which is
> > > > > > expensive.
> > > > > >
> > > > > > Signed-off-by: Jason Wang <jasowang at
redhat.com>
> > > > >
> > > > > > ---
> > > > > > drivers/virtio/virtio.c | 19
+++++++++++++++++++
> > > > > > drivers/virtio/virtio_ring.c | 9 ++++++++-
> > > > > > include/linux/virtio.h | 4 ++++
> > > > > > include/linux/virtio_config.h | 25
+++++++++++++++++++++++++
> > > > > > 4 files changed, 56 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio.c
b/drivers/virtio/virtio.c
> > > > > > index 8dde44ea044a..85e331efa9cc 100644
> > > > > > --- a/drivers/virtio/virtio.c
> > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > @@ -7,6 +7,12 @@
> > > > > > #include <linux/of.h>
> > > > > > #include <uapi/linux/virtio_ids.h>
> > > > > > +static bool irq_hardening = false;
> > > > > > +
> > > > > > +module_param(irq_hardening, bool, 0444);
> > > > > > +MODULE_PARM_DESC(irq_hardening,
> > > > > > + "Disalbe IRQ software processing
when it is not expected");
> > > > > > +
> > > > > > /* Unique numbering for virtio devices. */
> > > > > > static DEFINE_IDA(virtio_index_ida);
> > > > > > @@ -220,6 +226,15 @@ static int
virtio_features_ok(struct virtio_device *dev)
> > > > > > * */
> > > > > > void virtio_reset_device(struct virtio_device
*dev)
> > > > > > {
> > > > > > + /*
> > > > > > + * The below synchronize_rcu() guarantees that
any
> > > > > > + * interrupt for this line arriving after
> > > > > > + * synchronize_rcu() has completed is guaranteed
to see
> > > > > > + * irq_soft_enabled == false.
> > > > > News to me I did not know synchronize_rcu has anything
to do
> > > > > with interrupts. Did not you intend to use
synchronize_irq?
> > > > > I am not even 100% sure synchronize_rcu is by design a
memory barrier
> > > > > though it's most likely is ...
> > > >
> > > >
> > > > According to the comment above tree RCU version of
synchronize_rcu():
> > > >
> > > > """
> > > >
> > > > * RCU read-side critical sections are delimited by
rcu_read_lock()
> > > > * and rcu_read_unlock(), and may be nested. In addition,
but only in
> > > > * v5.0 and later, regions of code across which interrupts,
preemption,
> > > > * or softirqs have been disabled also serve as RCU
read-side critical
> > > > * sections. This includes hardware interrupt handlers,
softirq handlers,
> > > > * and NMI handlers.
> > > > """
> > > >
> > > > So interrupt handlers are treated as read-side critical
sections.
> > > >
> > > > And it has the comment for explain the barrier:
> > > >
> > > > """
> > > >
> > > > * Note that this guarantee implies further memory-ordering
guarantees.
> > > > * On systems with more than one CPU, when synchronize_rcu()
returns,
> > > > * each CPU is guaranteed to have executed a full memory
barrier since
> > > > * the end of its last RCU read-side critical section whose
beginning
> > > > * preceded the call to synchronize_rcu(). In addition,
each CPU having
> > > > """
> > > >
> > > > So on SMP it provides a full barrier. And for UP/tiny RCU we
don't need the
> > > > barrier, if the interrupt come after WRITE_ONCE() it will
see the
> > > > irq_soft_enabled as false.
> > > >
> > >
> > > You are right. So then
> > > 1. I do not think we need load_acquire - why is it needed? Just
> > > READ_ONCE should do.
> >
> > See above.
> >
> > > 2. isn't synchronize_irq also doing the same thing?
> >
> >
> > Yes, but it requires a config ops since the IRQ knowledge is transport
specific.
> >
> > >
> > >
> > > > >
> > > > > > + */
> > > > > > + WRITE_ONCE(dev->irq_soft_enabled, false);
> > > > > > + synchronize_rcu();
> > > > > > +
> > > > > > dev->config->reset(dev);
> > > > > > }
> > > > > > EXPORT_SYMBOL_GPL(virtio_reset_device);
> > > > > Please add comment explaining where it will be enabled.
> > > > > Also, we *really* don't need to synch if it was
already disabled,
> > > > > let's not add useless overhead to the boot
sequence.
> > > >
> > > >
> > > > Ok.
> > > >
> > > >
> > > > >
> > > > >
> > > > > > @@ -427,6 +442,10 @@ int
register_virtio_device(struct virtio_device *dev)
> > > > > >
spin_lock_init(&dev->config_lock);
> > > > > > dev->config_enabled = false;
> > > > > > dev->config_change_pending = false;
> > > > > > + dev->irq_soft_check = irq_hardening;
> > > > > > +
> > > > > > + if (dev->irq_soft_check)
> > > > > > + dev_info(&dev->dev, "IRQ
hardening is enabled\n");
> > > > > > /* We always start by resetting the
device, in case a previous
> > > > > > * driver messed it up. This also tests
that code path a little. */
> > > > > one of the points of hardening is it's also helpful
for buggy
> > > > > devices. this flag defeats the purpose.
> > > >
> > > >
> > > > Do you mean:
> > > >
> > > > 1) we need something like config_enable? This seems not easy
to be
> > > > implemented without obvious overhead, mainly the synchronize
with the
> > > > interrupt handlers
> > >
> > > But synchronize is only on tear-down path. That is not critical
for any
> > > users at the moment, even less than probe.
> >
> > I meant if we have vq->irq_pending, we need to call
vring_interrupt()
> > in the virtio_device_ready() and synchronize the IRQ handlers with
> > spinlock or others.
> >
> > >
> > > > 2) enable this by default, so I don't object, but this
may have some risk
> > > > for old hypervisors
> > >
> > >
> > > The risk if there's a driver adding buffers without setting
DRIVER_OK.
> >
> > Probably not, we have devices that accept random inputs from outside,
> > net, console, input etc. I've done a round of audits of the Qemu
> > codes. They look all fine since day0.
> >
> > > So with this approach, how about we rename the flag
"driver_ok"?
> > > And then add_buf can actually test it and BUG_ON if not there
(at least
> > > in the debug build).
> >
> > This looks like a hardening of the driver in the core instead of the
> > device. I think it can be done but in a separate series.
> >
> > >
> > > And going down from there, how about we cache status in the
> > > device? Then we don't need to keep re-reading it every time,
> > > speeding boot up a tiny bit.
> >
> > I don't fully understand here, actually spec requires status to be
> > read back for validation in many cases.
> >
> > Thanks
> >
> > >
> > > >
> > > > >
> > > > > > diff --git a/drivers/virtio/virtio_ring.c
b/drivers/virtio/virtio_ring.c
> > > > > > index 962f1477b1fa..0170f8c784d8 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -2144,10 +2144,17 @@ static inline bool
more_used(const struct vring_virtqueue *vq)
> > > > > > return vq->packed_ring ?
more_used_packed(vq) : more_used_split(vq);
> > > > > > }
> > > > > > -irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > > > +irqreturn_t vring_interrupt(int irq, void *v)
> > > > > > {
> > > > > > + struct virtqueue *_vq = v;
> > > > > > + struct virtio_device *vdev = _vq->vdev;
> > > > > > struct vring_virtqueue *vq =
to_vvq(_vq);
> > > > > > + if (!virtio_irq_soft_enabled(vdev)) {
> > > > > > + dev_warn_once(&vdev->dev,
"virtio vring IRQ raised before DRIVER_OK");
> > > > > > + return IRQ_NONE;
> > > > > > + }
> > > > > > +
> > > > > > if (!more_used(vq)) {
> > > > > > pr_debug("virtqueue
interrupt with no work for %p\n", vq);
> > > > > > return IRQ_NONE;
> > > > > > diff --git a/include/linux/virtio.h
b/include/linux/virtio.h
> > > > > > index 5464f398912a..957d6ad604ac 100644
> > > > > > --- a/include/linux/virtio.h
> > > > > > +++ b/include/linux/virtio.h
> > > > > > @@ -95,6 +95,8 @@ dma_addr_t
virtqueue_get_used_addr(struct virtqueue *vq);
> > > > > > * @failed: saved value for
VIRTIO_CONFIG_S_FAILED bit (for restore)
> > > > > > * @config_enabled: configuration change
reporting enabled
> > > > > > * @config_change_pending: configuration change
reported while disabled
> > > > > > + * @irq_soft_check: whether or not to check
@irq_soft_enabled
> > > > > > + * @irq_soft_enabled: callbacks enabled
> > > > > > * @config_lock: protects configuration change
reporting
> > > > > > * @dev: underlying device.
> > > > > > * @id: the device type identification (used to
match it with a driver).
> > > > > > @@ -109,6 +111,8 @@ struct virtio_device {
> > > > > > bool failed;
> > > > > > bool config_enabled;
> > > > > > bool config_change_pending;
> > > > > > + bool irq_soft_check;
> > > > > > + bool irq_soft_enabled;
> > > > > > spinlock_t config_lock;
> > > > > > spinlock_t vqs_list_lock; /* Protects
VQs list access */
> > > > > > struct device dev;
> > > > > > diff --git a/include/linux/virtio_config.h
b/include/linux/virtio_config.h
> > > > > > index dafdc7f48c01..9c1b61f2e525 100644
> > > > > > --- a/include/linux/virtio_config.h
> > > > > > +++ b/include/linux/virtio_config.h
> > > > > > @@ -174,6 +174,24 @@ static inline bool
virtio_has_feature(const struct virtio_device *vdev,
> > > > > > return __virtio_test_bit(vdev, fbit);
> > > > > > }
> > > > > > +/*
> > > > > > + * virtio_irq_soft_enabled: whether we can
execute callbacks
> > > > > > + * @vdev: the device
> > > > > > + */
> > > > > > +static inline bool virtio_irq_soft_enabled(const
struct virtio_device *vdev)
> > > > > > +{
> > > > > > + if (!vdev->irq_soft_check)
> > > > > > + return true;
> > > > > > +
> > > > > > + /*
> > > > > > + * Read irq_soft_enabled before reading other
device specific
> > > > > > + * data. Paried with smp_store_relase() in
> > > > > paired
> > > >
> > > >
> > > > Will fix.
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > > + * virtio_device_ready() and
WRITE_ONCE()/synchronize_rcu() in
> > > > > > + * virtio_reset_device().
> > > > > > + */
> > > > > > + return
smp_load_acquire(&vdev->irq_soft_enabled);
> > > > > > +}
> > > > > > +
> > > > > > /**
> > > > > > * virtio_has_dma_quirk - determine whether this
device has the DMA quirk
> > > > > > * @vdev: the device
> > > > > > @@ -236,6 +254,13 @@ void
virtio_device_ready(struct virtio_device *dev)
> > > > > > if (dev->config->enable_cbs)
> > > > > >
dev->config->enable_cbs(dev);
> > > > > > + /*
> > > > > > + * Commit the driver setup before enabling the
virtqueue
> > > > > > + * callbacks. Paried with smp_load_acuqire() in
> > > > > > + * virtio_irq_soft_enabled()
> > > > > > + */
> > > > > > + smp_store_release(&dev->irq_soft_enabled,
true);
> > > > > > +
> > > > > > BUG_ON(status &
VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > dev->config->set_status(dev,
status | VIRTIO_CONFIG_S_DRIVER_OK);
> > > > > > }
> > > > > > --
> > > > > > 2.25.1
> > >
>