On Tue, Oct 12, 2021 at 02:43:13PM +0800, Jason Wang
wrote:> On Tue, Oct 12, 2021 at 2:35 PM Michael S. Tsirkin <mst at
redhat.com> wrote:
> >
> > On Tue, Oct 12, 2021 at 02:11:10PM +0800, Jason Wang wrote:
> > > On Tue, Oct 12, 2021 at 1:44 PM Michael S. Tsirkin <mst at
redhat.com> wrote:
> > > >
> > > > On Tue, Oct 12, 2021 at 10:43:57AM +0800, Jason Wang wrote:
> > > > > On Mon, Oct 11, 2021 at 8:36 PM Michael S. Tsirkin
<mst at redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Oct 11, 2021 at 03:36:51PM +0800, Jason
Wang wrote:
> > > > > > > On Tue, Oct 5, 2021 at 3:42 PM Michael S.
Tsirkin <mst at redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Sep 13, 2021 at 01:53:44PM
+0800, Jason Wang wrote:
> > > > > > > > > Hi All:
> > > > > > > > >
> > > > > > > > > This series treis to do more
hardening for virito.
> > > > > > > > >
> > > > > > > > > patch 1 validates the num_queues
for virio-blk device.
> > > > > > > > > patch 2-4 validates max_nr_ports
for virito-console device.
> > > > > > > > > patch 5-7 harden virtio-pci
interrupts to make sure no exepcted
> > > > > > > > > interrupt handler is tiggered. If
this makes sense we can do similar
> > > > > > > > > things in other transport drivers.
> > > > > > > > > patch 8-9 validate used ring
length.
> > > > > > > > >
> > > > > > > > > Smoking test on blk/net with
packed=on/off and iommu_platform=on/off.
> > > > > > > > >
> > > > > > > > > Please review.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > So I poked at console at least, and I
think I see
> > > > > > > > an issue: if interrupt handler queues a
work/bh,
> > > > > > > > then it can still run while reset is in
progress.
> > > > > > >
> > > > > > > Looks like a bug which is unrelated to the
hardening?
> > > > > >
> > > > > > Won't preventing use after free be relevant?
> > > > >
> > > > > Oh right.
> > > > >
> > > > > > I frankly don't know what does hardening means
then.
> > > > > > > E.g the driver
> > > > > > > should sync with work/bh before reset.
> > > > > >
> > > > > > No, there's no way to fix it ATM without extra
locks and state which I
> > > > > > think we should strive to avoid or make it
generic, not per-driver,
> > > > > > since sync before reset is useless, new interrupts
will just arrive and
> > > > > > queue more work. And a sync after reset is too
late since driver will
> > > > > > try to add buffers.
> > > > >
> > > > > Can we do something like
> > > > >
> > > > > 1) disable interrupt
> > > > > 2) sync bh
> > > > >
> > > > > Or I guess this is somehow you meant in the following
steps.
> > > >
> > > > So that would mean a new API to disable vq interrupts.
> > > > reset will re-enable.
> > > > E.g. virtqueue_cancel_cb_before_reset()?
> > > >
> > > > Then drivers can sync, then reset.
> > > > This means maintaining more state though, which I don't
like.
> > > >
> > > > An alternative is something like this:
> > > >
> > > > static void (*virtio_flush_device)(struct virtio_device
*dev);
> > > >
> > > > void virtio_reset_device(struct virtio_device *dev,
virtio_flush_device flush)
> > > > {
> > > > might_sleep();
> > > > if (flush) {
> > > > dev->config->disable_interrupts(dev);
> > > > flush(dev);
> > > > dev->config->reset(dev);
> > > > dev->config->enable_interrupts(dev);
> > >
> > > I wonder whether this is needed. As done in this series,
> > > enable_interrupt should be done in virtio_device_ready().
> > >
> > > Others should work.
> > >
> > > > } else {
> > > > dev->config->reset(dev);
> > > > }
> > > > }
> > > >
> > > > I have patches wrapping all reset calls in
virtio_reset_device
> > > > (without the flush parameter but that's easy to tweak).
> > >
> > > Does it work if I post V2 and you post those patches on top?
> >
> > The reset things? Sure.
>
> Ok.
>
> >
> > > >
> > > >
> > > > > >
> > > > > > Maybe we can break device. Two issues with that
> > > > > > - drivers might not be ready to handle add_buf
failures
> > > > > > - restore needs to unbreak then and we don't
have a way to do that yet
> > > > > >
> > > > > > So .. careful reading of all device drivers and
hoping we don't mess
> > > > > > things up even more ... here we come.
> > > > >
> > > > > Yes.
> > > >
> > > > The biggest issue with this trick is drivers not handling
add_buf
> > > > errors, adding a failure path here risks creating memory
leaks.
> > > > OTOH with e.g. bounce buffers maybe it's possible for
add buf to
> > > > fail anyway?
> > >
> > > I'm not sure I get this, a simple git grep told me at least
the return
> > > value of add_inbuf() were all checked.
> > >
> > > Thanks
> >
> > Checked locally, but not always error is handled all the way
> > to the top. E.g. add_port in console returns an error code
> > but that is never checked. Well, console is a mess generally.
>
> I see. I can try to audit all virtio drivers for the add_inbuf() case.
>
> Thanks
Why inbuf specifically?
I mean, re-reading code often finds bugs, sure ;)
But I don't think just to fix remove we need to audit them all
as such, as long as we are not modifying core, whatever
driver remove we are poking for, that driver needs to be
audited.
> >
> > > >
> > > > > >
> > > > > > > >
> > > > > > > > I sent a patch to fix it for console
removal specifically,
> > > > > > > > but I suspect it's not enough e.g.
freeze is still broken.
> > > > > > > > And note this has been reported without
any TDX things -
> > > > > > > > it's not a malicious device issue,
can be triggered just
> > > > > > > > by module unload.
> > > > > > > >
> > > > > > > > I am vaguely thinking about new APIs to
disable/enable callbacks.
> > > > > > > > An alternative:
> > > > > > > >
> > > > > > > > 1. adding new remove_nocb/freeze_nocb
calls
> > > > > > > > 2. disabling/enabling interrupts
automatically around these
> > > > > > > > 3. gradually moving devices to using
these
> > > > > > > > 4. once/if all device move, removing the
old callbacks
> > > > > > > >
> > > > > > > > the advantage here is that we'll be
sure calls are always
> > > > > > > > paired correctly.
> > > > > > >
> > > > > > > I'm not sure I get the idea, but my
feeling is that it doesn't
> > > > > > > conflict with the interrupt hardening here
(or at least the same
> > > > > > > method is required e.g NO_AUTO_EN).
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Right. It's not that it conflicts, it's
that I was hoping that
> > > > > > since you are working on hardening you can take up
fixing that.
> > > > > > Let me know whether you have the time. Thanks!
> > > > >
> > > > > I can do that.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Jason Wang (9):
> > > > > > > > > virtio-blk: validate num_queues
during probe
> > > > > > > > > virtio: add doc for validate()
method
> > > > > > > > > virtio-console: switch to use
.validate()
> > > > > > > > > virtio_console: validate
max_nr_ports before trying to use it
> > > > > > > > > virtio_config: introduce a new
ready method
> > > > > > > > > virtio_pci: harden MSI-X
interrupts
> > > > > > > > > virtio-pci: harden INTX
interrupts
> > > > > > > > > virtio_ring: fix typos in
vring_desc_extra
> > > > > > > > > virtio_ring: validate used buffer
length
> > > > > > > > >
> > > > > > > > > drivers/block/virtio_blk.c
| 3 +-
> > > > > > > > > drivers/char/virtio_console.c
| 51 +++++++++++++++++++++---------
> > > > > > > > > drivers/virtio/virtio_pci_common.c
| 43 +++++++++++++++++++++----
> > > > > > > > > drivers/virtio/virtio_pci_common.h
| 7 ++--
> > > > > > > > > drivers/virtio/virtio_pci_legacy.c
| 5 +--
> > > > > > > > > drivers/virtio/virtio_pci_modern.c
| 6 ++--
> > > > > > > > > drivers/virtio/virtio_ring.c
| 27 ++++++++++++++--
> > > > > > > > > include/linux/virtio.h
| 1 +
> > > > > > > > > include/linux/virtio_config.h
| 6 ++++
> > > > > > > > > 9 files changed, 118
insertions(+), 31 deletions(-)
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > >
> > > > > >
> > > >
> >