thr3ads.net - Virtualization - [PATCH V3] virtio: disable notification hardening by default [Jun 2022]

If this information is useful, please help other people find it:
Share via:

Michael S. Tsirkin

2022-Jun-29 08:52 UTC

[PATCH V3] virtio: disable notification hardening by default

On Wed, Jun 29, 2022 at 04:34:36PM +0800, Jason Wang
wrote:> On Wed, Jun 29, 2022 at 3:15 PM Michael S. Tsirkin <mst at
redhat.com> wrote:
> >
> > On Wed, Jun 29, 2022 at 03:02:21PM +0800, Jason Wang wrote:
> > > On Wed, Jun 29, 2022 at 2:31 PM Michael S. Tsirkin <mst at
redhat.com> wrote:
> > > >
> > > > On Wed, Jun 29, 2022 at 12:07:11PM +0800, Jason Wang wrote:
> > > > > On Tue, Jun 28, 2022 at 2:17 PM Jason Wang <jasowang
at redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 28, 2022 at 1:00 PM Michael S. Tsirkin
<mst at redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 28, 2022 at 11:49:12AM +0800,
Jason Wang wrote:
> > > > > > > > > Heh. Yea sure. But things work fine
for people. What is the chance
> > > > > > > > > your review found and fixed all
driver bugs?
> > > > > > > >
> > > > > > > > I don't/can't audit all bugs but
the race between open/close against
> > > > > > > > ready/reset. It looks to me a good
chance to fix them all but if you
> > > > > > > > think differently, let me know
> > > > > > > >
> > > > > > > > > After two attempts
> > > > > > > > > I don't feel like hoping audit
will fix all bugs.
> > > > > > > >
> > > > > > > > I've started the auditing and have
15+ patches in the queue. (only
> > > > > > > > covers bluetooth, console, pmem,
virtio-net and caif). Spotting the
> > > > > > > > issue is not hard but the testing, It
would take at least the time of
> > > > > > > > one release to finalize I guess.
> > > > > > >
> > > > > > > Absolutely. So I am looking for a way to
implement hardening that does
> > > > > > > not break existing drivers.
> > > > > >
> > > > > > I totally agree with you to seek a way without
bothering the drivers.
> > > > > > Just wonder if this is possbile.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The reason config was
kind of easy is that config interrupt is rarely
> > > > > > > > > > > vital for device function
so arbitrarily deferring that does not lead to
> > > > > > > > > > > deadlocks - what you are
trying to do with VQ interrupts is
> > > > > > > > > > > fundamentally different.
Things are especially bad if we just drop
> > > > > > > > > > > an interrupt but
deferring can lead to problems too.
> > > > > > > > > >
> > > > > > > > > > I'm not sure I see the
difference, disable_irq() stuffs also delay the
> > > > > > > > > > interrupt processing until
enable_irq().
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Absolutely. I am not at all sure
disable_irq fixes all problems.
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Consider as an example
> > > > > > > > > > >     virtio-net: fix race
between ndo_open() and virtio_device_ready()
> > > > > > > > > > > if you just defer vq
interrupts you get deadlocks.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I don't see a deadlock
here, maybe you can show more detail on this?
> > > > > > > > >
> > > > > > > > > What I mean is this: if we revert
the above commit, things still
> > > > > > > > > work (out of spec, but still). If
we revert and defer interrupts until
> > > > > > > > > device ready then ndo_open that
triggers before device ready deadlocks.
> > > > > > > >
> > > > > > > > Ok, I guess you meant on a hypervisor
that is strictly written with spec.
> > > > > > >
> > > > > > > I mean on hypervisor that starts processing
queues after getting a kick
> > > > > > > even without DRIVER_OK.
> > > > > >
> > > > > > Oh right.
> > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > So, thinking about all
this, how about a simple per vq flag meaning
> > > > > > > > > > > "this vq was kicked
since reset"?
> > > > > > > > > >
> > > > > > > > > > And ignore the notification if
vq is not kicked? It sounds like the
> > > > > > > > > > callback needs to be
synchronized with the kick.
> > > > > > > > >
> > > > > > > > > Note we only need to synchronize it
when it changes, which is
> > > > > > > > > only during initialization and
reset.
> > > > > > > >
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > If driver does not kick
then it's not ready to get callbacks, right?
> > > > > > > > > > >
> > > > > > > > > > > Sounds quite clean, but
we need to think through memory ordering
> > > > > > > > > > > concerns - I guess
it's only when we change the value so
> > > > > > > > > > >         if
(!vq->kicked) {
> > > > > > > > > > >                
vq->kicked = true;
> > > > > > > > > > >                 mb();
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > > will do the trick, right?
> > > > > > > > > >
> > > > > > > > > > There's no much difference
with the existing approach:
> > > > > > > > > >
> > > > > > > > > > 1) your proposal implicitly
makes callbacks ready in virtqueue_kick()
> > > > > > > > > > 2) my proposal explicitly
makes callbacks ready via virtio_device_ready()
> > > > > > > > > >
> > > > > > > > > > Both require careful auditing
of all the existing drivers to make sure
> > > > > > > > > > no kick before DRIVER_OK.
> > > > > > > > >
> > > > > > > > > Jason, kick before DRIVER_OK is out
of spec, sure. But it is unrelated
> > > > > > > > > to hardening
> > > > > > > >
> > > > > > > > Yes but with your proposal, it seems to
couple kick with DRIVER_OK somehow.
> > > > > > >
> > > > > > > I don't see how - my proposal ignores
DRIVER_OK issues.
> > > > > >
> > > > > > Yes, what I meant is, in your proposal, the first
kick after rest is a
> > > > > > hint that the driver is ok (but actually it could
not).
> > > > > >
> > > > > > >
> > > > > > > > > and in absence of config interrupts
is generally easily
> > > > > > > > > fixed just by sticking
virtio_device_ready early in initialization.
> > > > > > > >
> > > > > > > > So if the kick is done before the
subsystem registration, there's
> > > > > > > > still a window in the middle (assuming
we stick virtio_device_ready()
> > > > > > > > early):
> > > > > > > >
> > > > > > > > virtio_device_ready()
> > > > > > > > virtqueue_kick()
> > > > > > > > /* the window */
> > > > > > > > subsystem_registration()
> > > > > > >
> > > > > > > Absolutely, however, I do not think we really
have many such drivers
> > > > > > > since this has been known as a wrong thing to
do since the beginning.
> > > > > > > Want to try to find any?
> > > > > >
> > > > > > Yes, let me try and update.
> > > > >
> > > > > This is basically the device that have an RX queue, so
I've found the
> > > > > following drivers:
> > > > >
> > > > > scmi, mac80211_hwsim, vsock, bt, balloon.
> > > >
> > > > Looked and I don't see it yet. Let's consider
> > > > ./net/vmw_vsock/virtio_transport.c for example. Assuming we
block
> > > > callbacks until the first kick, what is the issue with probe
exactly?
> > >
> > > We need to make sure the callback can survive when it runs before
sub
> > > system registration.
> >
> > With my proposal no - only if we also kick before registration.
> > So I do not see the issue yet.
> >
> > Consider ./net/vmw_vsock/virtio_transport.c
> >
> > kicks: virtio_transport_send_pkt_work,
> > virtio_vsock_rx_fill, virtio_vsock_event_fill
> >
> > which of these triggers before we are ready to
> > handle callbacks?
> 
> So:
> 
> virtio_vsock_vqs_init()
>     virtio_device_ready()
>     virtio_vsock_rx_fill() /* kick there */
> rcu_assign_pointer(the_virtio_vsock, vsock)
> 
> It means at least virtio_vsock_rx_done()/virtio_vsock_workqueue needs
> to survive. I don't say it has a bug but we do need to audit the code
> in this case. The implication is: the virtqueue callback should be
> written with no assumption that the driver has registered in the
> subsystem. We don't or can't assume all drivers are written in this
> way.

I thought you said you audited code and found bugs.

My claim is that simply because qemu starts processing
packets immediately upon kick, if bugs like this
existed we would have noticed by now.

In this case the_virtio_vsock is used for xmit things,
callbacks do not seem to use it at all.
> >
> >
> > > >
> > > >
> > > > > >
> > > > > > >I couldn't ... except maybe bluetooth
> > > > > > > but that's just maintainer nacking fixes
saying he'll fix it
> > > > > > > his way ...
> > > > > > >
> > > > > > > > And during remove(), we get another
window:
> > > > > > > >
> > > > > > > > subsysrem_unregistration()
> > > > > > > > /* the window */
> > > > > > > > virtio_device_reset()
> > > > > > >
> > > > > > > Same here.
> > > > >
> > > > > Basically for the drivers that set driver_ok before
registration,
> > > >
> > > > I don't see what does driver_ok have to do with it.
> > >
> > > I meant for those driver, in probe they do()
> > >
> > > virtio_device_ready()
> > > subsystem_register()
> > >
> > > In remove() they do
> > >
> > > subsystem_unregister()
> > > virtio_device_reset()
> > >
> > > for symmetry
> >
> > Let's leave remove alone for now. I am close to 100% sure we have
*lots*
> > of issues around it, but while probe is unavoidable remove can be
> > avoided by blocking hotplug.
> 
> Unbind can trigger this path as well.
> 
> >
> >
> > > >
> > > > > so
> > > > > we have a lot:
> > > > >
> > > > > blk, net, mac80211_hwsim, scsi, vsock, bt, crypto,
gpio, gpu, i2c,
> > > > > iommu, caif, pmem, input, mem
> > > > >
> > > > > So I think there's no easy way to harden the
notification without
> > > > > auditing the driver one by one (especially considering
the driver may
> > > > > use bh or workqueue). The problem is the notification
hardening
> > > > > depends on a correct or race-free probe/remove. So we
need to fix the
> > > > > issues in probe/remove then do the hardening on the
notification.
> > > > >
> > > > > Thanks
> > > >
> > > > So if drivers kick but are not ready to get callbacks then
let's fix
> > > > that first of all, these are racy with existing qemu even
ignoring
> > > > spec compliance.
> > >
> > > Yes, (the patches I've posted so far exist even with a
well-behaved device).
> > >
> > > Thanks
> >
> > patches you posted deal with DRIVER_OK spec compliance.
> > I do not see patches for kicks before callbacks are ready to run.
> 
> Yes.
> 
> Thanks
> 
> >
> > > >
> > > >
> > > > --
> > > > MST
> > > >
> >

Jason Wang

2022-Jun-30 02:01 UTC

head link

[PATCH V3] virtio: disable notification hardening by default

On Wed, Jun 29, 2022 at 4:52 PM Michael S. Tsirkin <mst at redhat.com>
wrote:>
> On Wed, Jun 29, 2022 at 04:34:36PM +0800, Jason Wang wrote:
> > On Wed, Jun 29, 2022 at 3:15 PM Michael S. Tsirkin <mst at
redhat.com> wrote:
> > >
> > > On Wed, Jun 29, 2022 at 03:02:21PM +0800, Jason Wang wrote:
> > > > On Wed, Jun 29, 2022 at 2:31 PM Michael S. Tsirkin <mst
at redhat.com> wrote:
> > > > >
> > > > > On Wed, Jun 29, 2022 at 12:07:11PM +0800, Jason Wang
wrote:
> > > > > > On Tue, Jun 28, 2022 at 2:17 PM Jason Wang
<jasowang at redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 28, 2022 at 1:00 PM Michael S.
Tsirkin <mst at redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jun 28, 2022 at 11:49:12AM
+0800, Jason Wang wrote:
> > > > > > > > > > Heh. Yea sure. But things work
fine for people. What is the chance
> > > > > > > > > > your review found and fixed
all driver bugs?
> > > > > > > > >
> > > > > > > > > I don't/can't audit all
bugs but the race between open/close against
> > > > > > > > > ready/reset. It looks to me a good
chance to fix them all but if you
> > > > > > > > > think differently, let me know
> > > > > > > > >
> > > > > > > > > > After two attempts
> > > > > > > > > > I don't feel like hoping
audit will fix all bugs.
> > > > > > > > >
> > > > > > > > > I've started the auditing and
have 15+ patches in the queue. (only
> > > > > > > > > covers bluetooth, console, pmem,
virtio-net and caif). Spotting the
> > > > > > > > > issue is not hard but the testing,
It would take at least the time of
> > > > > > > > > one release to finalize I guess.
> > > > > > > >
> > > > > > > > Absolutely. So I am looking for a way to
implement hardening that does
> > > > > > > > not break existing drivers.
> > > > > > >
> > > > > > > I totally agree with you to seek a way
without bothering the drivers.
> > > > > > > Just wonder if this is possbile.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The reason config
was kind of easy is that config interrupt is rarely
> > > > > > > > > > > > vital for device
function so arbitrarily deferring that does not lead to
> > > > > > > > > > > > deadlocks - what you
are trying to do with VQ interrupts is
> > > > > > > > > > > > fundamentally
different. Things are especially bad if we just drop
> > > > > > > > > > > > an interrupt but
deferring can lead to problems too.
> > > > > > > > > > >
> > > > > > > > > > > I'm not sure I see
the difference, disable_irq() stuffs also delay the
> > > > > > > > > > > interrupt processing
until enable_irq().
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Absolutely. I am not at all
sure disable_irq fixes all problems.
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Consider as an
example
> > > > > > > > > > > >     virtio-net: fix
race between ndo_open() and virtio_device_ready()
> > > > > > > > > > > > if you just defer vq
interrupts you get deadlocks.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I don't see a
deadlock here, maybe you can show more detail on this?
> > > > > > > > > >
> > > > > > > > > > What I mean is this: if we
revert the above commit, things still
> > > > > > > > > > work (out of spec, but still).
If we revert and defer interrupts until
> > > > > > > > > > device ready then ndo_open
that triggers before device ready deadlocks.
> > > > > > > > >
> > > > > > > > > Ok, I guess you meant on a
hypervisor that is strictly written with spec.
> > > > > > > >
> > > > > > > > I mean on hypervisor that starts
processing queues after getting a kick
> > > > > > > > even without DRIVER_OK.
> > > > > > >
> > > > > > > Oh right.
> > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > So, thinking about
all this, how about a simple per vq flag meaning
> > > > > > > > > > > > "this vq was
kicked since reset"?
> > > > > > > > > > >
> > > > > > > > > > > And ignore the
notification if vq is not kicked? It sounds like the
> > > > > > > > > > > callback needs to be
synchronized with the kick.
> > > > > > > > > >
> > > > > > > > > > Note we only need to
synchronize it when it changes, which is
> > > > > > > > > > only during initialization and
reset.
> > > > > > > > >
> > > > > > > > > Yes.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > If driver does not
kick then it's not ready to get callbacks, right?
> > > > > > > > > > > >
> > > > > > > > > > > > Sounds quite clean,
but we need to think through memory ordering
> > > > > > > > > > > > concerns - I guess
it's only when we change the value so
> > > > > > > > > > > >         if
(!vq->kicked) {
> > > > > > > > > > > >                
vq->kicked = true;
> > > > > > > > > > > >                
mb();
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > > will do the trick,
right?
> > > > > > > > > > >
> > > > > > > > > > > There's no much
difference with the existing approach:
> > > > > > > > > > >
> > > > > > > > > > > 1) your proposal
implicitly makes callbacks ready in virtqueue_kick()
> > > > > > > > > > > 2) my proposal explicitly
makes callbacks ready via virtio_device_ready()
> > > > > > > > > > >
> > > > > > > > > > > Both require careful
auditing of all the existing drivers to make sure
> > > > > > > > > > > no kick before DRIVER_OK.
> > > > > > > > > >
> > > > > > > > > > Jason, kick before DRIVER_OK
is out of spec, sure. But it is unrelated
> > > > > > > > > > to hardening
> > > > > > > > >
> > > > > > > > > Yes but with your proposal, it
seems to couple kick with DRIVER_OK somehow.
> > > > > > > >
> > > > > > > > I don't see how - my proposal
ignores DRIVER_OK issues.
> > > > > > >
> > > > > > > Yes, what I meant is, in your proposal, the
first kick after rest is a
> > > > > > > hint that the driver is ok (but actually it
could not).
> > > > > > >
> > > > > > > >
> > > > > > > > > > and in absence of config
interrupts is generally easily
> > > > > > > > > > fixed just by sticking
virtio_device_ready early in initialization.
> > > > > > > > >
> > > > > > > > > So if the kick is done before the
subsystem registration, there's
> > > > > > > > > still a window in the middle
(assuming we stick virtio_device_ready()
> > > > > > > > > early):
> > > > > > > > >
> > > > > > > > > virtio_device_ready()
> > > > > > > > > virtqueue_kick()
> > > > > > > > > /* the window */
> > > > > > > > > subsystem_registration()
> > > > > > > >
> > > > > > > > Absolutely, however, I do not think we
really have many such drivers
> > > > > > > > since this has been known as a wrong
thing to do since the beginning.
> > > > > > > > Want to try to find any?
> > > > > > >
> > > > > > > Yes, let me try and update.
> > > > > >
> > > > > > This is basically the device that have an RX
queue, so I've found the
> > > > > > following drivers:
> > > > > >
> > > > > > scmi, mac80211_hwsim, vsock, bt, balloon.
> > > > >
> > > > > Looked and I don't see it yet. Let's consider
> > > > > ./net/vmw_vsock/virtio_transport.c for example.
Assuming we block
> > > > > callbacks until the first kick, what is the issue with
probe exactly?
> > > >
> > > > We need to make sure the callback can survive when it runs
before sub
> > > > system registration.
> > >
> > > With my proposal no - only if we also kick before registration.
> > > So I do not see the issue yet.
> > >
> > > Consider ./net/vmw_vsock/virtio_transport.c
> > >
> > > kicks: virtio_transport_send_pkt_work,
> > > virtio_vsock_rx_fill, virtio_vsock_event_fill
> > >
> > > which of these triggers before we are ready to
> > > handle callbacks?
> >
> > So:
> >
> > virtio_vsock_vqs_init()
> >     virtio_device_ready()
> >     virtio_vsock_rx_fill() /* kick there */
> > rcu_assign_pointer(the_virtio_vsock, vsock)
> >
> > It means at least virtio_vsock_rx_done()/virtio_vsock_workqueue needs
> > to survive. I don't say it has a bug but we do need to audit the
code
> > in this case. The implication is: the virtqueue callback should be
> > written with no assumption that the driver has registered in the
> > subsystem. We don't or can't assume all drivers are written in
this
> > way.
>
>
> I thought you said you audited code and found bugs.
>
> My claim is that simply because qemu starts processing
> packets immediately upon kick, if bugs like this
> existed we would have noticed by now.
This is true for a well behaved hypervisor. But what we want to deal
with is the buggy/malicious hypervisors.
>
> In this case the_virtio_vsock is used for xmit things,
> callbacks do not seem to use it at all.
So the hypervisor can trigger the notification just after the kick and
the work function seems to be safe.

One another example for this is in virtcons_probe():

        spin_lock_init(&portdev->ports_lock);
        INIT_LIST_HEAD(&portdev->ports);
        INIT_LIST_HEAD(&portdev->list);

        virtio_device_ready(portdev->vdev);

        INIT_WORK(&portdev->config_work, &config_work_handler);
        INIT_WORK(&portdev->control_work, &control_work_handler);

in control_intr() we had:

static void control_intr(struct virtqueue *vq)
{
        struct ports_device *portdev;

        portdev = vq->vdev->priv;
        schedule_work(&portdev->control_work);
}

So we might crash if the notification is raised just after
virtio_device_ready().

This is not an exact example of when a callback is not ready after
kick, but it demonstrates that the callback could have assumed that
all setup has been done when it is called.

Thanks
>
> > >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > >I couldn't ... except maybe bluetooth
> > > > > > > > but that's just maintainer nacking
fixes saying he'll fix it
> > > > > > > > his way ...
> > > > > > > >
> > > > > > > > > And during remove(), we get another
window:
> > > > > > > > >
> > > > > > > > > subsysrem_unregistration()
> > > > > > > > > /* the window */
> > > > > > > > > virtio_device_reset()
> > > > > > > >
> > > > > > > > Same here.
> > > > > >
> > > > > > Basically for the drivers that set driver_ok
before registration,
> > > > >
> > > > > I don't see what does driver_ok have to do with it.
> > > >
> > > > I meant for those driver, in probe they do()
> > > >
> > > > virtio_device_ready()
> > > > subsystem_register()
> > > >
> > > > In remove() they do
> > > >
> > > > subsystem_unregister()
> > > > virtio_device_reset()
> > > >
> > > > for symmetry
> > >
> > > Let's leave remove alone for now. I am close to 100% sure we
have *lots*
> > > of issues around it, but while probe is unavoidable remove can be
> > > avoided by blocking hotplug.
> >
> > Unbind can trigger this path as well.
> >
> > >
> > >
> > > > >
> > > > > > so
> > > > > > we have a lot:
> > > > > >
> > > > > > blk, net, mac80211_hwsim, scsi, vsock, bt, crypto,
gpio, gpu, i2c,
> > > > > > iommu, caif, pmem, input, mem
> > > > > >
> > > > > > So I think there's no easy way to harden the
notification without
> > > > > > auditing the driver one by one (especially
considering the driver may
> > > > > > use bh or workqueue). The problem is the
notification hardening
> > > > > > depends on a correct or race-free probe/remove. So
we need to fix the
> > > > > > issues in probe/remove then do the hardening on
the notification.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > So if drivers kick but are not ready to get callbacks
then let's fix
> > > > > that first of all, these are racy with existing qemu
even ignoring
> > > > > spec compliance.
> > > >
> > > > Yes, (the patches I've posted so far exist even with a
well-behaved device).
> > > >
> > > > Thanks
> > >
> > > patches you posted deal with DRIVER_OK spec compliance.
> > > I do not see patches for kicks before callbacks are ready to run.
> >
> > Yes.
> >
> > Thanks
> >
> > >
> > > > >
> > > > >
> > > > > --
> > > > > MST
> > > > >
> > >
>

Virtualization - Jun 2022 - [PATCH V3] virtio: disable notification hardening by default

[PATCH V3] virtio: disable notification hardening by default

[PATCH V3] virtio: disable notification hardening by default