thr3ads.net - Virtualization - [PATCH V3] virtio: disable notification hardening by default [Jul 2022]

If this information is useful, please help other people find it:
Share via:

Jason Wang

2022-Jun-30 02:01 UTC

[PATCH V3] virtio: disable notification hardening by default

On Wed, Jun 29, 2022 at 4:52 PM Michael S. Tsirkin <mst at redhat.com>
wrote:>
> On Wed, Jun 29, 2022 at 04:34:36PM +0800, Jason Wang wrote:
> > On Wed, Jun 29, 2022 at 3:15 PM Michael S. Tsirkin <mst at
redhat.com> wrote:
> > >
> > > On Wed, Jun 29, 2022 at 03:02:21PM +0800, Jason Wang wrote:
> > > > On Wed, Jun 29, 2022 at 2:31 PM Michael S. Tsirkin <mst
at redhat.com> wrote:
> > > > >
> > > > > On Wed, Jun 29, 2022 at 12:07:11PM +0800, Jason Wang
wrote:
> > > > > > On Tue, Jun 28, 2022 at 2:17 PM Jason Wang
<jasowang at redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 28, 2022 at 1:00 PM Michael S.
Tsirkin <mst at redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jun 28, 2022 at 11:49:12AM
+0800, Jason Wang wrote:
> > > > > > > > > > Heh. Yea sure. But things work
fine for people. What is the chance
> > > > > > > > > > your review found and fixed
all driver bugs?
> > > > > > > > >
> > > > > > > > > I don't/can't audit all
bugs but the race between open/close against
> > > > > > > > > ready/reset. It looks to me a good
chance to fix them all but if you
> > > > > > > > > think differently, let me know
> > > > > > > > >
> > > > > > > > > > After two attempts
> > > > > > > > > > I don't feel like hoping
audit will fix all bugs.
> > > > > > > > >
> > > > > > > > > I've started the auditing and
have 15+ patches in the queue. (only
> > > > > > > > > covers bluetooth, console, pmem,
virtio-net and caif). Spotting the
> > > > > > > > > issue is not hard but the testing,
It would take at least the time of
> > > > > > > > > one release to finalize I guess.
> > > > > > > >
> > > > > > > > Absolutely. So I am looking for a way to
implement hardening that does
> > > > > > > > not break existing drivers.
> > > > > > >
> > > > > > > I totally agree with you to seek a way
without bothering the drivers.
> > > > > > > Just wonder if this is possbile.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The reason config
was kind of easy is that config interrupt is rarely
> > > > > > > > > > > > vital for device
function so arbitrarily deferring that does not lead to
> > > > > > > > > > > > deadlocks - what you
are trying to do with VQ interrupts is
> > > > > > > > > > > > fundamentally
different. Things are especially bad if we just drop
> > > > > > > > > > > > an interrupt but
deferring can lead to problems too.
> > > > > > > > > > >
> > > > > > > > > > > I'm not sure I see
the difference, disable_irq() stuffs also delay the
> > > > > > > > > > > interrupt processing
until enable_irq().
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Absolutely. I am not at all
sure disable_irq fixes all problems.
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Consider as an
example
> > > > > > > > > > > >     virtio-net: fix
race between ndo_open() and virtio_device_ready()
> > > > > > > > > > > > if you just defer vq
interrupts you get deadlocks.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I don't see a
deadlock here, maybe you can show more detail on this?
> > > > > > > > > >
> > > > > > > > > > What I mean is this: if we
revert the above commit, things still
> > > > > > > > > > work (out of spec, but still).
If we revert and defer interrupts until
> > > > > > > > > > device ready then ndo_open
that triggers before device ready deadlocks.
> > > > > > > > >
> > > > > > > > > Ok, I guess you meant on a
hypervisor that is strictly written with spec.
> > > > > > > >
> > > > > > > > I mean on hypervisor that starts
processing queues after getting a kick
> > > > > > > > even without DRIVER_OK.
> > > > > > >
> > > > > > > Oh right.
> > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > So, thinking about
all this, how about a simple per vq flag meaning
> > > > > > > > > > > > "this vq was
kicked since reset"?
> > > > > > > > > > >
> > > > > > > > > > > And ignore the
notification if vq is not kicked? It sounds like the
> > > > > > > > > > > callback needs to be
synchronized with the kick.
> > > > > > > > > >
> > > > > > > > > > Note we only need to
synchronize it when it changes, which is
> > > > > > > > > > only during initialization and
reset.
> > > > > > > > >
> > > > > > > > > Yes.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > If driver does not
kick then it's not ready to get callbacks, right?
> > > > > > > > > > > >
> > > > > > > > > > > > Sounds quite clean,
but we need to think through memory ordering
> > > > > > > > > > > > concerns - I guess
it's only when we change the value so
> > > > > > > > > > > >         if
(!vq->kicked) {
> > > > > > > > > > > >                
vq->kicked = true;
> > > > > > > > > > > >                
mb();
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > > will do the trick,
right?
> > > > > > > > > > >
> > > > > > > > > > > There's no much
difference with the existing approach:
> > > > > > > > > > >
> > > > > > > > > > > 1) your proposal
implicitly makes callbacks ready in virtqueue_kick()
> > > > > > > > > > > 2) my proposal explicitly
makes callbacks ready via virtio_device_ready()
> > > > > > > > > > >
> > > > > > > > > > > Both require careful
auditing of all the existing drivers to make sure
> > > > > > > > > > > no kick before DRIVER_OK.
> > > > > > > > > >
> > > > > > > > > > Jason, kick before DRIVER_OK
is out of spec, sure. But it is unrelated
> > > > > > > > > > to hardening
> > > > > > > > >
> > > > > > > > > Yes but with your proposal, it
seems to couple kick with DRIVER_OK somehow.
> > > > > > > >
> > > > > > > > I don't see how - my proposal
ignores DRIVER_OK issues.
> > > > > > >
> > > > > > > Yes, what I meant is, in your proposal, the
first kick after rest is a
> > > > > > > hint that the driver is ok (but actually it
could not).
> > > > > > >
> > > > > > > >
> > > > > > > > > > and in absence of config
interrupts is generally easily
> > > > > > > > > > fixed just by sticking
virtio_device_ready early in initialization.
> > > > > > > > >
> > > > > > > > > So if the kick is done before the
subsystem registration, there's
> > > > > > > > > still a window in the middle
(assuming we stick virtio_device_ready()
> > > > > > > > > early):
> > > > > > > > >
> > > > > > > > > virtio_device_ready()
> > > > > > > > > virtqueue_kick()
> > > > > > > > > /* the window */
> > > > > > > > > subsystem_registration()
> > > > > > > >
> > > > > > > > Absolutely, however, I do not think we
really have many such drivers
> > > > > > > > since this has been known as a wrong
thing to do since the beginning.
> > > > > > > > Want to try to find any?
> > > > > > >
> > > > > > > Yes, let me try and update.
> > > > > >
> > > > > > This is basically the device that have an RX
queue, so I've found the
> > > > > > following drivers:
> > > > > >
> > > > > > scmi, mac80211_hwsim, vsock, bt, balloon.
> > > > >
> > > > > Looked and I don't see it yet. Let's consider
> > > > > ./net/vmw_vsock/virtio_transport.c for example.
Assuming we block
> > > > > callbacks until the first kick, what is the issue with
probe exactly?
> > > >
> > > > We need to make sure the callback can survive when it runs
before sub
> > > > system registration.
> > >
> > > With my proposal no - only if we also kick before registration.
> > > So I do not see the issue yet.
> > >
> > > Consider ./net/vmw_vsock/virtio_transport.c
> > >
> > > kicks: virtio_transport_send_pkt_work,
> > > virtio_vsock_rx_fill, virtio_vsock_event_fill
> > >
> > > which of these triggers before we are ready to
> > > handle callbacks?
> >
> > So:
> >
> > virtio_vsock_vqs_init()
> >     virtio_device_ready()
> >     virtio_vsock_rx_fill() /* kick there */
> > rcu_assign_pointer(the_virtio_vsock, vsock)
> >
> > It means at least virtio_vsock_rx_done()/virtio_vsock_workqueue needs
> > to survive. I don't say it has a bug but we do need to audit the
code
> > in this case. The implication is: the virtqueue callback should be
> > written with no assumption that the driver has registered in the
> > subsystem. We don't or can't assume all drivers are written in
this
> > way.
>
>
> I thought you said you audited code and found bugs.
>
> My claim is that simply because qemu starts processing
> packets immediately upon kick, if bugs like this
> existed we would have noticed by now.
This is true for a well behaved hypervisor. But what we want to deal
with is the buggy/malicious hypervisors.
>
> In this case the_virtio_vsock is used for xmit things,
> callbacks do not seem to use it at all.
So the hypervisor can trigger the notification just after the kick and
the work function seems to be safe.

One another example for this is in virtcons_probe():

        spin_lock_init(&portdev->ports_lock);
        INIT_LIST_HEAD(&portdev->ports);
        INIT_LIST_HEAD(&portdev->list);

        virtio_device_ready(portdev->vdev);

        INIT_WORK(&portdev->config_work, &config_work_handler);
        INIT_WORK(&portdev->control_work, &control_work_handler);

in control_intr() we had:

static void control_intr(struct virtqueue *vq)
{
        struct ports_device *portdev;

        portdev = vq->vdev->priv;
        schedule_work(&portdev->control_work);
}

So we might crash if the notification is raised just after
virtio_device_ready().

This is not an exact example of when a callback is not ready after
kick, but it demonstrates that the callback could have assumed that
all setup has been done when it is called.

Thanks
>
> > >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > >I couldn't ... except maybe bluetooth
> > > > > > > > but that's just maintainer nacking
fixes saying he'll fix it
> > > > > > > > his way ...
> > > > > > > >
> > > > > > > > > And during remove(), we get another
window:
> > > > > > > > >
> > > > > > > > > subsysrem_unregistration()
> > > > > > > > > /* the window */
> > > > > > > > > virtio_device_reset()
> > > > > > > >
> > > > > > > > Same here.
> > > > > >
> > > > > > Basically for the drivers that set driver_ok
before registration,
> > > > >
> > > > > I don't see what does driver_ok have to do with it.
> > > >
> > > > I meant for those driver, in probe they do()
> > > >
> > > > virtio_device_ready()
> > > > subsystem_register()
> > > >
> > > > In remove() they do
> > > >
> > > > subsystem_unregister()
> > > > virtio_device_reset()
> > > >
> > > > for symmetry
> > >
> > > Let's leave remove alone for now. I am close to 100% sure we
have *lots*
> > > of issues around it, but while probe is unavoidable remove can be
> > > avoided by blocking hotplug.
> >
> > Unbind can trigger this path as well.
> >
> > >
> > >
> > > > >
> > > > > > so
> > > > > > we have a lot:
> > > > > >
> > > > > > blk, net, mac80211_hwsim, scsi, vsock, bt, crypto,
gpio, gpu, i2c,
> > > > > > iommu, caif, pmem, input, mem
> > > > > >
> > > > > > So I think there's no easy way to harden the
notification without
> > > > > > auditing the driver one by one (especially
considering the driver may
> > > > > > use bh or workqueue). The problem is the
notification hardening
> > > > > > depends on a correct or race-free probe/remove. So
we need to fix the
> > > > > > issues in probe/remove then do the hardening on
the notification.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > So if drivers kick but are not ready to get callbacks
then let's fix
> > > > > that first of all, these are racy with existing qemu
even ignoring
> > > > > spec compliance.
> > > >
> > > > Yes, (the patches I've posted so far exist even with a
well-behaved device).
> > > >
> > > > Thanks
> > >
> > > patches you posted deal with DRIVER_OK spec compliance.
> > > I do not see patches for kicks before callbacks are ready to run.
> >
> > Yes.
> >
> > Thanks
> >
> > >
> > > > >
> > > > >
> > > > > --
> > > > > MST
> > > > >
> > >
>

Michael S. Tsirkin

2022-Jun-30 08:38 UTC

head link

[PATCH V3] virtio: disable notification hardening by default

On Thu, Jun 30, 2022 at 10:01:16AM +0800, Jason Wang
wrote:> On Wed, Jun 29, 2022 at 4:52 PM Michael S. Tsirkin <mst at
redhat.com> wrote:
> >
> > On Wed, Jun 29, 2022 at 04:34:36PM +0800, Jason Wang wrote:
> > > On Wed, Jun 29, 2022 at 3:15 PM Michael S. Tsirkin <mst at
redhat.com> wrote:
> > > >
> > > > On Wed, Jun 29, 2022 at 03:02:21PM +0800, Jason Wang wrote:
> > > > > On Wed, Jun 29, 2022 at 2:31 PM Michael S. Tsirkin
<mst at redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Jun 29, 2022 at 12:07:11PM +0800, Jason
Wang wrote:
> > > > > > > On Tue, Jun 28, 2022 at 2:17 PM Jason Wang
<jasowang at redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jun 28, 2022 at 1:00 PM Michael
S. Tsirkin <mst at redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jun 28, 2022 at 11:49:12AM
+0800, Jason Wang wrote:
> > > > > > > > > > > Heh. Yea sure. But things
work fine for people. What is the chance
> > > > > > > > > > > your review found and
fixed all driver bugs?
> > > > > > > > > >
> > > > > > > > > > I don't/can't audit
all bugs but the race between open/close against
> > > > > > > > > > ready/reset. It looks to me a
good chance to fix them all but if you
> > > > > > > > > > think differently, let me know
> > > > > > > > > >
> > > > > > > > > > > After two attempts
> > > > > > > > > > > I don't feel like
hoping audit will fix all bugs.
> > > > > > > > > >
> > > > > > > > > > I've started the auditing
and have 15+ patches in the queue. (only
> > > > > > > > > > covers bluetooth, console,
pmem, virtio-net and caif). Spotting the
> > > > > > > > > > issue is not hard but the
testing, It would take at least the time of
> > > > > > > > > > one release to finalize I
guess.
> > > > > > > > >
> > > > > > > > > Absolutely. So I am looking for a
way to implement hardening that does
> > > > > > > > > not break existing drivers.
> > > > > > > >
> > > > > > > > I totally agree with you to seek a way
without bothering the drivers.
> > > > > > > > Just wonder if this is possbile.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > The reason
config was kind of easy is that config interrupt is rarely
> > > > > > > > > > > > > vital for
device function so arbitrarily deferring that does not lead to
> > > > > > > > > > > > > deadlocks -
what you are trying to do with VQ interrupts is
> > > > > > > > > > > > > fundamentally
different. Things are especially bad if we just drop
> > > > > > > > > > > > > an interrupt
but deferring can lead to problems too.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm not sure I
see the difference, disable_irq() stuffs also delay the
> > > > > > > > > > > > interrupt processing
until enable_irq().
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Absolutely. I am not at
all sure disable_irq fixes all problems.
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Consider as an
example
> > > > > > > > > > > > >     virtio-net:
fix race between ndo_open() and virtio_device_ready()
> > > > > > > > > > > > > if you just
defer vq interrupts you get deadlocks.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I don't see a
deadlock here, maybe you can show more detail on this?
> > > > > > > > > > >
> > > > > > > > > > > What I mean is this: if
we revert the above commit, things still
> > > > > > > > > > > work (out of spec, but
still). If we revert and defer interrupts until
> > > > > > > > > > > device ready then
ndo_open that triggers before device ready deadlocks.
> > > > > > > > > >
> > > > > > > > > > Ok, I guess you meant on a
hypervisor that is strictly written with spec.
> > > > > > > > >
> > > > > > > > > I mean on hypervisor that starts
processing queues after getting a kick
> > > > > > > > > even without DRIVER_OK.
> > > > > > > >
> > > > > > > > Oh right.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > So, thinking
about all this, how about a simple per vq flag meaning
> > > > > > > > > > > > > "this vq
was kicked since reset"?
> > > > > > > > > > > >
> > > > > > > > > > > > And ignore the
notification if vq is not kicked? It sounds like the
> > > > > > > > > > > > callback needs to be
synchronized with the kick.
> > > > > > > > > > >
> > > > > > > > > > > Note we only need to
synchronize it when it changes, which is
> > > > > > > > > > > only during
initialization and reset.
> > > > > > > > > >
> > > > > > > > > > Yes.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > If driver does
not kick then it's not ready to get callbacks, right?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Sounds quite
clean, but we need to think through memory ordering
> > > > > > > > > > > > > concerns - I
guess it's only when we change the value so
> > > > > > > > > > > > >         if
(!vq->kicked) {
> > > > > > > > > > > > >                
vq->kicked = true;
> > > > > > > > > > > > >                
mb();
> > > > > > > > > > > > >         }
> > > > > > > > > > > > >
> > > > > > > > > > > > > will do the
trick, right?
> > > > > > > > > > > >
> > > > > > > > > > > > There's no much
difference with the existing approach:
> > > > > > > > > > > >
> > > > > > > > > > > > 1) your proposal
implicitly makes callbacks ready in virtqueue_kick()
> > > > > > > > > > > > 2) my proposal
explicitly makes callbacks ready via virtio_device_ready()
> > > > > > > > > > > >
> > > > > > > > > > > > Both require careful
auditing of all the existing drivers to make sure
> > > > > > > > > > > > no kick before
DRIVER_OK.
> > > > > > > > > > >
> > > > > > > > > > > Jason, kick before
DRIVER_OK is out of spec, sure. But it is unrelated
> > > > > > > > > > > to hardening
> > > > > > > > > >
> > > > > > > > > > Yes but with your proposal, it
seems to couple kick with DRIVER_OK somehow.
> > > > > > > > >
> > > > > > > > > I don't see how - my proposal
ignores DRIVER_OK issues.
> > > > > > > >
> > > > > > > > Yes, what I meant is, in your proposal,
the first kick after rest is a
> > > > > > > > hint that the driver is ok (but actually
it could not).
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > > and in absence of config
interrupts is generally easily
> > > > > > > > > > > fixed just by sticking
virtio_device_ready early in initialization.
> > > > > > > > > >
> > > > > > > > > > So if the kick is done before
the subsystem registration, there's
> > > > > > > > > > still a window in the middle
(assuming we stick virtio_device_ready()
> > > > > > > > > > early):
> > > > > > > > > >
> > > > > > > > > > virtio_device_ready()
> > > > > > > > > > virtqueue_kick()
> > > > > > > > > > /* the window */
> > > > > > > > > > subsystem_registration()
> > > > > > > > >
> > > > > > > > > Absolutely, however, I do not think
we really have many such drivers
> > > > > > > > > since this has been known as a
wrong thing to do since the beginning.
> > > > > > > > > Want to try to find any?
> > > > > > > >
> > > > > > > > Yes, let me try and update.
> > > > > > >
> > > > > > > This is basically the device that have an RX
queue, so I've found the
> > > > > > > following drivers:
> > > > > > >
> > > > > > > scmi, mac80211_hwsim, vsock, bt, balloon.
> > > > > >
> > > > > > Looked and I don't see it yet. Let's
consider
> > > > > > ./net/vmw_vsock/virtio_transport.c for example.
Assuming we block
> > > > > > callbacks until the first kick, what is the issue
with probe exactly?
> > > > >
> > > > > We need to make sure the callback can survive when it
runs before sub
> > > > > system registration.
> > > >
> > > > With my proposal no - only if we also kick before
registration.
> > > > So I do not see the issue yet.
> > > >
> > > > Consider ./net/vmw_vsock/virtio_transport.c
> > > >
> > > > kicks: virtio_transport_send_pkt_work,
> > > > virtio_vsock_rx_fill, virtio_vsock_event_fill
> > > >
> > > > which of these triggers before we are ready to
> > > > handle callbacks?
> > >
> > > So:
> > >
> > > virtio_vsock_vqs_init()
> > >     virtio_device_ready()
> > >     virtio_vsock_rx_fill() /* kick there */
> > > rcu_assign_pointer(the_virtio_vsock, vsock)
> > >
> > > It means at least virtio_vsock_rx_done()/virtio_vsock_workqueue
needs
> > > to survive. I don't say it has a bug but we do need to audit
the code
> > > in this case. The implication is: the virtqueue callback should
be
> > > written with no assumption that the driver has registered in the
> > > subsystem. We don't or can't assume all drivers are
written in this
> > > way.
> >
> >
> > I thought you said you audited code and found bugs.
> >
> > My claim is that simply because qemu starts processing
> > packets immediately upon kick, if bugs like this
> > existed we would have noticed by now.
> 
> This is true for a well behaved hypervisor. But what we want to deal
> with is the buggy/malicious hypervisors.


> >
> > In this case the_virtio_vsock is used for xmit things,
> > callbacks do not seem to use it at all.
> 
> So the hypervisor can trigger the notification just after the kick and
> the work function seems to be safe.
> 
> One another example for this is in virtcons_probe():
> 
>         spin_lock_init(&portdev->ports_lock);
>         INIT_LIST_HEAD(&portdev->ports);
>         INIT_LIST_HEAD(&portdev->list);
> 
>         virtio_device_ready(portdev->vdev);
> 
>         INIT_WORK(&portdev->config_work, &config_work_handler);
>         INIT_WORK(&portdev->control_work,
&control_work_handler);
> 
> in control_intr() we had:
> 
> static void control_intr(struct virtqueue *vq)
> {
>         struct ports_device *portdev;
> 
>         portdev = vq->vdev->priv;
>         schedule_work(&portdev->control_work);
> }
> 
> So we might crash if the notification is raised just after
> virtio_device_ready().
Yes! But this is not my proposal. This is yours.
Your patches block interrupts until virtio_device_ready.

My proposal is to block them until kick.

In this case kick is in fill_queue after INIT_WORK.

> This is not an exact example of when a callback is not ready after
> kick, but it demonstrates that the callback could have assumed that
> all setup has been done when it is called.
> 
> Thanks
So if there are not examples of callbacks not ready after kick
then let us block callbacks until first kick. That is my idea.

> >
> > > >
> > > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > >I couldn't ... except maybe
bluetooth
> > > > > > > > > but that's just maintainer
nacking fixes saying he'll fix it
> > > > > > > > > his way ...
> > > > > > > > >
> > > > > > > > > > And during remove(), we get
another window:
> > > > > > > > > >
> > > > > > > > > > subsysrem_unregistration()
> > > > > > > > > > /* the window */
> > > > > > > > > > virtio_device_reset()
> > > > > > > > >
> > > > > > > > > Same here.
> > > > > > >
> > > > > > > Basically for the drivers that set driver_ok
before registration,
> > > > > >
> > > > > > I don't see what does driver_ok have to do
with it.
> > > > >
> > > > > I meant for those driver, in probe they do()
> > > > >
> > > > > virtio_device_ready()
> > > > > subsystem_register()
> > > > >
> > > > > In remove() they do
> > > > >
> > > > > subsystem_unregister()
> > > > > virtio_device_reset()
> > > > >
> > > > > for symmetry
> > > >
> > > > Let's leave remove alone for now. I am close to 100%
sure we have *lots*
> > > > of issues around it, but while probe is unavoidable remove
can be
> > > > avoided by blocking hotplug.
> > >
> > > Unbind can trigger this path as well.
> > >
> > > >
> > > >
> > > > > >
> > > > > > > so
> > > > > > > we have a lot:
> > > > > > >
> > > > > > > blk, net, mac80211_hwsim, scsi, vsock, bt,
crypto, gpio, gpu, i2c,
> > > > > > > iommu, caif, pmem, input, mem
> > > > > > >
> > > > > > > So I think there's no easy way to harden
the notification without
> > > > > > > auditing the driver one by one (especially
considering the driver may
> > > > > > > use bh or workqueue). The problem is the
notification hardening
> > > > > > > depends on a correct or race-free
probe/remove. So we need to fix the
> > > > > > > issues in probe/remove then do the hardening
on the notification.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > So if drivers kick but are not ready to get
callbacks then let's fix
> > > > > > that first of all, these are racy with existing
qemu even ignoring
> > > > > > spec compliance.
> > > > >
> > > > > Yes, (the patches I've posted so far exist even
with a well-behaved device).
> > > > >
> > > > > Thanks
> > > >
> > > > patches you posted deal with DRIVER_OK spec compliance.
> > > > I do not see patches for kicks before callbacks are ready to
run.
> > >
> > > Yes.
> > >
> > > Thanks
> > >
> > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > MST
> > > > > >
> > > >
> >

Jason Wang

2022-Jul-04 04:23 UTC

head link

[PATCH V3] virtio: disable notification hardening by default

On Thu, Jun 30, 2022 at 4:38 PM Michael S. Tsirkin <mst at redhat.com>
wrote:>
> On Thu, Jun 30, 2022 at 10:01:16AM +0800, Jason Wang wrote:
> > On Wed, Jun 29, 2022 at 4:52 PM Michael S. Tsirkin <mst at
redhat.com> wrote:
> > >
> > > On Wed, Jun 29, 2022 at 04:34:36PM +0800, Jason Wang wrote:
> > > > On Wed, Jun 29, 2022 at 3:15 PM Michael S. Tsirkin <mst
at redhat.com> wrote:
> > > > >
> > > > > On Wed, Jun 29, 2022 at 03:02:21PM +0800, Jason Wang
wrote:
> > > > > > On Wed, Jun 29, 2022 at 2:31 PM Michael S. Tsirkin
<mst at redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Jun 29, 2022 at 12:07:11PM +0800,
Jason Wang wrote:
> > > > > > > > On Tue, Jun 28, 2022 at 2:17 PM Jason
Wang <jasowang at redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jun 28, 2022 at 1:00 PM
Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 28, 2022 at
11:49:12AM +0800, Jason Wang wrote:
> > > > > > > > > > > > Heh. Yea sure. But
things work fine for people. What is the chance
> > > > > > > > > > > > your review found
and fixed all driver bugs?
> > > > > > > > > > >
> > > > > > > > > > > I don't/can't
audit all bugs but the race between open/close against
> > > > > > > > > > > ready/reset. It looks to
me a good chance to fix them all but if you
> > > > > > > > > > > think differently, let me
know
> > > > > > > > > > >
> > > > > > > > > > > > After two attempts
> > > > > > > > > > > > I don't feel
like hoping audit will fix all bugs.
> > > > > > > > > > >
> > > > > > > > > > > I've started the
auditing and have 15+ patches in the queue. (only
> > > > > > > > > > > covers bluetooth,
console, pmem, virtio-net and caif). Spotting the
> > > > > > > > > > > issue is not hard but the
testing, It would take at least the time of
> > > > > > > > > > > one release to finalize I
guess.
> > > > > > > > > >
> > > > > > > > > > Absolutely. So I am looking
for a way to implement hardening that does
> > > > > > > > > > not break existing drivers.
> > > > > > > > >
> > > > > > > > > I totally agree with you to seek a
way without bothering the drivers.
> > > > > > > > > Just wonder if this is possbile.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The reason
config was kind of easy is that config interrupt is rarely
> > > > > > > > > > > > > > vital for
device function so arbitrarily deferring that does not lead to
> > > > > > > > > > > > > > deadlocks
- what you are trying to do with VQ interrupts is
> > > > > > > > > > > > > >
fundamentally different. Things are especially bad if we just drop
> > > > > > > > > > > > > > an
interrupt but deferring can lead to problems too.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm not
sure I see the difference, disable_irq() stuffs also delay the
> > > > > > > > > > > > > interrupt
processing until enable_irq().
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Absolutely. I am not
at all sure disable_irq fixes all problems.
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Consider
as an example
> > > > > > > > > > > > > >    
virtio-net: fix race between ndo_open() and virtio_device_ready()
> > > > > > > > > > > > > > if you
just defer vq interrupts you get deadlocks.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't see
a deadlock here, maybe you can show more detail on this?
> > > > > > > > > > > >
> > > > > > > > > > > > What I mean is this:
if we revert the above commit, things still
> > > > > > > > > > > > work (out of spec,
but still). If we revert and defer interrupts until
> > > > > > > > > > > > device ready then
ndo_open that triggers before device ready deadlocks.
> > > > > > > > > > >
> > > > > > > > > > > Ok, I guess you meant on
a hypervisor that is strictly written with spec.
> > > > > > > > > >
> > > > > > > > > > I mean on hypervisor that
starts processing queues after getting a kick
> > > > > > > > > > even without DRIVER_OK.
> > > > > > > > >
> > > > > > > > > Oh right.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So,
thinking about all this, how about a simple per vq flag meaning
> > > > > > > > > > > > > > "this
vq was kicked since reset"?
> > > > > > > > > > > > >
> > > > > > > > > > > > > And ignore the
notification if vq is not kicked? It sounds like the
> > > > > > > > > > > > > callback needs
to be synchronized with the kick.
> > > > > > > > > > > >
> > > > > > > > > > > > Note we only need to
synchronize it when it changes, which is
> > > > > > > > > > > > only during
initialization and reset.
> > > > > > > > > > >
> > > > > > > > > > > Yes.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If driver
does not kick then it's not ready to get callbacks, right?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sounds
quite clean, but we need to think through memory ordering
> > > > > > > > > > > > > > concerns -
I guess it's only when we change the value so
> > > > > > > > > > > > > >         if
(!vq->kicked) {
> > > > > > > > > > > > > >           
vq->kicked = true;
> > > > > > > > > > > > > >           
mb();
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > will do
the trick, right?
> > > > > > > > > > > > >
> > > > > > > > > > > > > There's no
much difference with the existing approach:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) your
proposal implicitly makes callbacks ready in virtqueue_kick()
> > > > > > > > > > > > > 2) my proposal
explicitly makes callbacks ready via virtio_device_ready()
> > > > > > > > > > > > >
> > > > > > > > > > > > > Both require
careful auditing of all the existing drivers to make sure
> > > > > > > > > > > > > no kick before
DRIVER_OK.
> > > > > > > > > > > >
> > > > > > > > > > > > Jason, kick before
DRIVER_OK is out of spec, sure. But it is unrelated
> > > > > > > > > > > > to hardening
> > > > > > > > > > >
> > > > > > > > > > > Yes but with your
proposal, it seems to couple kick with DRIVER_OK somehow.
> > > > > > > > > >
> > > > > > > > > > I don't see how - my
proposal ignores DRIVER_OK issues.
> > > > > > > > >
> > > > > > > > > Yes, what I meant is, in your
proposal, the first kick after rest is a
> > > > > > > > > hint that the driver is ok (but
actually it could not).
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > and in absence of
config interrupts is generally easily
> > > > > > > > > > > > fixed just by
sticking virtio_device_ready early in initialization.
> > > > > > > > > > >
> > > > > > > > > > > So if the kick is done
before the subsystem registration, there's
> > > > > > > > > > > still a window in the
middle (assuming we stick virtio_device_ready()
> > > > > > > > > > > early):
> > > > > > > > > > >
> > > > > > > > > > > virtio_device_ready()
> > > > > > > > > > > virtqueue_kick()
> > > > > > > > > > > /* the window */
> > > > > > > > > > > subsystem_registration()
> > > > > > > > > >
> > > > > > > > > > Absolutely, however, I do not
think we really have many such drivers
> > > > > > > > > > since this has been known as a
wrong thing to do since the beginning.
> > > > > > > > > > Want to try to find any?
> > > > > > > > >
> > > > > > > > > Yes, let me try and update.
> > > > > > > >
> > > > > > > > This is basically the device that have
an RX queue, so I've found the
> > > > > > > > following drivers:
> > > > > > > >
> > > > > > > > scmi, mac80211_hwsim, vsock, bt,
balloon.
> > > > > > >
> > > > > > > Looked and I don't see it yet. Let's
consider
> > > > > > > ./net/vmw_vsock/virtio_transport.c for
example. Assuming we block
> > > > > > > callbacks until the first kick, what is the
issue with probe exactly?
> > > > > >
> > > > > > We need to make sure the callback can survive when
it runs before sub
> > > > > > system registration.
> > > > >
> > > > > With my proposal no - only if we also kick before
registration.
> > > > > So I do not see the issue yet.
> > > > >
> > > > > Consider ./net/vmw_vsock/virtio_transport.c
> > > > >
> > > > > kicks: virtio_transport_send_pkt_work,
> > > > > virtio_vsock_rx_fill, virtio_vsock_event_fill
> > > > >
> > > > > which of these triggers before we are ready to
> > > > > handle callbacks?
> > > >
> > > > So:
> > > >
> > > > virtio_vsock_vqs_init()
> > > >     virtio_device_ready()
> > > >     virtio_vsock_rx_fill() /* kick there */
> > > > rcu_assign_pointer(the_virtio_vsock, vsock)
> > > >
> > > > It means at least
virtio_vsock_rx_done()/virtio_vsock_workqueue needs
> > > > to survive. I don't say it has a bug but we do need to
audit the code
> > > > in this case. The implication is: the virtqueue callback
should be
> > > > written with no assumption that the driver has registered in
the
> > > > subsystem. We don't or can't assume all drivers are
written in this
> > > > way.
> > >
> > >
> > > I thought you said you audited code and found bugs.
> > >
> > > My claim is that simply because qemu starts processing
> > > packets immediately upon kick, if bugs like this
> > > existed we would have noticed by now.
> >
> > This is true for a well behaved hypervisor. But what we want to deal
> > with is the buggy/malicious hypervisors.
>
>
>
>
> > >
> > > In this case the_virtio_vsock is used for xmit things,
> > > callbacks do not seem to use it at all.
> >
> > So the hypervisor can trigger the notification just after the kick and
> > the work function seems to be safe.
> >
> > One another example for this is in virtcons_probe():
> >
> >         spin_lock_init(&portdev->ports_lock);
> >         INIT_LIST_HEAD(&portdev->ports);
> >         INIT_LIST_HEAD(&portdev->list);
> >
> >         virtio_device_ready(portdev->vdev);
> >
> >         INIT_WORK(&portdev->config_work,
&config_work_handler);
> >         INIT_WORK(&portdev->control_work,
&control_work_handler);
> >
> > in control_intr() we had:
> >
> > static void control_intr(struct virtqueue *vq)
> > {
> >         struct ports_device *portdev;
> >
> >         portdev = vq->vdev->priv;
> >         schedule_work(&portdev->control_work);
> > }
> >
> > So we might crash if the notification is raised just after
> > virtio_device_ready().
>
> Yes! But this is not my proposal. This is yours.
> Your patches block interrupts until virtio_device_ready.
>
> My proposal is to block them until kick.
>
> In this case kick is in fill_queue after INIT_WORK.
Yes.
>
>
> > This is not an exact example of when a callback is not ready after
> > kick, but it demonstrates that the callback could have assumed that
> > all setup has been done when it is called.
> >
> > Thanks
>
> So if there are not examples of callbacks not ready after kick
> then let us block callbacks until first kick. That is my idea.
Ok, let me try. I need to drain my queue of fixes first.

Thanks
>
>
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > >I couldn't ... except maybe
bluetooth
> > > > > > > > > > but that's just maintainer
nacking fixes saying he'll fix it
> > > > > > > > > > his way ...
> > > > > > > > > >
> > > > > > > > > > > And during remove(), we
get another window:
> > > > > > > > > > >
> > > > > > > > > > >
subsysrem_unregistration()
> > > > > > > > > > > /* the window */
> > > > > > > > > > > virtio_device_reset()
> > > > > > > > > >
> > > > > > > > > > Same here.
> > > > > > > >
> > > > > > > > Basically for the drivers that set
driver_ok before registration,
> > > > > > >
> > > > > > > I don't see what does driver_ok have to
do with it.
> > > > > >
> > > > > > I meant for those driver, in probe they do()
> > > > > >
> > > > > > virtio_device_ready()
> > > > > > subsystem_register()
> > > > > >
> > > > > > In remove() they do
> > > > > >
> > > > > > subsystem_unregister()
> > > > > > virtio_device_reset()
> > > > > >
> > > > > > for symmetry
> > > > >
> > > > > Let's leave remove alone for now. I am close to
100% sure we have *lots*
> > > > > of issues around it, but while probe is unavoidable
remove can be
> > > > > avoided by blocking hotplug.
> > > >
> > > > Unbind can trigger this path as well.
> > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > > so
> > > > > > > > we have a lot:
> > > > > > > >
> > > > > > > > blk, net, mac80211_hwsim, scsi, vsock,
bt, crypto, gpio, gpu, i2c,
> > > > > > > > iommu, caif, pmem, input, mem
> > > > > > > >
> > > > > > > > So I think there's no easy way to
harden the notification without
> > > > > > > > auditing the driver one by one
(especially considering the driver may
> > > > > > > > use bh or workqueue). The problem is the
notification hardening
> > > > > > > > depends on a correct or race-free
probe/remove. So we need to fix the
> > > > > > > > issues in probe/remove then do the
hardening on the notification.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > So if drivers kick but are not ready to get
callbacks then let's fix
> > > > > > > that first of all, these are racy with
existing qemu even ignoring
> > > > > > > spec compliance.
> > > > > >
> > > > > > Yes, (the patches I've posted so far exist
even with a well-behaved device).
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > patches you posted deal with DRIVER_OK spec compliance.
> > > > > I do not see patches for kicks before callbacks are
ready to run.
> > > >
> > > > Yes.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > MST
> > > > > > >
> > > > >
> > >
>

Virtualization - Jul 2022 - [PATCH V3] virtio: disable notification hardening by default

[PATCH V3] virtio: disable notification hardening by default

[PATCH V3] virtio: disable notification hardening by default

[PATCH V3] virtio: disable notification hardening by default