On Wed, Nov 24, 2021 at 10:26 AM Jason Wang <jasowang at redhat.com> wrote:
>
> On Wed, Nov 24, 2021 at 9:30 AM Michael Ellerman <mpe at ellerman.id.au> wrote:
> >
> > "Michael S. Tsirkin" <mst at redhat.com> writes:
> > > On Tue, Nov 23, 2021 at 10:25:20AM +0800, Jason Wang wrote:
> > >> On Tue, Nov 23, 2021 at 4:24 AM Halil Pasic <pasic at linux.ibm.com> wrote:
> > >> >
> > >> > On Mon, 22 Nov 2021 14:25:26 +0800
> > >> > Jason Wang <jasowang at redhat.com> wrote:
> > >> >
> > >> > > I think the fixes are:
> > >> > >
> > >> > > 1) fixing the vhost vsock
> > >> > > 2) use suppress_used_validation=true to let the vsock driver validate
> > >> > > the in buffer length
> > >> > > 3) probably a new feature so the driver can only enable the validation
> > >> > > when the feature is enabled.
> > >> >
> > >> > I'm not sure, I would consider a F_DEV_Y_FIXED_BUG_X a perfectly good
> > >> > feature. Frankly the set of such bugs is device implementation
> > >> > specific and it makes little sense to specify a feature bit
> > >> > that says the device implementation claims to adhere to some
> > >> > aspect of the specification. Also what would be the semantics
> > >> > of not negotiating F_DEV_Y_FIXED_BUG_X?
> > >>
> > >> Yes, I agree. Rethinking the feature bit, it seems unnecessary,
> > >> especially considering the driver should not care about the used
> > >> length for tx.
> > >>
> > >> >
> > >> > On the other hand I see no other way to keep the validation
> > >> > permanently enabled for fixed implementations, and get around the problem
> > >> > with broken implementations. So we could have something like
> > >> > VHOST_USED_LEN_STRICT.
> > >>
> > >> It's more about a choice of the driver's knowledge. For vsock TX it
> > >> should be fine. If we introduce a parameter and disable it by default,
> > >> it won't be very useful.
> > >>
> > >> >
> > >> > Maybe, we can also think of 'warn and don't alter behavior' instead of
> > >> > 'warn' and alter behavior. Or maybe even not having such checks on in
> > >> > production, but only when testing.
> > >>
> > >> I think there's an agreement that virtio drivers need more hardening,
> > >> that's why a lot of patches were merged, especially considering the
> > >> new requirements coming from confidential computing, smart NICs and
> > >> VDUSE. For virtio drivers, enabling the validation may help to
> > >>
> > >> 1) protect the driver from buggy and malicious devices
> > >> 2) uncover the bugs of the devices (as vsock did, and probably rpmsg)
> > >> 3) force drivers to be smart enough to do the validation themselves,
> > >> so that we can finally remove the validation from the core
> > >>
> > >> So I'd like to keep it enabled.
> > >>
> > >> Thanks
> > >
> > > Let's see how far we can get. But yes, maybe we were too aggressive in
> > > breaking things by default, a warning might be a better choice for a
> > > couple of cycles.
>
> Ok, considering we saw the issues with balloons I think I can post a
> patch to use warn instead. I wonder if we need to taint the kernel in
> this case.

Rethinking this, and considering we still have some time, I'm inclined to
convert the drivers to validate the length themselves. Does this make sense?

Thanks

> >
> > This series appears to break the virtio_balloon driver as well.
> >
> > The symptom is soft lockup warnings, eg:
> >
> > INFO: task kworker/1:1:109 blocked for more than 614 seconds.
> >       Not tainted 5.16.0-rc2-gcc-10.3.0 #21
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:kworker/1:1 state:D stack:12496 pid: 109 ppid: 2 flags:0x00000800
> > Workqueue: events_freezable update_balloon_size_func
> > Call Trace:
> > [c000000003cef7c0] [c000000003cef820] 0xc000000003cef820 (unreliable)
> > [c000000003cef9b0] [c00000000001e238] __switch_to+0x1e8/0x2f0
> > [c000000003cefa10] [c000000000f0a00c] __schedule+0x2cc/0xb50
> > [c000000003cefae0] [c000000000f0a8fc] schedule+0x6c/0x140
> > [c000000003cefb10] [c00000000095b6c4] tell_host+0xe4/0x130
> > [c000000003cefba0] [c00000000095d234] update_balloon_size_func+0x394/0x3f0
> > [c000000003cefc70] [c000000000178064] process_one_work+0x2c4/0x5b0
> > [c000000003cefd10] [c0000000001783f8] worker_thread+0xa8/0x640
> > [c000000003cefda0] [c000000000185444] kthread+0x1b4/0x1c0
> > [c000000003cefe10] [c00000000000cee4] ret_from_kernel_thread+0x5c/0x64
> >
> > Similar backtrace reported here by Luis:
> >
> >   https://lore.kernel.org/lkml/YY2duTi0wAyAKUTJ at bombadil.infradead.org/
> >
> > Bisect points to:
> >
> >   # first bad commit: [939779f5152d161b34f612af29e7dc1ac4472fcf] virtio_ring: validate used buffer length
> >
> > Adding suppress used validation to the virtio balloon driver "fixes" it, eg.
> >
> > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > index c22ff0117b46..a14b82ceebb2 100644
> > --- a/drivers/virtio/virtio_balloon.c
> > +++ b/drivers/virtio/virtio_balloon.c
> > @@ -1150,6 +1150,7 @@ static unsigned int features[] = {
> >  };
> >
> >  static struct virtio_driver virtio_balloon_driver = {
> > +	.suppress_used_validation = true,
> >  	.feature_table = features,
> >  	.feature_table_size = ARRAY_SIZE(features),
> >  	.driver.name = KBUILD_MODNAME,
>
> Looks good, we need a formal patch for this.
>
> And we need to fix QEMU as well, which advertises a non-zero used length
> for the inflate/deflate queues:
>
> static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> ...
>     virtqueue_push(vq, elem, offset);
>
> Thanks
>
> >
> >
> > cheers
> >
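The failure mode in the report above can be illustrated with a small userspace model of a used-length check. This is a hedged sketch, not the kernel's virtio_ring code: `struct model_vq`, `model_add_buf()` and `model_get_buf()` are hypothetical names. It assumes that the core records, per descriptor head, the total length of the device-writable ("in") buffers at add time, and that a failed check marks the queue broken so no further buffers come back. Under those assumptions, a balloon inflate/deflate buffer has an in length of 0, so a device that reports a non-zero used length (as the QEMU `virtqueue_push(vq, elem, offset)` call above does) trips the check, and a driver waiting for that buffer would wait forever, which is consistent with the hung `tell_host()` trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_QUEUE_SIZE 8

/* Hypothetical model of a virtqueue; NOT the kernel's struct vring_virtqueue. */
struct model_vq {
	unsigned int in_len[MODEL_QUEUE_SIZE]; /* device-writable bytes per head */
	bool broken;                           /* set once a strict check fails */
	bool suppress_used_validation;         /* per-driver opt-out */
};

/* Record a buffer at add time; only the device-writable ("in") part counts. */
static void model_add_buf(struct model_vq *vq, unsigned int id,
			  unsigned int out_len, unsigned int in_len)
{
	assert(id < MODEL_QUEUE_SIZE);
	(void)out_len; /* the device may read these bytes but must not write them */
	vq->in_len[id] = in_len;
}

/*
 * Called when the device returns head `id` with used length `len`.
 * Returns true if the buffer is handed back to the driver.
 */
static bool model_get_buf(struct model_vq *vq, unsigned int id, unsigned int len)
{
	assert(id < MODEL_QUEUE_SIZE);
	if (vq->broken)
		return false; /* a broken queue never returns buffers again */
	if (!vq->suppress_used_validation && len > vq->in_len[id]) {
		fprintf(stderr, "used len %u is larger than in buflen %u\n",
			len, vq->in_len[id]);
		vq->broken = true; /* strict behavior: the queue stays dead */
		return false;
	}
	return true;
}
```

For example, adding a buffer with 256 bytes out and 0 bytes in, then calling `model_get_buf()` with a used length of 256, returns false and leaves the queue broken; with `suppress_used_validation` set (the model's counterpart of the one-line balloon diff above), the same call returns true.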
Michael S. Tsirkin
2021-Nov-24 07:22 UTC
[PATCH V5 1/4] virtio_ring: validate used buffer length
On Wed, Nov 24, 2021 at 10:33:28AM +0800, Jason Wang wrote:
> On Wed, Nov 24, 2021 at 10:26 AM Jason Wang <jasowang at redhat.com> wrote:
> >
> > On Wed, Nov 24, 2021 at 9:30 AM Michael Ellerman <mpe at ellerman.id.au> wrote:
> > >
> > > "Michael S. Tsirkin" <mst at redhat.com> writes:
> > > > Let's see how far we can get. But yes, maybe we were too aggressive in
> > > > breaking things by default, a warning might be a better choice for a
> > > > couple of cycles.
> >
> > Ok, considering we saw the issues with balloons I think I can post a
> > patch to use warn instead. I wonder if we need to taint the kernel in
> > this case.
>
> Rethinking this, and considering we still have some time, I'm inclined to
> convert the drivers to validate the length themselves. Does this make sense?
>
> Thanks

That's separate, but let's stop crashing guests for people ASAP.
On Wed, 24 Nov 2021 10:33:28 +0800 Jason Wang <jasowang at redhat.com> wrote:

> > > > Let's see how far we can get. But yes, maybe we were too aggressive in
> > > > breaking things by default, a warning might be a better choice for a
> > > > couple of cycles.
> >
> > Ok, considering we saw the issues with balloons I think I can post a
> > patch to use warn instead. I wonder if we need to taint the kernel in
> > this case.
>
> Rethinking this, and considering we still have some time, I'm inclined to
> convert the drivers to validate the length themselves. Does this make sense?

I do find value in doing the validation in a single place for every
driver. This is really a common concern. But I think not breaking what
used to work before is a good idea. So I would opt for producing a
warning, but otherwise preserving the old behavior unless there is an
explicit opt-in for something stricter.

BTW, AFAIU, if we detect a problem here, there are basically two cases:
(1) Either the device is over-reporting what it has written, or
(2) we have memory corruption in the guest because the device has
written beyond the end of the provided buffer. This can happen because
(2.1) the driver provided a smaller buffer than mandated by the spec, or
(2.2) the device is broken.

Case (1) is relatively harmless, and I believe a warning for it is more
than appropriate. Whoever sees the warning should push for a fixed
device if possible.

Case (2) is nasty. What would be the sanest course of action if we were
reasonably sure we have case (2.2)?

Maybe we can detect case (2) with a canary, i.e. artificially extend the
buffer with an extra descriptor that has a poisoned buffer, and check
whether the value of that poisoned buffer differs from the poison. I'm
not sure it is worth the effort in production, though.

Regards,
Halil
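Halil's canary idea can be illustrated with a small userspace model. It is a hedged sketch under stated assumptions: the driver's buffer and the extra poisoned descriptor are modeled as one contiguous array, `CANARY_LEN` and `CANARY_POISON` are arbitrary values chosen here, and `classify_used_len()` is a hypothetical helper that maps the outcome onto case (1) and case (2) above. It cannot distinguish (2.1) from (2.2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CANARY_LEN    16   /* size of the extra poisoned buffer (arbitrary) */
#define CANARY_POISON 0xa5 /* poison byte (arbitrary) */

enum used_len_verdict {
	USED_LEN_OK,            /* used len fits and the canary is intact */
	USED_LEN_OVER_REPORTED, /* case (1): len too big, canary untouched */
	USED_LEN_OVERFLOW,      /* case (2): the device wrote into the canary */
};

/* Fill the canary region that trails the real buffer with poison. */
static void canary_arm(unsigned char *canary)
{
	memset(canary, CANARY_POISON, CANARY_LEN);
}

/* True if every canary byte still holds the poison value. */
static bool canary_intact(const unsigned char *canary)
{
	for (size_t i = 0; i < CANARY_LEN; i++)
		if (canary[i] != CANARY_POISON)
			return false;
	return true;
}

/* Classify a completed buffer of buf_len bytes whose device reported used_len. */
static enum used_len_verdict classify_used_len(size_t buf_len, size_t used_len,
					       const unsigned char *canary)
{
	if (!canary_intact(canary))
		return USED_LEN_OVERFLOW;
	if (used_len > buf_len)
		return USED_LEN_OVER_REPORTED;
	return USED_LEN_OK;
}
```

For instance, with a 64-byte buffer, a device that writes 70 zero bytes changes the first canary bytes and is classified as an overflow, while one that writes 64 bytes but reports 100 is only over-reporting. A device that happens to write the poison value itself would go unnoticed, and every buffer pays for an extra descriptor and a scan of the canary, which is part of why the production cost/benefit is unclear.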