thr3ads.net - Virtualization - [PATCH] virtio_net: Revert "virtio_net: set the default max ring size by find

If this information is useful, please help other people find it:
Share via:

Michael S. Tsirkin

2022-Aug-15 20:42 UTC

[PATCH] virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"

On Mon, Aug 15, 2022 at 01:34:26PM -0700, Guenter Roeck
wrote:> On Mon, Aug 15, 2022 at 05:16:50AM -0400, Michael S. Tsirkin wrote:
> > This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7.
> > 
> > This has been reported to trip up guests on GCP (Google Cloud).  Why
is
> > not yet clear - to be debugged, but the patch itself has several other
> > issues:
> > 
> > - It treats unknown speed as < 10G
> > - It leaves userspace no way to find out the ring size set by
hypervisor
> > - It tests speed when link is down
> > - It ignores the virtio spec advice:
> >         Both \field{speed} and \field{duplex} can change, thus the
driver
> >         is expected to re-read these values after receiving a
> >         configuration change notification.
> > - It is not clear the performance impact has been tested properly
> > 
> > Revert the patch for now.
> > 
> > Link:
https://lore.kernel.org/r/20220814212610.GA3690074%40roeck-us.net
> > Link:
https://lore.kernel.org/r/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de
> > Link:
https://lore.kernel.org/r/3df6bb82-1951-455d-a768-e9e1513eb667%40www.fastmail.com
> > Link:
https://lore.kernel.org/r/FCDC5DDE-3CDD-4B8A-916F-CA7D87B547CE%40anarazel.de
> > Fixes: 762faee5a267 ("virtio_net: set the default max ring size
by find_vqs()")
> > Cc: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
> > Cc: Jason Wang <jasowang at redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
> > Tested-by: Andres Freund <andres at anarazel.de>
> 
> I ran this patch through a total of 14 syskaller tests, 2 test runs each on
> 7 different crashes reported by syzkaller (as reported to the linux-kernel
> mailing list). No problems were reported. I also ran a single cross-check
> with one of the syzkaller runs on top of v6.0-rc1, without this patch.
> That test run failed.
> 
> Overall, I think we can call this fixed.
> 
> Guenter
It's more of a work around though since we don't yet have the root
cause for this. I suspect a GCP hypervisor bug at the moment.
This is excercising a path we previously only took on GFP_KERNEL
allocation failures during probe, I don't think that happens a lot.

-- 
MST

Guenter Roeck

2022-Aug-15 20:50 UTC

head link

[PATCH] virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"

On Mon, Aug 15, 2022 at 04:42:51PM -0400, Michael S. Tsirkin
wrote:> On Mon, Aug 15, 2022 at 01:34:26PM -0700, Guenter Roeck wrote:
> > On Mon, Aug 15, 2022 at 05:16:50AM -0400, Michael S. Tsirkin wrote:
> > > This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7.
> > > 
> > > This has been reported to trip up guests on GCP (Google Cloud). 
Why is
> > > not yet clear - to be debugged, but the patch itself has several
other
> > > issues:
> > > 
> > > - It treats unknown speed as < 10G
> > > - It leaves userspace no way to find out the ring size set by
hypervisor
> > > - It tests speed when link is down
> > > - It ignores the virtio spec advice:
> > >         Both \field{speed} and \field{duplex} can change, thus
the driver
> > >         is expected to re-read these values after receiving a
> > >         configuration change notification.
> > > - It is not clear the performance impact has been tested properly
> > > 
> > > Revert the patch for now.
> > > 
> > > Link:
https://lore.kernel.org/r/20220814212610.GA3690074%40roeck-us.net
> > > Link:
https://lore.kernel.org/r/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de
> > > Link:
https://lore.kernel.org/r/3df6bb82-1951-455d-a768-e9e1513eb667%40www.fastmail.com
> > > Link:
https://lore.kernel.org/r/FCDC5DDE-3CDD-4B8A-916F-CA7D87B547CE%40anarazel.de
> > > Fixes: 762faee5a267 ("virtio_net: set the default max ring
size by find_vqs()")
> > > Cc: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
> > > Cc: Jason Wang <jasowang at redhat.com>
> > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
> > > Tested-by: Andres Freund <andres at anarazel.de>
> > 
> > I ran this patch through a total of 14 syskaller tests, 2 test runs
each on
> > 7 different crashes reported by syzkaller (as reported to the
linux-kernel
> > mailing list). No problems were reported. I also ran a single
cross-check
> > with one of the syzkaller runs on top of v6.0-rc1, without this patch.
> > That test run failed.
> > 
> > Overall, I think we can call this fixed.
> > 
> > Guenter
> 
> It's more of a work around though since we don't yet have the root
> cause for this. I suspect a GCP hypervisor bug at the moment.
> This is excercising a path we previously only took on GFP_KERNEL
> allocation failures during probe, I don't think that happens a lot.
>
Even a hypervisor bug should not trigger crashes like this one,
though, or at least I think so. Any idea what to look for on the
hypervisor side, and/or what it might be doing wrong ?

Thanks,
Guenter

Virtualization - Aug 2022 - [PATCH] virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"

[PATCH] virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"

[PATCH] virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"