Michael S. Tsirkin
2022-Aug-15 20:42 UTC
[PATCH] virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"
On Mon, Aug 15, 2022 at 01:34:26PM -0700, Guenter Roeck wrote:> On Mon, Aug 15, 2022 at 05:16:50AM -0400, Michael S. Tsirkin wrote: > > This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7. > > > > This has been reported to trip up guests on GCP (Google Cloud). Why is > > not yet clear - to be debugged, but the patch itself has several other > > issues: > > > > - It treats unknown speed as < 10G > > - It leaves userspace no way to find out the ring size set by hypervisor > > - It tests speed when link is down > > - It ignores the virtio spec advice: > > Both \field{speed} and \field{duplex} can change, thus the driver > > is expected to re-read these values after receiving a > > configuration change notification. > > - It is not clear the performance impact has been tested properly > > > > Revert the patch for now. > > > > Link: https://lore.kernel.org/r/20220814212610.GA3690074%40roeck-us.net > > Link: https://lore.kernel.org/r/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de > > Link: https://lore.kernel.org/r/3df6bb82-1951-455d-a768-e9e1513eb667%40www.fastmail.com > > Link: https://lore.kernel.org/r/FCDC5DDE-3CDD-4B8A-916F-CA7D87B547CE%40anarazel.de > > Fixes: 762faee5a267 ("virtio_net: set the default max ring size by find_vqs()") > > Cc: Xuan Zhuo <xuanzhuo at linux.alibaba.com> > > Cc: Jason Wang <jasowang at redhat.com> > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com> > > Tested-by: Andres Freund <andres at anarazel.de> > > I ran this patch through a total of 14 syskaller tests, 2 test runs each on > 7 different crashes reported by syzkaller (as reported to the linux-kernel > mailing list). No problems were reported. I also ran a single cross-check > with one of the syzkaller runs on top of v6.0-rc1, without this patch. > That test run failed. > > Overall, I think we can call this fixed. > > GuenterIt's more of a work around though since we don't yet have the root cause for this. I suspect a GCP hypervisor bug at the moment. This is excercising a path we previously only took on GFP_KERNEL allocation failures during probe, I don't think that happens a lot. -- MST
Guenter Roeck
2022-Aug-15 20:50 UTC
[PATCH] virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"
On Mon, Aug 15, 2022 at 04:42:51PM -0400, Michael S. Tsirkin wrote:> On Mon, Aug 15, 2022 at 01:34:26PM -0700, Guenter Roeck wrote: > > On Mon, Aug 15, 2022 at 05:16:50AM -0400, Michael S. Tsirkin wrote: > > > This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7. > > > > > > This has been reported to trip up guests on GCP (Google Cloud). Why is > > > not yet clear - to be debugged, but the patch itself has several other > > > issues: > > > > > > - It treats unknown speed as < 10G > > > - It leaves userspace no way to find out the ring size set by hypervisor > > > - It tests speed when link is down > > > - It ignores the virtio spec advice: > > > Both \field{speed} and \field{duplex} can change, thus the driver > > > is expected to re-read these values after receiving a > > > configuration change notification. > > > - It is not clear the performance impact has been tested properly > > > > > > Revert the patch for now. > > > > > > Link: https://lore.kernel.org/r/20220814212610.GA3690074%40roeck-us.net > > > Link: https://lore.kernel.org/r/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de > > > Link: https://lore.kernel.org/r/3df6bb82-1951-455d-a768-e9e1513eb667%40www.fastmail.com > > > Link: https://lore.kernel.org/r/FCDC5DDE-3CDD-4B8A-916F-CA7D87B547CE%40anarazel.de > > > Fixes: 762faee5a267 ("virtio_net: set the default max ring size by find_vqs()") > > > Cc: Xuan Zhuo <xuanzhuo at linux.alibaba.com> > > > Cc: Jason Wang <jasowang at redhat.com> > > > Signed-off-by: Michael S. Tsirkin <mst at redhat.com> > > > Tested-by: Andres Freund <andres at anarazel.de> > > > > I ran this patch through a total of 14 syskaller tests, 2 test runs each on > > 7 different crashes reported by syzkaller (as reported to the linux-kernel > > mailing list). No problems were reported. I also ran a single cross-check > > with one of the syzkaller runs on top of v6.0-rc1, without this patch. > > That test run failed. > > > > Overall, I think we can call this fixed. > > > > Guenter > > It's more of a work around though since we don't yet have the root > cause for this. I suspect a GCP hypervisor bug at the moment. > This is excercising a path we previously only took on GFP_KERNEL > allocation failures during probe, I don't think that happens a lot. >Even a hypervisor bug should not trigger crashes like this one, though, or at least I think so. Any idea what to look for on the hypervisor side, and/or what it might be doing wrong ? Thanks, Guenter