On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
> On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> >
> > On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> > > On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan at prestigetransportation.com> wrote:
> > > >
> > > > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan at prestigetransportation.com> wrote:
> > > > >
> > > > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan at prestigetransportation.com> wrote:
> > > > >>
> > > > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > >>>
> > > > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > > > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > >>> > >
> > > > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang at redhat.com> wrote:
> > > > >>> > > > > On 2021/7/23 at 10:54 AM, Ivan wrote:
> > > > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang at redhat.com> wrote:
> > > > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > >>> > > > > > 0 root@NuRaid:~# ethtool -K eth0 lro off
> > > > >>> > > > > > Actual changes:
> > > > >>> > > > > > rx-lro: on [requested off]
> > > > >>> > > > > > Could not change any device features
> > > > >>> > > > >
> > > > >>> > > > > Ok, it looks like the device lacks VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
> > > > >>> > > > > which makes it impossible to change the LRO setting.
> > > > >>> > > > >
> > > > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > > > >>> > > >
> > > > >>> > > > These are VirtualBox machines, which I've been using for years with
> > > > >>> > > > longterm kernels 4.19, and I never had such a problem. But now that I
> > > > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started. These
> > > > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > > > >>> > >
> > > > >>> > > It would be useful to see the features your virtualbox instance provides:
> > > > >>> > >
> > > > >>> > > cat /sys/class/net/eth0/device/features
> > > > >>> >
> > > > >>> > # cat /sys/class/net/eth0/device/features
> > > > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > > > >>>
> > > > >>> I was able to reproduce the warning but not the panic.
> > > > >>> OTOH if LRO stays on when enabling forwarding, that
> > > > >>> is already a problem. Any chance you can bisect to
> > > > >>> find out which change introduced the panic?
> > > > >>
> > > > >>
> > > > >> Any kernels up to 4.19.198 don't panic.
> > > > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > > > >> I have not tested any kernels between 4.19 and 5.10.
> > > > >> I guess I can build a few kernels in between, and try to pinpoint where it starts.
> > > > >> That may take a day or so. I'll get on with it now, and report my findings.
> > > > >
> > > > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> > > >
> > > > More narrowly, the problem seems to be coming from commit
> > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > > > Just to test my suspicion, I deleted a few lines from that code,
> > > > and the panic went away. Hope that helps you guys figure out
> > > > what the problem might be.
> >
> > Well it disables LRO but we knew this :( It'd help if we knew
> > where it panics; all we see is the warning, which is
> > related for sure but not the immediate root cause ...
> >
> > > >
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -2978,11 +2978,6 @@
> > > >  	}
> > > >  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > >  		dev->features |= NETIF_F_RXCSUM;
> > > > -	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > -	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > > > -		dev->features |= NETIF_F_LRO;
> > > > -	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > > > -		dev->hw_features |= NETIF_F_LRO;
> > > >
> > > >  	dev->vlan_features = dev->features;
> > >
> > > Just FYI, Google turned up two similar bug reports...
> > > Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> > > Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> > >
> > > Is there any sensible thing I could do, temporarily, until this
> > > problem is sorted out?
> > > Or am I simply stuck to kernels 4.19 on these machines for now?
> >
> >
> > Something like this I guess:
> >
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 8a58a2f013af..cc5982193a40 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
> >  			__virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
> >  	}
> >
> > +	__virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
> > +	__virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
> >  	return 0;
> >  }
>
> When I apply your patch, I see a drastic (more than half)
> reduction in speed (confirmed with iperf).
>
> But if instead I just remove a few lines from commit
> a02e8964eaf9271a8a5fcc0c55bd13f933bafc56,
> as in my earlier post, then I'm back to full speed.
>
> I understand that this is just a temporary workaround, until we figure this out.
Oh weird. So it's not about getting some weird LRO packet. We will get it
with VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
features.

How about this then? Just pretend to Linux that we disabled LRO.

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8a58a2f013af..8e7e4cea176b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
 				   ~GUEST_OFFLOAD_LRO_MASK;
 		err = virtnet_set_guest_offloads(vi, offloads);
-		if (err)
-			return err;
+		WARN_ON(err);
+		//if (err)
+		//	return err;
 		vi->guest_offloads = offloads;
 	}
--
MST