On Tue, Aug 10, 2021 at 11:31 PM Michael S. Tsirkin <mst at redhat.com> wrote:
>
> On Mon, Aug 02, 2021 at 04:23:12PM -0500, Ivan wrote:
> > On Mon, Aug 2, 2021 at 2:52 PM Michael S. Tsirkin <mst at redhat.com> wrote:
> > >
> > > On Mon, Aug 02, 2021 at 01:32:05PM -0500, Ivan wrote:
> > > > On Tue, Jul 27, 2021 at 4:11 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > >
> > > > > On Mon, Jul 26, 2021 at 07:44:43PM -0500, Ivan wrote:
> > > > > > On Sat, Jul 24, 2021 at 11:18 PM Ivan <ivan at prestigetransportation.com> wrote:
> > > > > > >
> > > > > > > On Sat, Jul 24, 2021 at 7:17 PM Ivan <ivan at prestigetransportation.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, Jul 23, 2021 at 7:33 AM Ivan <ivan at prestigetransportation.com> wrote:
> > > > > > > >>
> > > > > > > >> On Fri, Jul 23, 2021 at 7:10 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > > > > >>>
> > > > > > > >>> On Fri, Jul 23, 2021 at 03:06:04AM -0500, Ivan wrote:
> > > > > > > >>> > On Fri, Jul 23, 2021 at 2:59 AM Michael S. Tsirkin <mst at redhat.com> wrote:
> > > > > > > >>> > >
> > > > > > > >>> > > On Thu, Jul 22, 2021 at 11:50:11PM -0500, Ivan wrote:
> > > > > > > >>> > > > On Thu, Jul 22, 2021 at 11:25 PM Jason Wang <jasowang at redhat.com> wrote:
> > > > > > > >>> > > > > On 2021/7/23 at 10:54, Ivan wrote:
> > > > > > > >>> > > > > > On Thu, Jul 22, 2021 at 9:37 PM Jason Wang <jasowang at redhat.com> wrote:
> > > > > > > >>> > > > > >> Does it work if you turn off lro before enabling the forwarding?
> > > > > > > >>> > > > > > 0 root at NuRaid:~# ethtool -K eth0 lro off
> > > > > > > >>> > > > > > Actual changes:
> > > > > > > >>> > > > > > rx-lro: on [requested off]
> > > > > > > >>> > > > > > Could not change any device features
> > > > > > > >>> > > > >
> > > > > > > >>> > > > > Ok, it looks like the device misses the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> > > > > > > >>> > > > > which makes it impossible to change the LRO setting.
> > > > > > > >>> > > > >
> > > > > > > >>> > > > > Did you use qemu? If yes, what's the qemu version you've used?
> > > > > > > >>> > > >
> > > > > > > >>> > > > These are VirtualBox machines, which I've been using for years with
> > > > > > > >>> > > > longterm kernels 4.19, and I never had such a problem. But now that I
> > > > > > > >>> > > > tried upgrading to kernels 5.10 or 5.13 -- the panics started. These
> > > > > > > >>> > > > are just generic kernel builds, and a minimalistic userspace.
> > > > > > > >>> > >
> > > > > > > >>> > > It would be useful to see the features your virtualbox instance provides
> > > > > > > >>> > >
> > > > > > > >>> > > cat /sys/class/net/eth0/device/features
> > > > > > > >>> >
> > > > > > > >>> > # cat /sys/class/net/eth0/device/features
> > > > > > > >>> > 1100010110111011111100000000000000000000000000000000000000000000
> > > > > > > >>>
> > > > > > > >>> I was able to reproduce the warning but not the panic.
> > > > > > > >>> OTOH if LRO stays on when enabling forwarding that
> > > > > > > >>> is already a problem. Any chance you can bisect to
> > > > > > > >>> find out which change introduced the panic?
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Any kernels up to 4.19.198 don't panic.
> > > > > > > >> Any kernels 5.10+ panic immediately upon starting forwarding.
> > > > > > > >> I have not tested any kernels between 4.19 and 5.10.
> > > > > > > >> I guess I can build a few kernels in between, and try to pinpoint where it starts.
> > > > > > > >> That may take a day or so. I'll get on with it now, and report my findings.
> > > > > > > >
> > > > > > > > So, I narrowed it down: the panics start with kernel 5.0-rc.
> > > > > > >
> > > > > > > More narrowly, the problem seems to be coming from commit
> > > > > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56.
> > > > > > > Just to test my suspicion, I deleted a few lines from that code,
> > > > > > > and the panic went away. Hope that helps you guys figure out
> > > > > > > what the problem might be.
> > > > >
> > > > > Well it disables LRO but we knew this :( It'd help if we knew
> > > > > where it panics; all we see is the warning, which is
> > > > > related for sure but not the immediate root cause ...
> > > > >
> > > > > > >
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -2978,11 +2978,6 @@
> > > > > > >         }
> > > > > > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> > > > > > >                 dev->features |= NETIF_F_RXCSUM;
> > > > > > > -       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > > > > -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> > > > > > > -               dev->features |= NETIF_F_LRO;
> > > > > > > -       if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > > > > > > -               dev->hw_features |= NETIF_F_LRO;
> > > > > > >
> > > > > > >         dev->vlan_features = dev->features;
> > > > > >
> > > > > > Just FYI, Google turned up two similar bug reports...
> > > > > > Apr 14, 2020 -- https://github.com/containers/podman/issues/5815
> > > > > > Oct 09, 2020 -- https://bugzilla.kernel.org/show_bug.cgi?id=209593
> > > > > >
> > > > > > Is there any sensible thing I could do, temporarily, until this
> > > > > > problem is sorted out?
> > > > > > Or am I simply stuck on kernels 4.19 on these machines for now?
> > > > >
> > > > >
> > > > > Something like this I guess:
> > > > >
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 8a58a2f013af..cc5982193a40 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -3063,6 +3063,8 @@ static int virtnet_validate(struct virtio_device *vdev)
> > > > >                         __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
> > > > >         }
> > > > >
> > > > > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO4);
> > > > > +       __virtio_clear_bit(vdev, VIRTIO_NET_F_GUEST_TSO6);
> > > > >         return 0;
> > > > >  }
> > > >
> > > > When I apply your patch, I see a drastic reduction in speed
> > > > (more than half, confirmed with iperf).
> > > >
> > > > But if instead I just remove a few lines from commit
> > > > a02e8964eaf9271a8a5fcc0c55bd13f933bafc56
> > > > as in my earlier post, then I'm back to full speed.
> > > >
> > > > I understand that this is just a temporary workaround, until we figure this out.
> > >
> > >
> > > Oh weird. So it's not about getting some weird LRO packet. We will get it with
> > > VIRTIO_NET_F_GUEST_TSO4 anyway. It's about the LRO flag being set in
> > > features.
> > >
> > > How about this then? Just pretend to Linux that we disabled LRO.
> > >
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 8a58a2f013af..8e7e4cea176b 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -2651,8 +2651,9 @@ static int virtnet_set_features(struct net_device *dev,
> > >                                    ~GUEST_OFFLOAD_LRO_MASK;
> > >
> > >                 err = virtnet_set_guest_offloads(vi, offloads);
> > > -               if (err)
> > > -                       return err;
> > > +               WARN_ON(err);
> > > +               //if (err)
> > > +               //      return err;
> > >                 vi->guest_offloads = offloads;
> > >         }
> >
> > No. With this applied, the problem persists:
> >
> > # echo "1" > /proc/sys/net/ipv4/ip_forward
> >
> > kernel: ------------[ cut here ]------------
> > kernel: netdevice: eth0: failed to disable LRO!
> > kernel: WARNING: CPU: 0 PID: 452 at net/core/dev.c:1768
> > dev_disable_lro+0x108/0x150
> > kernel: Modules linked in: sg nls_iso8859_1 nls_cp437 vfat fat
> > hid_generic usbhid hid virtio_net net_failover failover aesni_intel
> > libaes crypto_simd ohci_pci ahci libahci cryptd rapl ehci_pci ohci_hcd
> > ehci_hcd usbcore usb_common libata evdev lpc_ich mfd_core rng_core
> > i2c_piix4 i2c_core virtio_pci virtio_pci_modern_dev virtio_ring virtio
> > rtc_cmos atkbd libps2 i8042 serio battery ac button loop unix
> > kernel: CPU: 0 PID: 452 Comm: bash Not tainted 5.13.7-gnu.1-NuMini #1
> > kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> > VirtualBox 12/01/2006
> > kernel: RIP: 0010:dev_disable_lro+0x108/0x150
>
> Again the warning isn't a big deal. I agree we should address it. Jason,
> any update?
I still think using NETIF_F_LRO might not be correct, since we're
basically receiving GSO packets.
It might also cause a lot of issues if the device doesn't have
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
I see two possible fixes:
1) using NETIF_F_GRO_HW instead (the patch is attached)
or
2) set NETIF_F_LRO only if the device has CTRL_GUEST_OFFLOADS
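
To illustrate what option 2 would mean for the device reported in this
thread, here is a minimal userspace sketch. It is not the attached patch
and not driver code. It decodes the features string Ivan posted from
sysfs, on the understanding that sysfs prints one character per feature
bit with bit 0 first, and it uses the bit numbers from
include/uapi/linux/virtio_net.h. The helper names sysfs_has_feature() and
may_advertise_lro() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Feature bit numbers, as in include/uapi/linux/virtio_net.h. */
#define VIRTIO_NET_F_CTRL_GUEST_OFFLOADS 2
#define VIRTIO_NET_F_GUEST_TSO4          7
#define VIRTIO_NET_F_GUEST_TSO6          8

/*
 * Hypothetical helper: test one bit in a string read from
 * /sys/class/net/<dev>/device/features.
 * Assumption: character i of the string is '1' when feature bit i is set.
 */
static bool sysfs_has_feature(const char *features, unsigned int bit)
{
	assert(features != NULL);
	return bit < strlen(features) && features[bit] == '1';
}

/*
 * Hypothetical helper mirroring option 2: only report LRO when the device
 * offers guest TSO and also lets the guest turn the offload off again.
 */
static bool may_advertise_lro(const char *features)
{
	bool guest_tso =
		sysfs_has_feature(features, VIRTIO_NET_F_GUEST_TSO4) ||
		sysfs_has_feature(features, VIRTIO_NET_F_GUEST_TSO6);

	return guest_tso &&
	       sysfs_has_feature(features, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS);
}
```

For the string quoted earlier in the thread
(1100010110111011111100000000000000000000000000000000000000000000),
bits 7 and 8 are set but bit 2 is clear, so may_advertise_lro() returns
false. That matches the ethtool output above, where rx-lro could not be
switched off.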
Thanks
> But the main issue is you lose connectivity. That still
> persists with this? Can't you get a serial connection
> out? I know qemu Did the kernel oops afterwards?
>
> --
> MST
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-virtio-net-use-NETIF_F_GRO_HW-instead-of-NETIF_F_LRO.patch
Type: application/octet-stream
Size: 2478 bytes
Desc: not available
URL: <http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20210811/f302d2b8/attachment-0001.obj>