Geert Uytterhoeven
2020-Sep-14 07:40 UTC
[Bridge] [PATCH] Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev"
Hi David, CC bridge On Sun, Sep 13, 2020 at 3:34 AM David Miller <davem at davemloft.net> wrote:> From: Geert Uytterhoeven <geert at linux-m68k.org> > Date: Sat, 12 Sep 2020 14:33:59 +0200 > > > "dev" is not the bridge device, but the physical Ethernet interface, which > > may already be suspended during s2ram. > > Hmmm, ok. > > Looking more deeply NETDEV_CHANGE causes br_port_carrier_check() to run which > exits early if netif_running() is false, which is going to be true if > netif_device_present() is false: > > *notified = false; > if (!netif_running(br->dev)) > return; > > The only other work the bridge notifier does is: > > if (event != NETDEV_UNREGISTER) > br_vlan_port_event(p, event); > > and: > > /* Events that may cause spanning tree to refresh */ > if (!notified && (event == NETDEV_CHANGEADDR || event == NETDEV_UP || > event == NETDEV_CHANGE || event == NETDEV_DOWN)) > br_ifinfo_notify(RTM_NEWLINK, NULL, p); > > So some vlan stuff, and emitting a netlink message to any available > listeners. > > Should we really do all of this for a device which is not even > present? > > This whole situation seems completely illogical. The device is > useless, it logically has no link or other state that can be managed > or used, while it is not present. > > So all of these bridge operations should only happen when the device > transitions back to present again.Thanks for your analysis! I'd like to defer this to the bridge people (CC). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Nikolay Aleksandrov
2020-Sep-18 12:35 UTC
[Bridge] [PATCH] Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev"
On Mon, 2020-09-14 at 09:40 +0200, Geert Uytterhoeven wrote:> Hi David, > > CC bridge > > On Sun, Sep 13, 2020 at 3:34 AM David Miller <davem at davemloft.net> wrote: > > From: Geert Uytterhoeven <geert at linux-m68k.org> > > Date: Sat, 12 Sep 2020 14:33:59 +0200 > > > > > "dev" is not the bridge device, but the physical Ethernet interface, which > > > may already be suspended during s2ram. > > > > Hmmm, ok. > > > > Looking more deeply NETDEV_CHANGE causes br_port_carrier_check() to run which > > exits early if netif_running() is false, which is going to be true if > > netif_device_present() is false: > > > > *notified = false; > > if (!netif_running(br->dev)) > > return; > > > > The only other work the bridge notifier does is: > > > > if (event != NETDEV_UNREGISTER) > > br_vlan_port_event(p, event); > > > > and: > > > > /* Events that may cause spanning tree to refresh */ > > if (!notified && (event == NETDEV_CHANGEADDR || event == NETDEV_UP || > > event == NETDEV_CHANGE || event == NETDEV_DOWN)) > > br_ifinfo_notify(RTM_NEWLINK, NULL, p); > > > > So some vlan stuff, and emitting a netlink message to any available > > listeners. > > > > Should we really do all of this for a device which is not even > > present? > > > > This whole situation seems completely illogical. The device is > > useless, it logically has no link or other state that can be managed > > or used, while it is not present. > > > > So all of these bridge operations should only happen when the device > > transitions back to present again. > > Thanks for your analysis! > I'd like to defer this to the bridge people (CC). > > Gr{oetje,eeting}s, > > Geert >Hi, Sorry for the delay. Interesting problem. :) Thanks for the analysis, I don't see any issues with checking if the device isn't present. It will have to go through some testing, but no obvious objections/issues. Have you tried if it fixes your case? I have briefly gone over drivers' use of net_device_detach(), mostly it's used for suspends, but there are a few cases which use it on IO error or when a device is actually detaching (VF detach). The vlan port event is for vlan devices on top of the bridge when BROPT_VLAN_BRIDGE_BINDING is enabled and their carrier is changed based on vlan participating ports' state. Thanks, Nik
David Miller
2020-Sep-18 21:47 UTC
[Bridge] [PATCH] Revert "net: linkwatch: add check for netdevice being present to linkwatch_do_dev"
From: Nikolay Aleksandrov <nikolay at nvidia.com> Date: Fri, 18 Sep 2020 12:35:02 +0000> Thanks for the analysis, I don't see any issues with checking if the device > isn't present. It will have to go through some testing, but no obvious > objections/issues. Have you tried if it fixes your case? > I have briefly gone over drivers' use of net_device_detach(), mostly it's used > for suspends, but there are a few cases which use it on IO error or when a > device is actually detaching (VF detach). The vlan port event is for vlan > devices on top of the bridge when BROPT_VLAN_BRIDGE_BINDING is enabled and their > carrier is changed based on vlan participating ports' state.There are two things to resolve: 1) Why does the bridge need to get a change event for devices which have not fully resumed yet? 2) What kind of link state change is happening on devices which are not currently fully resumed yet? Really this whole situation is still quite mysterious to me. If the driver (or the PHY library it is using, etc.) is emitting link state changes before it marks itself as "present", that's the real bug.