Vladimir Oltean
2022-Mar-22 11:08 UTC
[Bridge] [PATCH net-next 3/3] net: dsa: mv88e6xxx: mac-auth/MAB implementation
On Tue, Mar 22, 2022 at 12:01:13PM +0100, Hans Schultz wrote:
> On Fri, Mar 18, 2022 at 15:19, Vladimir Oltean <olteanv at gmail.com> wrote:
> > On Fri, Mar 18, 2022 at 02:10:26PM +0100, Hans Schultz wrote:
> >> In the offloaded case there is no difference between static and dynamic
> >> flags, which I see as a general issue. (The resulting ATU entry is static
> >> in either case.)
> >
> > It _is_ a problem. We had the same problem with the is_local bit.
> > Independently of this series, you can add the dynamic bit to struct
> > switchdev_notifier_fdb_info and make drivers reject it.
> >
> >> These FDB entries are removed when link goes down (soft or hard). The
> >> zero DPV entries that the new code introduces age out after 5 minutes,
> >> while the locked flagged FDB entries are removed by link down (thus the
> >> FDB and the ATU are not in sync in this case).
> >
> > Ok, so don't let them disappear from hardware, refresh them from the
> > driver, since user space and the bridge driver expect that they are
> > still there.
>
> I have now tested with two extra unmanaged switches (each connected to a
> separate port on our managed switch), and when migrating from one port to
> another, there are member violations, but as the initial entry ages out,
> a new miss violation occurs and the new port adds the locked entry. In
> this case I only see one locked entry, either on the initial port or
> later on the port the host migrated to (via switch).
>
> If I refresh the ATU entries indefinitely, then this migration will for
> sure not work, and with the member violation suppressed, it will be
> silent about it.

Manual says that migrations should trigger miss violations if configured
adequately, is this not the case?

> So I don't think it is a good idea to refresh the ATU entries
> indefinitely.
>
> Another issue I see is that there is a deadlock or similar issue when
> receiving violations and running 'bridge fdb show' (it seemed that
> member violations also caused this, but not sure yet...), as the unit
> freezes, not to return...

Have you enabled lockdep, debug atomic sleep, detect hung tasks, things
like that?
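
The suggestion quoted above (add a dynamic bit to struct
switchdev_notifier_fdb_info and have drivers reject it) can be illustrated
with a small user-space model. This is only a sketch under stated
assumptions: `fdb_info_model`, its `dynamic` field and `driver_fdb_add()`
are hypothetical stand-ins, not the kernel's definitions, and the choice of
-EOPNOTSUPP as the rejection code is an assumption as well.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct switchdev_notifier_fdb_info.
 * Only is_local is named in the thread; "dynamic" is the proposed new bit. */
struct fdb_info_model {
	uint8_t addr[6];
	uint16_t vid;
	bool is_local;
	bool dynamic;	/* proposed: the entry is expected to age out */
};

/* Hypothetical driver hook. The modelled hardware can only install static
 * entries, so it refuses dynamic ones instead of silently making them
 * static. */
int driver_fdb_add(const struct fdb_info_model *info)
{
	if (info == NULL)
		return -EINVAL;
	if (info->dynamic)
		return -EOPNOTSUPP;	/* assumed error code */
	/* A real driver would program the ATU entry here. */
	return 0;
}
```

With such a check the caller learns that the entry was not offloaded as
requested, rather than ending up with a static ATU entry that no longer
matches the FDB.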
Hans Schultz
2022-Mar-22 13:21 UTC
[Bridge] [PATCH net-next 3/3] net: dsa: mv88e6xxx: mac-auth/MAB implementation
On Tue, Mar 22, 2022 at 13:08, Vladimir Oltean <olteanv at gmail.com> wrote:
> On Tue, Mar 22, 2022 at 12:01:13PM +0100, Hans Schultz wrote:
>> On Fri, Mar 18, 2022 at 15:19, Vladimir Oltean <olteanv at gmail.com> wrote:
>> > On Fri, Mar 18, 2022 at 02:10:26PM +0100, Hans Schultz wrote:
>> >> In the offloaded case there is no difference between static and dynamic
>> >> flags, which I see as a general issue. (The resulting ATU entry is static
>> >> in either case.)
>> >
>> > It _is_ a problem. We had the same problem with the is_local bit.
>> > Independently of this series, you can add the dynamic bit to struct
>> > switchdev_notifier_fdb_info and make drivers reject it.
>> >
>> >> These FDB entries are removed when link goes down (soft or hard). The
>> >> zero DPV entries that the new code introduces age out after 5 minutes,
>> >> while the locked flagged FDB entries are removed by link down (thus the
>> >> FDB and the ATU are not in sync in this case).
>> >
>> > Ok, so don't let them disappear from hardware, refresh them from the
>> > driver, since user space and the bridge driver expect that they are
>> > still there.
>>
>> I have now tested with two extra unmanaged switches (each connected to a
>> separate port on our managed switch), and when migrating from one port to
>> another, there are member violations, but as the initial entry ages out,
>> a new miss violation occurs and the new port adds the locked entry. In
>> this case I only see one locked entry, either on the initial port or
>> later on the port the host migrated to (via switch).
>>
>> If I refresh the ATU entries indefinitely, then this migration will for
>> sure not work, and with the member violation suppressed, it will be
>> silent about it.
>
> Manual says that migrations should trigger miss violations if configured
> adequately, is this not the case?
>

Yes, but that depends on the ATU entries ageing out.
As it is now, it works.

>> So I don't think it is a good idea to refresh the ATU entries
>> indefinitely.
>>
>> Another issue I see is that there is a deadlock or similar issue when
>> receiving violations and running 'bridge fdb show' (it seemed that
>> member violations also caused this, but not sure yet...), as the unit
>> freezes, not to return...
>
> Have you enabled lockdep, debug atomic sleep, detect hung tasks, things
> like that?

No, I haven't looked deeper into it yet. Maybe I was hoping someone had
an idea... but I guess it cannot be a netlink deadlock?
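
The dependence on ageing described above can be made concrete with a small
user-space model of a single ATU entry. It is a sketch of the behaviour
reported in this thread, not of the hardware itself: the type and function
names are hypothetical, the 300 second limit is the 5 minute figure quoted
earlier, and the model assumes a hit does not reset the age.

```c
#include <stdbool.h>

#define ATU_AGE_LIMIT_S 300u	/* 5 minutes, as quoted in the thread */

enum atu_event { ATU_HIT, ATU_MISS_VIOLATION, ATU_MEMBER_VIOLATION };

/* Hypothetical model of the one ATU entry tracking the migrating host. */
struct atu_entry_model {
	bool valid;
	int port;
	unsigned int age_s;	/* seconds since the entry was loaded/refreshed */
};

/* Advance time. With refresh set, the driver keeps reloading the entry,
 * so it never ages out. */
void atu_tick(struct atu_entry_model *e, unsigned int secs, bool refresh)
{
	if (!e->valid)
		return;
	if (refresh) {
		e->age_s = 0;
		return;
	}
	e->age_s += secs;
	if (e->age_s >= ATU_AGE_LIMIT_S)
		e->valid = false;
}

/* A frame from the host arrives on the given port. */
enum atu_event atu_ingress(struct atu_entry_model *e, int port)
{
	if (!e->valid) {
		/* Miss violation: a new locked entry is added for this port. */
		e->valid = true;
		e->port = port;
		e->age_s = 0;
		return ATU_MISS_VIOLATION;
	}
	if (e->port != port)
		return ATU_MEMBER_VIOLATION;	/* entry still on the old port */
	return ATU_HIT;
}
```

In this model a host that moves from port 1 to port 2 keeps producing
member violations until `atu_tick()` has expired the old entry; only then
does the next frame produce a miss violation and move the entry. If every
tick is called with `refresh` set, that point is never reached.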
Hans Schultz
2022-Mar-23 10:13 UTC
[Bridge] [PATCH net-next 3/3] net: dsa: mv88e6xxx: mac-auth/MAB implementation
On Tue, Mar 22, 2022 at 13:08, Vladimir Oltean <olteanv at gmail.com> wrote:
> On Tue, Mar 22, 2022 at 12:01:13PM +0100, Hans Schultz wrote:
>> On Fri, Mar 18, 2022 at 15:19, Vladimir Oltean <olteanv at gmail.com> wrote:
>> > On Fri, Mar 18, 2022 at 02:10:26PM +0100, Hans Schultz wrote:
>> >> In the offloaded case there is no difference between static and dynamic
>> >> flags, which I see as a general issue. (The resulting ATU entry is static
>> >> in either case.)
>> >
>> > It _is_ a problem. We had the same problem with the is_local bit.
>> > Independently of this series, you can add the dynamic bit to struct
>> > switchdev_notifier_fdb_info and make drivers reject it.
>> >
>> >> These FDB entries are removed when link goes down (soft or hard). The
>> >> zero DPV entries that the new code introduces age out after 5 minutes,
>> >> while the locked flagged FDB entries are removed by link down (thus the
>> >> FDB and the ATU are not in sync in this case).
>> >
>> > Ok, so don't let them disappear from hardware, refresh them from the
>> > driver, since user space and the bridge driver expect that they are
>> > still there.
>>
>> I have now tested with two extra unmanaged switches (each connected to a
>> separate port on our managed switch), and when migrating from one port to
>> another, there are member violations, but as the initial entry ages out,
>> a new miss violation occurs and the new port adds the locked entry. In
>> this case I only see one locked entry, either on the initial port or
>> later on the port the host migrated to (via switch).
>>
>> If I refresh the ATU entries indefinitely, then this migration will for
>> sure not work, and with the member violation suppressed, it will be
>> silent about it.
>
> Manual says that migrations should trigger miss violations if configured
> adequately, is this not the case?
>
>> So I don't think it is a good idea to refresh the ATU entries
>> indefinitely.
>>
>> Another issue I see is that there is a deadlock or similar issue when
>> receiving violations and running 'bridge fdb show' (it seemed that
>> member violations also caused this, but not sure yet...), as the unit
>> freezes, not to return...
>
> Have you enabled lockdep, debug atomic sleep, detect hung tasks, things
> like that?

I have now determined that it is the rtnl_lock() that causes the
"deadlock". The doit() in rtnetlink.c runs under rtnl_lock() and is what
takes care of getting the FDB entries when running 'bridge fdb show'. In
principle there should be no problem with this, but I don't know whether
some interrupt queue is getting jammed because the violation handlers are
blocked by rtnetlink.c?
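
The "jammed queue" hypothesis above can be pictured with a toy user-space
model. It is purely illustrative: it assumes the violation path needs the
same lock that the dump holds, which this thread does not confirm, and
every name in it (`rtnl_model`, `violation_event()`, `dump_begin()`,
`dump_end()`, `PENDING_MAX`) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define PENDING_MAX 4	/* arbitrary queue depth for the illustration */

/* Hypothetical model: one lock shared by the dump and the event path. */
struct rtnl_model {
	bool held;	/* stands in for rtnl_lock() being held */
	size_t pending;	/* violation events waiting for the lock */
	size_t dropped;	/* events lost because the queue was full */
	size_t handled;	/* events processed */
};

/* A violation arrives; in this model its handler needs the lock. */
void violation_event(struct rtnl_model *m)
{
	if (!m->held) {
		m->handled++;
		return;
	}
	if (m->pending < PENDING_MAX)
		m->pending++;	/* blocked behind the dump */
	else
		m->dropped++;	/* queue is jammed */
}

/* A 'bridge fdb show' style dump starts and holds the lock throughout. */
void dump_begin(struct rtnl_model *m)
{
	m->held = true;
}

/* The dump ends: the lock is released and queued events finally run. */
void dump_end(struct rtnl_model *m)
{
	m->held = false;
	m->handled += m->pending;
	m->pending = 0;
}
```

If the dump itself waits on something that only the blocked event path can
provide, `dump_end()` is never reached and the model stays stuck, which
would look like the freeze described above. Lockdep and the hung task
detector are the tools that could confirm or rule this out on the real
system.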