Hans Schultz
2022-Mar-25 13:48 UTC
[Bridge] [PATCH v2 net-next 2/4] net: switchdev: add support for offloading of fdb locked flag
On Fri, Mar 25, 2022 at 15:21, Vladimir Oltean <olteanv at gmail.com> wrote:
> On Fri, Mar 25, 2022 at 08:50:34AM +0100, Hans Schultz wrote:
>> On Thu, Mar 24, 2022 at 16:27, Vladimir Oltean <olteanv at gmail.com> wrote:
>> > On Thu, Mar 24, 2022 at 12:23:39PM +0100, Hans Schultz wrote:
>> >> On Thu, Mar 24, 2022 at 13:09, Vladimir Oltean <olteanv at gmail.com> wrote:
>> >> > On Thu, Mar 24, 2022 at 11:32:08AM +0100, Hans Schultz wrote:
>> >> >> On Wed, Mar 23, 2022 at 16:43, Vladimir Oltean <olteanv at gmail.com> wrote:
>> >> >> > On Wed, Mar 23, 2022 at 01:49:32PM +0100, Hans Schultz wrote:
>> >> >> >> >> Does someone have an idea why there at this point is no option to add a
>> >> >> >> >> dynamic fdb entry?
>> >> >> >> >>
>> >> >> >> >> The fdb added entries here do not age out, while the ATU entries do
>> >> >> >> >> (after 5 min), resulting in unsynced ATU vs fdb.
>> >> >> >> >
>> >> >> >> > I think the expectation is to use br_fdb_external_learn_del() if the
>> >> >> >> > externally learned entry expires. The bridge should not age by itself
>> >> >> >> > FDB entries learned externally.
>> >> >> >> >
>> >> >> >>
>> >> >> >> It seems to me that something is missing then?
>> >> >> >> My tests using trafgen that I gave a report on to Lunn generated massive
>> >> >> >> amounts of fdb entries, but after a while the ATU was clean and the fdb
>> >> >> >> was still full of random entries...
>> >> >> >
>> >> >> > I'm no longer sure where you are, sorry..
>> >> >> > I think we discussed that you need to enable ATU age interrupts in order
>> >> >> > to keep the ATU in sync with the bridge FDB? Which means either to
>> >> >> > delete the locked FDB entries from the bridge when they age out in the
>> >> >> > ATU, or to keep refreshing locked ATU entries.
>> >> >> > So it seems that you're doing neither of those 2 things if you end up
>> >> >> > with bridge FDB entries which are no longer in the ATU.
>> >> >>
>> >> >> Any idea why G2 offset 5 ATUAgeIntEn (bit 10) is set?
>> >> >> There is no define for it, so I assume it is something default?
>> >> >
>> >> > No idea, but I can confirm that the out-of-reset value I see for
>> >> > MV88E6XXX_G2_SWITCH_MGMT on 6190 and 6390 is 0x400. It's best not to
>> >> > rely on any reset defaults though.
>> >>
>> >> I see no age out interrupts, even though the port's Age Out Int is on
>> >> (PAV bit 14) on the locked port, and the ATU entries do age out (HoldAt1
>> >> is off). Any idea why that can be?
>> >>
>> >> In combination with this I think it would be nice to have an ability to
>> >> set the AgeOut time even though it is not per port but global.
>> >
>> > Sorry, I just don't know. Looking at the documentation for IntOnAgeOut,
>> > I see it says that for an ATU entry to trigger an age out interrupt, the
>> > port it's associated with must have IntOnAgeOut set.
>> > But your locked ATU entries aren't associated with any port, they have
>> > DPV=0, right? So will they never trigger any age out interrupt according
>> > to this? I'm not clear.
>>
>> I think that's absolutely right. That leaves two options. Either "port
>> 10" if it has IntOnAgeOut setting, or the reason why I wrote my comments
>> in this part of the code, that it should be able to add a dynamic entry
>> in the bridge module from the driver.
>
> I'm sorry, I wasn't fully aware of the implications of the fact that
> your 'locked' FDB entries have a DPV of all zeroes in hardware.
> Practically, this means that while the locked bridge FDB entry is
> associated with a bridge port, the ATU entry is associated with no port.
>
> In turn, the hardware cannot ever truly detect station migrations,
> because it doesn't know which port this station migrates _from_ (you're
> not telling it that). Every packet with this MAC SA is a station
> migration, in effect, which you (for good reason) choose to ignore to
> avoid denial of service.
>
> Mark the locked (DPV=0) ATU entry as static, and you'll keep your CPU
> clean of any ATU miss or member violation of this MAC SA. Read this as
> "you'll need to call IT to ask them to remove it". Undesirable IMHO.
>
> Mark the locked entry as non-static, and the entry will eventually
> expire, with no interrupt to signal that - because any ATU age interrupt,
> as mentioned, is fundamentally linked to a port.
>
> You see this as a negative, and you're looking for ways to inform the
> bridge driver that the locked FDB entry went away. But you aren't
> looking at this the right way, I think. Making the mv88e6xxx driver
> remove the locked FDB entry from the bridge seems like a non-goal now.
>
> If you'd cache the locked ATU entry in the mv88e6xxx driver, and you'd
> notify switchdev only if the entry is new to the cache, then you'd
> actually still achieve something major. Yes, the bridge FDB will contain
> locked FDB entries that aren't in the ATU. But that's because your
> printer has been silent for X seconds. The policy for the printer still
> hasn't changed, as far as the mv88e6xxx, or bridge, software drivers are
> concerned. If the unauthorized printer says something again after the
> locked ATU entry expires, the mv88e6xxx driver will find its MAC SA
> in the cache of denied addresses, and reload the ATU. What this
> achieves

The driver will in this case just trigger a new miss violation and add
the entry again I think.
The problem with all this is that a malicious attack that spams the
switch with random mac addresses will be able to DOS the device as any
handling of the fdb will be too resource demanding. That is why it is
needed to remove those fdb entries after a time out, which dynamic
entries would serve.

> is that the number of ATU violation interrupts isn't proportional to the
> number of packets sent by the printer, but with the ageing time you
> configure for this ATU entry.
> You should be able to play with an
> entry->state in the range of 1 -> 7 and get a good compromise between
> responsiveness on station migrations and number of ATU interrupts to
> service once the locked ATU entry is invalidated. In my opinion even the
> quickest-to-expire entry->state of 1 is way better than letting every
> packet spam the CPU. And you can always keep your cached locked ATU
> entry in sync with the port that triggered the violation interrupt, and
> figure out station migrations in software this way.
>
> I hope I understood the hardware behavior correctly, I don't have any
> direct experience with 802.1X as I mentioned, and only limited and
> non-expert experience with Marvell hardware. This is just my
> interpretation of some random documentation I found online.
Vladimir Oltean
2022-Mar-25 14:00 UTC
[Bridge] [PATCH v2 net-next 2/4] net: switchdev: add support for offloading of fdb locked flag
On Fri, Mar 25, 2022 at 02:48:36PM +0100, Hans Schultz wrote:
> > If you'd cache the locked ATU entry in the mv88e6xxx driver, and you'd
> > notify switchdev only if the entry is new to the cache, then you'd
> > actually still achieve something major. Yes, the bridge FDB will contain
> > locked FDB entries that aren't in the ATU. But that's because your
> > printer has been silent for X seconds. The policy for the printer still
> > hasn't changed, as far as the mv88e6xxx, or bridge, software drivers are
> > concerned. If the unauthorized printer says something again after the
> > locked ATU entry expires, the mv88e6xxx driver will find its MAC SA
> > in the cache of denied addresses, and reload the ATU. What this
> > achieves
>
> The driver will in this case just trigger a new miss violation and add
> the entry again I think.
> The problem with all this is that a malicious attack that spams the
> switch with random mac addresses will be able to DOS the device as any
> handling of the fdb will be too resource demanding. That is why it is
> needed to remove those fdb entries after a time out, which dynamic
> entries would serve.

An attacker sweeping through the 2^47 source MAC address range is a
problem regardless of the implementations proposed so far, no?
If unlimited growth of the mv88e6xxx locked ATU entry cache is a concern
(which it is), we could limit its size, and when we purge a cached entry
in software is also when we could emit a SWITCHDEV_FDB_DEL_TO_BRIDGE for
it, right?
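
To make the size-limited cache idea concrete, here is a minimal user-space
sketch in plain C. It is not mv88e6xxx code: struct locked_cache,
locked_cache_violation(), LOCKED_CACHE_MAX and the two counters are
hypothetical names, and the counters only stand in for where a driver might
emit SWITCHDEV_FDB_ADD_TO_BRIDGE and SWITCHDEV_FDB_DEL_TO_BRIDGE. The
eviction policy (least recently seen) is an assumption as well; the thread
does not specify one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define LOCKED_CACHE_MAX 4	/* hypothetical cap, kept tiny for illustration */

/* Stand-ins for the add/del notifications towards the bridge. */
static unsigned int notified_add, notified_del;

struct locked_entry {
	uint8_t mac[ETH_ALEN];
	uint16_t vid;
	int port;		/* port that last triggered the violation */
	uint64_t last_seen;	/* logical timestamp used for eviction */
	bool used;
};

struct locked_cache {
	struct locked_entry e[LOCKED_CACHE_MAX];
	uint64_t clock;
};

static struct locked_entry *cache_find(struct locked_cache *c,
				       const uint8_t *mac, uint16_t vid)
{
	int i;

	for (i = 0; i < LOCKED_CACHE_MAX; i++)
		if (c->e[i].used && c->e[i].vid == vid &&
		    !memcmp(c->e[i].mac, mac, ETH_ALEN))
			return &c->e[i];
	return NULL;
}

/* Model of handling one ATU miss violation.
 * Returns true if the address was new to the cache (bridge gets notified),
 * false if it was already cached (only the ATU would be reloaded).
 */
static bool locked_cache_violation(struct locked_cache *c, const uint8_t *mac,
				   uint16_t vid, int port)
{
	struct locked_entry *e, *victim = NULL;
	int i;

	assert(c && mac);
	c->clock++;

	e = cache_find(c, mac, vid);
	if (e) {
		e->port = port;		/* follow station migration in software */
		e->last_seen = c->clock;
		return false;
	}

	/* Prefer a free slot; otherwise evict the least recently seen entry. */
	for (i = 0; i < LOCKED_CACHE_MAX; i++) {
		struct locked_entry *cur = &c->e[i];

		if (!cur->used) {
			victim = cur;
			break;
		}
		if (!victim || cur->last_seen < victim->last_seen)
			victim = cur;
	}

	if (victim->used)
		notified_del++;		/* purge: would emit the DEL notification */

	memcpy(victim->mac, mac, ETH_ALEN);
	victim->vid = vid;
	victim->port = port;
	victim->last_seen = c->clock;
	victim->used = true;
	notified_add++;			/* new entry: would emit the ADD notification */
	return true;
}
```

With a cap like this, the number of locked FDB entries the bridge holds on
behalf of the driver stays bounded by LOCKED_CACHE_MAX, at the cost of
add/del churn when an attacker cycles through more addresses than the cache
can hold.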