Nikolay Aleksandrov
2023-May-16 08:56 UTC
[Bridge] [PATCH net-next 1/2] bridge: Add a limit on FDB entries
On 16/05/2023 11:53, Johannes Nixdorf wrote:> On Tue, May 16, 2023 at 11:38:11AM +0300, Nikolay Aleksandrov wrote: >> On 15/05/2023 11:50, Johannes Nixdorf wrote: >>> A malicious actor behind one bridge port may spam the kernel with packets >>> with a random source MAC address, each of which will create an FDB entry, >>> each of which is a dynamic allocation in the kernel. >>> >>> There are roughly 2^48 different MAC addresses, further limited by the >>> rhashtable they are stored in to 2^31. Each entry is of the type struct >>> net_bridge_fdb_entry, which is currently 128 bytes big. This means the >>> maximum amount of memory allocated for FDB entries is 2^31 * 128B >>> 256GiB, which is too much for most computers. >>> >>> Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES, >>> which, if nonzero, limits the amount of entries to a user specified >>> maximum. >>> >>> For backwards compatibility the default setting of 0 disables the limit. >>> >>> All changes to fdb_n_entries are under br->hash_lock, which means we do >>> not need additional locking. The call paths are (? denotes that >>> br->hash_lock is taken around the next call): >>> >>> - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ? >>> | +- br_fdb_change_mac_address ? >>> | +- br_fdb_delete_by_port ? >>> +- br_fdb_find_delete_local ? >>> +- fdb_add_local <-+- br_fdb_changeaddr ? >>> | +- br_fdb_change_mac_address ? >>> | +- br_fdb_add_local ? >>> +- br_fdb_cleanup ? >>> +- br_fdb_flush ? >>> +- br_fdb_delete_by_port ? >>> +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ? >>> +- br_fdb_external_learn_del ? >>> - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ? >>> | +- br_fdb_change_mac_address ? >>> | +- br_fdb_add_local ? >>> +- br_fdb_update ? >>> +- fdb_add_entry <--- __br_fdb_add ? >>> +- br_fdb_external_learn_add ? >>> >>> Signed-off-by: Johannes Nixdorf <jnixdorf-oss at avm.de> >>> --- >>> include/uapi/linux/if_link.h | 1 + >>> net/bridge/br_device.c | 2 ++ >>> net/bridge/br_fdb.c | 6 ++++++ >>> net/bridge/br_netlink.c | 9 ++++++++- >>> net/bridge/br_private.h | 2 ++ >>> 5 files changed, 19 insertions(+), 1 deletion(-) >>> >> >> I completely missed the fact that you don't deal with the situation where you already have fdbs created >> and a limit is set later, then it would be useless because it will start counting from 0 even though >> there are already entries. > > This should not be an issue. The accounting starts with the bridge > creation and is never suspended, so if the user sets a limit later we > do not restart counting at 0. > > The only corner case I can see there is if the user sets a new limit > lower than the current number of FDB entries. In that case the code > currently leaves the bridge in a state where the limit is violated, > but refuses new FDB entries until the total is back below the limit. The > alternative of cleaning out old FDB entries until their number is under > the limit again seems to be more error prone to me as well, so I'd rather > leave it that way. >Ah, good. That's ok then.>> Also another issue that came to mind is that you don't deal with fdb_create() >> for "special" entries, i.e. when adding a port. Currently it will print an error, but you should revisit >> all callers and see where it might be a problem. > > I'll have a look again, also to see whether only counting dynamic > entries created as a reaction to observed packets might be a viable > alternative. If the user creates the entries by adding a port or manually > via netlink I see no reason to restrict them to the same limit.Hmm.. perhaps we can add a flag mask of entries to count. Initially it can be only dynamic entries. We should include more people in this discussion (+CC Ido and Vladimir). Switchdev folks might have more specific requirements and restrictions, so it'd be nice to get their input as well.
Vladimir Oltean
2023-May-16 10:21 UTC
[Bridge] [PATCH net-next 1/2] bridge: Add a limit on FDB entries
Hi, On Tue, May 16, 2023 at 11:56:41AM +0300, Nikolay Aleksandrov wrote:> Hmm.. perhaps we can add a flag mask of entries to count. Initially it can be > only dynamic entries. We should include more people in this discussion (+CC Ido and Vladimir). > Switchdev folks might have more specific requirements and restrictions, so it'd be nice to get > their input as well.I have some other things to do until I can take a closer look at this discussion, but in principle, switchdev drivers will likely want to impose their own limit on FDB entries because the hardware itself is inherently limited in size, so I'm thinking there should be another way for the software bridge to be informed about this limit other than UAPI. Which ports that limit should affect (think bridging between ports of different switches with different FDB sizes) I don't know. If we only consider switchdev, FDB limits should probably be per hwdom. Also, in terms of static vs dynamic limits, I've seen hardware implementations where static FDB entries go to a different FDB table compared to dynamic ones (Microchip KSZ DSA switches), implementations where static partitioning between static and dynamic FDB entries is possible but configurable, and implementations where they all consume from the shared space and you'd have to evict a dynamic entry to install a static one. So it's hard to really say what's the size. That, plus not to mention, many hardware FDBs are not fully associative, and due to hash collisions, you may be unable to install an entry in the 4-way associative bin where its {MAC,VID} hash says it should go, even though the FDB at large is not full. It sounds sexy to take switchdev into consideration, but I'm not really sure what we want. Something flexible to cater for the above, probably. This discussion should probably be merged with: https://lore.kernel.org/netdev/20230324144917.32lnpgtw5auuyovy at skbuf/T/#ma600839815582ca61886e83ba533b1dfbe447557 so I'm CCing Oleksij too, since he probably knows better than me what he wants. In the thread with DSA trace events, there also was a short talk about user space theoretically being able to infer FDB sizes and utilization degree based on instrumenting with ftrace, which is something we wouldn't like to have to maintain. So I'm adding the DSA maintainers too, since there is interest for agreeing on a different API. https://lore.kernel.org/netdev/2f150ad4-34f4-4af9-b3ce-c1aff208ec7e at lunn.ch/T/#mfa895245fd012e8f66db784fa568109dba396aa7