Joseph Huang
2021-May-04 18:22 UTC
[Bridge] [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev
This series of patches contains the following fixes:

1. In a distributed system with multiple hardware-offloading bridges,
   if a multicast source is attached to a Non-Querier bridge, the bridge
   will not forward any multicast packets from that source to the Querier.

                    +--------------------+
                    |                    |
                    |      Snooping      |    +------------+
                    |      Bridge 1      |----| Listener 1 |
                    |     (Querier)      |    +------------+
                    |                    |
                    +--------------------+
                              |
                              |
                    +----+---------+-----+
                    |    | mrouter |     |
   +-----------+    |    +---------+     |    +------------+
   | MC Source |----|      Snooping      |----| Listener 2 |
   +-----------+    |      Bridge 2      |    +------------+
                    |    (Non-Querier)   |
                    +--------------------+

   In this scenario, Listener 1 will never receive multicast traffic
   from MC Source since Snooping Bridge 2 does not forward multicast
   packets to the mrouter port. Patches 0001, 0002, and 0003 address
   this issue.

2. If mcast_flood is disabled on a bridge port, some of the snooping
   functions stop working properly.

   a. Consider the following scenario:

                    +--------------------+
                    |                    |
                    |      Snooping      |    +------------+
                    |      Bridge 1      |----| Listener 1 |
                    |     (Querier)      |    +------------+
                    |                    |
                    +--------------------+
                              |
                              |
                    +--------------------+
                    |    | mrouter |     |
   +-----------+    |    +---------+     |
   | MC Source |----|      Snooping      |
   +-----------+    |      Bridge 2      |
                    |    (Non-Querier)   |
                    +--------------------+

      In this scenario, Listener 1 will never receive multicast traffic
      from MC Source if mcast_flood is disabled on the mrouter port on
      Snooping Bridge 2. Patch 0004 addresses this issue.

   b. For a Non-Querier bridge, if mcast_flood is disabled on a bridge
      port, Queries received from another Querier will not be forwarded
      out of that bridge port. Patch 0005 addresses this issue.

3. After a system boots up, the first couple of Reports are not handled
   properly:

   1) the Report from the Host is being flooded (via br_flood) to all
      bridge ports, and
   2) if the mrouter port's mcast_flood is disabled, the Reports received
      from other hosts will not be forwarded to the Querier.

   Patch 0006 addresses this issue.

These patches were developed and verified initially against the 5.4
kernel (due to a hardware platform limitation) and forward-patched to
5.12. Snooping code introduced between 5.4 and 5.12 is not extensively
tested (only IGMPv2/MLDv1 were tested). The hardware platform used
consisted of two bridges utilizing a single Marvell 88E6352 Ethernet
switch chip (i.e., no cross-chip bridging involved).

Joseph Huang (6):
  bridge: Refactor br_mdb_notify
  bridge: Offload mrouter port forwarding to switchdev
  bridge: Avoid traffic disruption when Querier state changes
  bridge: Force mcast_flooding for mrouter ports
  bridge: Flood Queries even when mcast_flood is disabled
  bridge: Always multicast_flood Reports

 net/bridge/br_device.c    |   5 +-
 net/bridge/br_forward.c   |   3 +-
 net/bridge/br_input.c     |   5 +-
 net/bridge/br_mdb.c       |  70 +++++++++++++---------
 net/bridge/br_multicast.c | 121 ++++++++++++++++++++++++++++++++++----
 net/bridge/br_private.h   |  11 +++-
 6 files changed, 169 insertions(+), 46 deletions(-)

base-commit: 5e321ded302da4d8c5d5dd953423d9b748ab3775
-- 
2.17.1
Joseph Huang
[Bridge] [PATCH 1/6] bridge: Refactor br_mdb_notify
Separate out switchdev notification to its own function in preparation
for the patch "bridge: Offload mrouter port forwarding to switchdev".

Signed-off-by: Joseph Huang <Joseph.Huang at garmin.com>
---
 net/bridge/br_mdb.c     | 57 ++++++++++++++++++++++++-----------------
 net/bridge/br_private.h |  2 ++
 2 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 95fa4af0e8dd..e8684d798ec3 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -669,10 +669,9 @@ static void br_mdb_switchdev_host(struct net_device *dev,
 		br_mdb_switchdev_host_port(dev, lower_dev, mp, type);
 }
 
-void br_mdb_notify(struct net_device *dev,
-		   struct net_bridge_mdb_entry *mp,
-		   struct net_bridge_port_group *pg,
-		   int type)
+void br_mdb_switchdev_port(struct net_bridge_mdb_entry *mp,
+			   struct net_bridge_port *p,
+			   int type)
 {
 	struct br_mdb_complete_info *complete_info;
 	struct switchdev_obj_port_mdb mdb = {
@@ -681,30 +680,42 @@ void br_mdb_notify(struct net_device *dev,
 			.flags = SWITCHDEV_F_DEFER,
 		},
 	};
+
+	if (!p)
+		return;
+
+	br_switchdev_mdb_populate(&mdb, mp);
+
+	mdb.obj.orig_dev = p->dev;
+	switch (type) {
+	case RTM_NEWMDB:
+		complete_info = kmalloc(sizeof(*complete_info), GFP_ATOMIC);
+		if (!complete_info)
+			break;
+		complete_info->port = p;
+		complete_info->ip = mp->addr;
+		mdb.obj.complete_priv = complete_info;
+		mdb.obj.complete = br_mdb_complete;
+		if (switchdev_port_obj_add(p->dev, &mdb.obj, NULL))
+			kfree(complete_info);
+		break;
+	case RTM_DELMDB:
+		switchdev_port_obj_del(p->dev, &mdb.obj);
+		break;
+	}
+}
+
+void br_mdb_notify(struct net_device *dev,
+		   struct net_bridge_mdb_entry *mp,
+		   struct net_bridge_port_group *pg,
+		   int type)
+{
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 
 	if (pg) {
-		br_switchdev_mdb_populate(&mdb, mp);
-
-		mdb.obj.orig_dev = pg->key.port->dev;
-		switch (type) {
-		case RTM_NEWMDB:
-			complete_info = kmalloc(sizeof(*complete_info), GFP_ATOMIC);
-			if (!complete_info)
-				break;
-			complete_info->port = pg->key.port;
-			complete_info->ip = mp->addr;
-			mdb.obj.complete_priv = complete_info;
-			mdb.obj.complete = br_mdb_complete;
-			if (switchdev_port_obj_add(pg->key.port->dev, &mdb.obj, NULL))
-				kfree(complete_info);
-			break;
-		case RTM_DELMDB:
-			switchdev_port_obj_del(pg->key.port->dev, &mdb.obj);
-			break;
-		}
+		br_mdb_switchdev_port(mp, pg->key.port, type);
 	} else {
 		br_mdb_switchdev_host(dev, mp, type);
 	}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 7ce8a77cc6b6..5cba9d228b9c 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -829,6 +829,8 @@ br_multicast_new_port_group(struct net_bridge_port *port, struct br_ip *group,
 			    u8 filter_mode, u8 rt_protocol);
 int br_mdb_hash_init(struct net_bridge *br);
 void br_mdb_hash_fini(struct net_bridge *br);
+void br_mdb_switchdev_port(struct net_bridge_mdb_entry *mp,
+			   struct net_bridge_port *p, int type);
 void br_mdb_notify(struct net_device *dev, struct net_bridge_mdb_entry *mp,
 		   struct net_bridge_port_group *pg, int type);
 void br_rtr_notify(struct net_device *dev, struct net_bridge_port *port,
-- 
2.17.1
Joseph Huang
2021-May-04 18:22 UTC
[Bridge] [PATCH 2/6] bridge: Offload mrouter port forwarding to switchdev
Offload the mrouter port forwarding to switchdev also.

Currently multicast snooping fails to forward traffic in some cases
where multiple hardware-offloading bridges are involved. Consider the
following scenario:

                 +--------------------+
                 |                    |
                 |      Snooping   +--|    +------------+
                 |      Bridge 1   |P1|----| Listener 1 |
                 |     (Querier)   +--|    +------------+
                 |                    |
                 +--------------------+
                           |
                           |
                 +--------------------+
                 |    | mrouter |     |
+-----------+    |    +---------+  +--|    +------------+
| MC Source |----|      Snooping   |P2|----| Listener 2 |
+-----------+    |      Bridge 2   +--|    +------------+
                 |    (Non-Querier)   |
                 +--------------------+

In this scenario, Listener 2 is able to receive multicast traffic from
MC Source while Listener 1 is not.

The reason is that, on Snooping Bridge 2, when the (soft) bridge
attempts to forward a packet to the mrouter port via br_multicast_flood,
the effort is blocked by nbp_switchdev_allowed_egress, since
offload_fwd_mark indicates that the packet should have been handled by
the hardware already. Listener 2 receives the packets without any
problem since P2 is programmed into the hardware as a member of the
group; the mrouter port does not, since it would normally not be a
member of any group and thus is not added to the address database on
the hardware switch chip.

This patch takes a simplistic approach: when an mrouter port is
added/deleted, it is added to/deleted from all mdb groups; similarly,
when an mdb group is added/deleted, all mrouter ports are added
to/deleted from it.

Before this patch, switchdev programming matches exactly with mdb:

+-----+
| mdb |
+-----+
   |
+----------------------------------------------+
|  |        +--------------------------------+ |
|  |        | both in mdb and switchdev      | |
|  |        | +------+   +------+   +------+ | |
|  +--------|-| port |---| port |---| port | | |
|           | +------+   +------+   +------+ | |
| switchdev +--------------------------------+ |
+----------------------------------------------+

After this patch, some entries will only exist in switchdev and not in
mdb:

+-----+
| mdb |
+-----+
   |
+---------------------------------------------------------------------+
|  |        +--------------------------------++---------------------+ |
|  |        | both in mdb and switchdev      || only in switchdev   | |
|  |        | +------+   +------+   +------+ || +------+   +------+ | |
|  +--------|-| port |---| port |---| port | || |  mr  |---|  mr  | | |
|           | +------+   +------+   +------+ || +------+   +------+ | |
| switchdev +--------------------------------++---------------------+ |
+---------------------------------------------------------------------+

Signed-off-by: Joseph Huang <Joseph.Huang at garmin.com>
---
 net/bridge/br_multicast.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 226bb05c3b42..5ed0d5efef09 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -522,10 +522,26 @@ static void br_multicast_destroy_mdb_entry(struct net_bridge_mcast_gc *gc)
 	kfree_rcu(mp, rcu);
 }
 
+/* Add/delete all mrouter ports to/from a group
+ * called while br->multicast_lock is held
+ */
+static void br_multicast_group_change(struct net_bridge_mdb_entry *mp,
+				      bool is_group_added)
+{
+	struct net_bridge_port *p;
+	struct hlist_node *n;
+
+	hlist_for_each_entry_safe(p, n, &mp->br->router_list, rlist)
+		br_mdb_switchdev_port(mp, p, is_group_added ?
+				      RTM_NEWMDB : RTM_DELMDB);
+}
+
 static void br_multicast_del_mdb_entry(struct net_bridge_mdb_entry *mp)
 {
 	struct net_bridge *br = mp->br;
 
+	br_multicast_group_change(mp, false);
+
 	rhashtable_remove_fast(&br->mdb_hash_tbl, &mp->rhnode,
 			       br_mdb_rht_params);
 	hlist_del_init_rcu(&mp->mdb_node);
@@ -1068,6 +1084,8 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
 		hlist_add_head_rcu(&mp->mdb_node, &br->mdb_list);
 	}
 
+	br_multicast_group_change(mp, true);
+
 	return mp;
 }
 
@@ -2651,8 +2669,18 @@ static void br_port_mc_router_state_change(struct net_bridge_port *p,
 		.flags = SWITCHDEV_F_DEFER,
 		.u.mrouter = is_mc_router,
 	};
+	struct net_bridge_mdb_entry *mp;
+	struct hlist_node *n;
 
 	switchdev_port_attr_set(p->dev, &attr, NULL);
+
+	/* Add/delete the router port to/from all multicast group
+	 * called whle br->multicast_lock is held
+	 */
+	hlist_for_each_entry_safe(mp, n, &p->br->mdb_list, mdb_node) {
+		br_mdb_switchdev_port(mp, p, is_mc_router ?
+				      RTM_NEWMDB : RTM_DELMDB);
+	}
 }
 
 /*
-- 
2.17.1
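
For readers following the thread, the end state that patch 2/6 aims for can be sketched as a toy model: the set of ports programmed into the switch chip for a group is the union of the group's mdb members and the bridge's mrouter ports. This is only an illustration under that assumption, not kernel code; every toy_* name and the bitmask representation are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_GROUPS 8	/* hypothetical fixed-size group table */

/* Hypothetical model: one port bitmask per multicast group. */
struct toy_bridge {
	bool group_used[TOY_MAX_GROUPS];
	uint32_t mdb_ports[TOY_MAX_GROUPS];	/* software mdb membership */
	uint32_t hw_ports[TOY_MAX_GROUPS];	/* ports programmed in hardware */
	uint32_t mrouter_ports;			/* current mrouter ports */
};

/* Hardware entry = mdb members plus every mrouter port. */
static void toy_sync_hw(struct toy_bridge *br, int g)
{
	br->hw_ports[g] = br->mdb_ports[g] | br->mrouter_ports;
}

/* Group added: all mrouter ports are added to it in hardware. */
static void toy_group_add(struct toy_bridge *br, int g)
{
	br->group_used[g] = true;
	br->mdb_ports[g] = 0;
	toy_sync_hw(br, g);
}

/* A listener joins: recorded in both mdb and hardware. */
static void toy_port_join(struct toy_bridge *br, int g, int port)
{
	br->mdb_ports[g] |= UINT32_C(1) << port;
	toy_sync_hw(br, g);
}

/* mrouter port added/deleted: update the hardware entry of every group. */
static void toy_mrouter_change(struct toy_bridge *br, int port, bool is_mrouter)
{
	uint32_t bit = UINT32_C(1) << port;
	int g;

	if (is_mrouter)
		br->mrouter_ports |= bit;
	else
		br->mrouter_ports &= ~bit;

	for (g = 0; g < TOY_MAX_GROUPS; g++)
		if (br->group_used[g])
			toy_sync_hw(br, g);
}
```

Note that the software mdb is never touched by toy_mrouter_change, which mirrors the "only in switchdev" entries in the second diagram of the commit message.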
Joseph Huang
2021-May-04 18:22 UTC
[Bridge] [PATCH 3/6] bridge: Avoid traffic disruption when Querier state changes
Modify br_mdb_notify so that switchdev mdb purge events are never sent
for mrouter ports, and so that when a non-mrouter port turns into an
mrouter port, all port groups associated with that port are deleted
immediately.

Consider the following scenario:

                 +--------------------+
                 |                    |
+-----------+    |      Snooping   +--|    +------------+
| MC Source |----|      Bridge 1   |P1|----| Listener 1 |
+-----------+    |        +--+     +--|    +------------+
                 |        |P2|        |
                 +--------------------+
                           |
                           |
                 +--------------------+
                 |        |P3|        |
                 |        +--+     +--|    +---------------+
                 |      Snooping   |P4|----| Listener 2    |
                 |      Bridge 2   +--|    | Joins Group A |
                 |                    |    +---------------+
                 +--------------------+

Assume that initially Snooping Bridge 1 is the Querier and Snooping
Bridge 2 is a Non-Querier. After some Query/Report exchange, Snooping
Bridge 1 creates an mdb group A, adds P2 to the group, and starts a
timer on the port group A/P2.

Let's say Snooping Bridge 2 becomes the Querier for some reason (e.g.,
Snooping Bridge 2 rebooted) before the port group A/P2 expires. With the
patch 'bridge: Offload mrouter port forwarding to switchdev', Snooping
Bridge 1 detects that P2 has now become an mrouter port, and will add it
to the address database on the hardware switch chip (even though it has
been there since the port group A/P2 was added). This is all fine until
the timer on port group A/P2 expires, at which point Snooping Bridge 1
purges P2 from the address database on the switch chip. Listener 2 can
then no longer receive multicast traffic from MC Source.

With this patch, immediately after a bridge port turns into an mrouter
port, the port's membership information is removed from the bridge's
mdb, but remains programmed in the address database on the hardware
chip, just to be consistent with the database/programming state as
before the Querier role change. The hardware programming will be cleaned
up when the group expires (via br_multicast_group_change).

Signed-off-by: Joseph Huang <Joseph.Huang at garmin.com>
---
 net/bridge/br_mdb.c       | 17 +++++----
 net/bridge/br_multicast.c | 78 ++++++++++++++++++++++++++++++++-------
 net/bridge/br_private.h   |  3 +-
 3 files changed, 75 insertions(+), 23 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index e8684d798ec3..c121b780450b 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -708,16 +708,17 @@ void br_mdb_switchdev_port(struct net_bridge_mdb_entry *mp,
 void br_mdb_notify(struct net_device *dev,
 		   struct net_bridge_mdb_entry *mp,
 		   struct net_bridge_port_group *pg,
-		   int type)
+		   int type, bool swdev_notify)
 {
 	struct net *net = dev_net(dev);
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 
-	if (pg) {
-		br_mdb_switchdev_port(mp, pg->key.port, type);
-	} else {
-		br_mdb_switchdev_host(dev, mp, type);
+	if (swdev_notify) {
+		if (pg)
+			br_mdb_switchdev_port(mp, pg->key.port, type);
+		else
+			br_mdb_switchdev_host(dev, mp, type);
 	}
 
 	skb = nlmsg_new(rtnl_mdb_nlmsg_size(pg), GFP_ATOMIC);
@@ -1011,7 +1012,7 @@ static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port,
 		}
 
 		br_multicast_host_join(mp, false);
-		br_mdb_notify(br->dev, mp, NULL, RTM_NEWMDB);
+		br_mdb_notify(br->dev, mp, NULL, RTM_NEWMDB, true);
 
 		return 0;
 	}
@@ -1042,7 +1043,7 @@ static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port,
 	rcu_assign_pointer(*pp, p);
 	if (entry->state == MDB_TEMPORARY)
 		mod_timer(&p->timer, now + br->multicast_membership_interval);
-	br_mdb_notify(br->dev, mp, p, RTM_NEWMDB);
+	br_mdb_notify(br->dev, mp, p, RTM_NEWMDB, true);
 	/* if we are adding a new EXCLUDE port group (*,G) it needs to be also
 	 * added to all S,G entries for proper replication, if we are adding
 	 * a new INCLUDE port (S,G) then all of *,G EXCLUDE ports need to be
@@ -1176,7 +1177,7 @@ static int __br_mdb_del(struct net_bridge *br, struct br_mdb_entry *entry,
 	if (entry->ifindex == mp->br->dev->ifindex && mp->host_joined) {
 		br_multicast_host_leave(mp, false);
 		err = 0;
-		br_mdb_notify(br->dev, mp, NULL, RTM_DELMDB);
+		br_mdb_notify(br->dev, mp, NULL, RTM_DELMDB, true);
 		if (!mp->ports && netif_running(br->dev))
 			mod_timer(&mp->timer, jiffies);
 		goto unlock;
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 5ed0d5efef09..d7fbe1f3af18 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -506,7 +506,7 @@ static void br_multicast_fwd_src_handle(struct net_bridge_group_src *src)
 		sg_mp = br_mdb_ip_get(src->br, &sg_key.addr);
 		if (!sg_mp)
 			return;
-		br_mdb_notify(src->br->dev, sg_mp, sg, RTM_NEWMDB);
+		br_mdb_notify(src->br->dev, sg_mp, sg, RTM_NEWMDB, true);
 	}
 }
 
@@ -617,7 +617,12 @@ void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
 	br_multicast_eht_clean_sets(pg);
 	hlist_for_each_entry_safe(ent, tmp, &pg->src_list, node)
 		br_multicast_del_group_src(ent, false);
-	br_mdb_notify(br->dev, mp, pg, RTM_DELMDB);
+	/* don't notify switchdev if mrouter port
+	 * switchdev will be notified when group expires via
+	 * br_multicast_group_change
+	 */
+	br_mdb_notify(br->dev, mp, pg, RTM_DELMDB,
+		      hlist_unhashed(&pg->key.port->rlist));
 	if (!br_multicast_is_star_g(&mp->addr)) {
 		rhashtable_remove_fast(&br->sg_port_tbl, &pg->rhnode,
 				       br_sg_port_rht_params);
@@ -688,7 +693,7 @@ static void br_multicast_port_group_expired(struct timer_list *t)
 
 		if (WARN_ON(!mp))
 			goto out;
-		br_mdb_notify(br->dev, mp, pg, RTM_NEWMDB);
+		br_mdb_notify(br->dev, mp, pg, RTM_NEWMDB, true);
 	}
 out:
 	spin_unlock(&br->multicast_lock);
@@ -1228,7 +1233,7 @@ void br_multicast_host_join(struct net_bridge_mdb_entry *mp, bool notify)
 		if (br_multicast_is_star_g(&mp->addr))
 			br_multicast_star_g_host_state(mp);
 		if (notify)
-			br_mdb_notify(mp->br->dev, mp, NULL, RTM_NEWMDB);
+			br_mdb_notify(mp->br->dev, mp, NULL, RTM_NEWMDB, true);
 	}
 
 	if (br_group_is_l2(&mp->addr))
@@ -1246,7 +1251,7 @@ void br_multicast_host_leave(struct net_bridge_mdb_entry *mp, bool notify)
 	if (br_multicast_is_star_g(&mp->addr))
 		br_multicast_star_g_host_state(mp);
 	if (notify)
-		br_mdb_notify(mp->br->dev, mp, NULL, RTM_DELMDB);
+		br_mdb_notify(mp->br->dev, mp, NULL, RTM_DELMDB, true);
 }
 
 static struct net_bridge_port_group *
@@ -1294,7 +1299,7 @@ __br_multicast_add_group(struct net_bridge *br,
 	rcu_assign_pointer(*pp, p);
 	if (blocked)
 		p->flags |= MDB_PG_FLAGS_BLOCKED;
-	br_mdb_notify(br->dev, mp, p, RTM_NEWMDB);
+	br_mdb_notify(br->dev, mp, p, RTM_NEWMDB, true);
 
 found:
 	if (igmpv2_mldv1)
@@ -2436,7 +2441,7 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
 			break;
 		}
 		if (changed)
-			br_mdb_notify(br->dev, mdst, pg, RTM_NEWMDB);
+			br_mdb_notify(br->dev, mdst, pg, RTM_NEWMDB, true);
 unlock_continue:
 		spin_unlock_bh(&br->multicast_lock);
 	}
@@ -2575,7 +2580,7 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
 			break;
 		}
 		if (changed)
-			br_mdb_notify(br->dev, mdst, pg, RTM_NEWMDB);
+			br_mdb_notify(br->dev, mdst, pg, RTM_NEWMDB, true);
 unlock_continue:
 		spin_unlock_bh(&br->multicast_lock);
 	}
@@ -2660,26 +2665,71 @@ br_multicast_update_query_timer(struct net_bridge *br,
 	mod_timer(&query->timer, jiffies + br->multicast_querier_interval);
 }
 
-static void br_port_mc_router_state_change(struct net_bridge_port *p,
+static void br_port_mc_router_state_change(struct net_bridge_port *port,
 					   bool is_mc_router)
 {
 	struct switchdev_attr attr = {
-		.orig_dev = p->dev,
+		.orig_dev = port->dev,
 		.id = SWITCHDEV_ATTR_ID_PORT_MROUTER,
 		.flags = SWITCHDEV_F_DEFER,
 		.u.mrouter = is_mc_router,
 	};
 	struct net_bridge_mdb_entry *mp;
+	struct net_bridge *br = port->br;
 	struct hlist_node *n;
 
-	switchdev_port_attr_set(p->dev, &attr, NULL);
+	switchdev_port_attr_set(port->dev, &attr, NULL);
 
 	/* Add/delete the router port to/from all multicast group
 	 * called whle br->multicast_lock is held
 	 */
-	hlist_for_each_entry_safe(mp, n, &p->br->mdb_list, mdb_node) {
-		br_mdb_switchdev_port(mp, p, is_mc_router ?
-				      RTM_NEWMDB : RTM_DELMDB);
+	hlist_for_each_entry_safe(mp, n, &br->mdb_list, mdb_node) {
+		struct net_bridge_port_group __rcu **pp;
+		struct net_bridge_port_group *p;
+		int port_group_exists = 0;
+
+		if (is_mc_router) {
+			for (pp = &mp->ports;
+			     (p = mlock_dereference(*pp, br)) != NULL;
+			     pp = &p->next) {
+				if (p->key.port == port) {
+					port_group_exists = 1;
+					if (!(p->flags & MDB_PG_FLAGS_PERMANENT))
+						br_multicast_del_pg(mp, p, pp);
+				}
+
+				if ((unsigned long)p->key.port < (unsigned long)port)
+					break;
+			}
+
+			if (port_group_exists)
+				continue;
+
+			br_mdb_switchdev_port(mp, port, RTM_NEWMDB);
+		} else {
+			for (pp = &mp->ports;
+			     (p = mlock_dereference(*pp, br)) != NULL;
+			     pp = &p->next) {
+				if (p->key.port == port) {
+					port_group_exists = 1;
+					break;
+				}
+
+				if ((unsigned long)p->key.port < (unsigned long)port)
+					break;
+			}
+
+			if (port_group_exists)
+				continue;
+
+			p = br_multicast_new_port_group(port, &mp->addr, *pp, 0,
+							NULL, MCAST_EXCLUDE, RTPROT_KERNEL);
+			if (unlikely(!p))
+				continue;
+			rcu_assign_pointer(*pp, p);
+			br_mdb_notify(br->dev, mp, p, RTM_NEWMDB, false);
+			mod_timer(&p->timer, jiffies + br->multicast_membership_interval);
+		}
 	}
 }
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 5cba9d228b9c..9aa51508ba83 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -832,7 +832,8 @@ void br_mdb_hash_fini(struct net_bridge *br);
 void br_mdb_switchdev_port(struct net_bridge_mdb_entry *mp,
 			   struct net_bridge_port *p, int type);
 void br_mdb_notify(struct net_device *dev, struct net_bridge_mdb_entry *mp,
-		   struct net_bridge_port_group *pg, int type);
+		   struct net_bridge_port_group *pg, int type,
+		   bool swdev_notify);
 void br_rtr_notify(struct net_device *dev, struct net_bridge_port *port,
 		   int type);
 void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
-- 
2.17.1
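
The effect of the new swdev_notify argument in patch 3/6 can be illustrated with a small stand-alone sketch. It assumes, as the diff suggests, that the netlink notification is always attempted while the hardware update is skipped when a port group is deleted on an mrouter port. The toy_* names and the counters are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

enum toy_event { TOY_NEWMDB, TOY_DELMDB };

/* Hypothetical counters standing in for the two notification channels. */
struct toy_counters {
	int netlink_msgs;	/* notifications sent towards userspace */
	int hw_updates;		/* updates sent towards the switch chip */
};

/* Mirrors the shape of br_mdb_notify() after the patch:
 * the hardware is only told when swdev_notify is set.
 */
static void toy_mdb_notify(struct toy_counters *c, enum toy_event ev,
			   bool swdev_notify)
{
	(void)ev;	/* the event type does not matter for this sketch */

	if (swdev_notify)
		c->hw_updates++;
	c->netlink_msgs++;
}

/* Mirrors the call in br_multicast_del_pg(): a port that is on the
 * router list keeps its hardware entry until the group itself expires.
 */
static void toy_del_port_group(struct toy_counters *c, bool port_is_mrouter)
{
	toy_mdb_notify(c, TOY_DELMDB, !port_is_mrouter);
}
```

In the scenario from the commit message, P2 on Snooping Bridge 1 corresponds to port_is_mrouter being true when the A/P2 timer fires, so the hardware entry survives.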
Joseph Huang
2021-May-04 18:22 UTC
[Bridge] [PATCH 4/6] bridge: Force mcast_flooding for mrouter ports
When a port turns into an mrouter port, enable multicast flooding on
that port even if mcast_flood is disabled by user config. This is
necessary so that in a distributed system, the multicast packets can be
forwarded to the Querier when the multicast source is attached to a
Non-Querier bridge.

Consider the following scenario:

                 +--------------------+
                 |                    |
                 |      Snooping      |    +------------+
                 |      Bridge 1      |----| Listener 1 |
                 |     (Querier)      |    +------------+
                 |                    |
                 +--------------------+
                           |
                           |
                 +--------------------+
                 |    | mrouter |     |
+-----------+    |    +---------+     |
| MC Source |----|      Snooping      |
+-----------+    |      Bridge 2      |
                 |    (Non-Querier)   |
                 +--------------------+

In this scenario, Listener 1 will never receive multicast traffic from
MC Source if mcast_flood is disabled on the mrouter port on Snooping
Bridge 2.

Signed-off-by: Joseph Huang <Joseph.Huang at garmin.com>
---
 net/bridge/br_multicast.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index d7fbe1f3af18..719ded3204a0 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -2680,6 +2680,21 @@ static void br_port_mc_router_state_change(struct net_bridge_port *port,
 
 	switchdev_port_attr_set(port->dev, &attr, NULL);
 
+	/* Force mcast_flood if mrouter port
+	 * this does not prevent netlink from changing it again
+	 */
+	if (is_mc_router && !(port->flags & BR_MCAST_FLOOD)) {
+		attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
+		attr.u.brport_flags.val = BR_MCAST_FLOOD;
+		attr.u.brport_flags.mask = BR_MCAST_FLOOD;
+		switchdev_port_attr_set(port->dev, &attr, NULL);
+	} else if (!is_mc_router) {
+		attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
+		attr.u.brport_flags.val = port->flags & BR_MCAST_FLOOD;
+		attr.u.brport_flags.mask = BR_MCAST_FLOOD;
+		switchdev_port_attr_set(port->dev, &attr, NULL);
+	}
+
 	/* Add/delete the router port to/from all multicast group
 	 * called whle br->multicast_lock is held
 	 */
-- 
2.17.1
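
The val/mask selection in patch 4/6 can be read as a small pure function. The sketch below restates it outside the kernel, assuming a single-bit flag; TOY_MCAST_FLOOD is a hypothetical stand-in for BR_MCAST_FLOOD, and struct toy_flag_update is invented here to carry the result.

```c
#include <stdbool.h>

#define TOY_MCAST_FLOOD 0x1UL	/* hypothetical stand-in for BR_MCAST_FLOOD */

/* Hypothetical result type: whether to push an update to the hardware,
 * and with which value/mask pair.
 */
struct toy_flag_update {
	bool send;
	unsigned long val;
	unsigned long mask;
};

static struct toy_flag_update
toy_mrouter_flood_update(unsigned long port_flags, bool is_mc_router)
{
	struct toy_flag_update u = { false, 0, TOY_MCAST_FLOOD };

	if (is_mc_router && !(port_flags & TOY_MCAST_FLOOD)) {
		/* Port became an mrouter with flooding off: force it on. */
		u.send = true;
		u.val = TOY_MCAST_FLOOD;
	} else if (!is_mc_router) {
		/* Port stopped being an mrouter: restore the user setting. */
		u.send = true;
		u.val = port_flags & TOY_MCAST_FLOOD;
	}
	/* mrouter with flooding already on: nothing to send. */
	return u;
}
```

Only the hardware view is changed in this sketch; port_flags, the user-configured value, is read but never modified, which matches the diff leaving port->flags untouched.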
Joseph Huang
2021-May-04 18:22 UTC
[Bridge] [PATCH 5/6] bridge: Flood Queries even when mcast_flood is disabled
Modify the forwarding path so that received Queries are always flooded
even when mcast_flood is disabled on a bridge port.

In the current implementation, when mcast_flood is disabled on a bridge
port, Queries received from another Querier will not be forwarded out of
that bridge port. This unfortunately breaks multicast snooping.

Signed-off-by: Joseph Huang <Joseph.Huang at garmin.com>
---
 net/bridge/br_forward.c   | 3 ++-
 net/bridge/br_multicast.c | 3 +++
 net/bridge/br_private.h   | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 6e9b049ae521..2fb9b4a78881 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -203,7 +203,8 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
 				continue;
 			break;
 		case BR_PKT_MULTICAST:
-			if (!(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev)
+			if (!(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev &&
+			    !BR_INPUT_SKB_CB_FORCE_FLOOD(skb))
 				continue;
 			break;
 		case BR_PKT_BROADCAST:
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 719ded3204a0..b7d9c491abe0 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3238,6 +3238,7 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
 		err = br_ip4_multicast_igmp3_report(br, port, skb, vid);
 		break;
 	case IGMP_HOST_MEMBERSHIP_QUERY:
+		BR_INPUT_SKB_CB(skb)->force_flood = 1;
 		br_ip4_multicast_query(br, port, skb, vid);
 		break;
 	case IGMP_HOST_LEAVE_MESSAGE:
@@ -3300,6 +3301,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
 		err = br_ip6_multicast_mld2_report(br, port, skb, vid);
 		break;
 	case ICMPV6_MGM_QUERY:
+		BR_INPUT_SKB_CB(skb)->force_flood = 1;
 		err = br_ip6_multicast_query(br, port, skb, vid);
 		break;
 	case ICMPV6_MGM_REDUCTION:
@@ -3322,6 +3324,7 @@ int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
 
 	BR_INPUT_SKB_CB(skb)->igmp = 0;
 	BR_INPUT_SKB_CB(skb)->mrouters_only = 0;
+	BR_INPUT_SKB_CB(skb)->force_flood = 0;
 
 	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
 		return 0;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 9aa51508ba83..59af599d48eb 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -491,6 +491,7 @@ struct br_input_skb_cb {
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	u8 igmp;
 	u8 mrouters_only:1;
+	u8 force_flood:1;
 #endif
 	u8 proxyarp_replied:1;
 	u8 src_port_isolated:1;
@@ -510,8 +511,10 @@ struct br_input_skb_cb {
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(BR_INPUT_SKB_CB(__skb)->mrouters_only)
+# define BR_INPUT_SKB_CB_FORCE_FLOOD(__skb)	(BR_INPUT_SKB_CB(__skb)->force_flood)
 #else
 # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(0)
+# define BR_INPUT_SKB_CB_FORCE_FLOOD(__skb)	(0)
 #endif
 
 #define br_printk(level, br, format, args...)	\
-- 
2.17.1
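
The changed per-port check in br_flood() from patch 5/6 boils down to a three-input predicate. As a minimal sketch, with hypothetical names (TOY_MCAST_FLOOD for BR_MCAST_FLOOD, from_bridge_dev for the skb->dev == br->dev case):

```c
#include <stdbool.h>

#define TOY_MCAST_FLOOD 0x1UL	/* hypothetical stand-in for BR_MCAST_FLOOD */

/* Returns true when a multicast packet must NOT be flooded out of a port.
 * Before the patch the force_flood term did not exist.
 */
static bool toy_skip_port(unsigned long port_flags, bool from_bridge_dev,
			  bool force_flood)
{
	return !(port_flags & TOY_MCAST_FLOOD) &&
	       !from_bridge_dev &&
	       !force_flood;
}
```

With force_flood set, as the patch does for IGMP and MLD Queries, the port is never skipped on account of mcast_flood being off.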
Joseph Huang
2021-May-04 18:22 UTC
[Bridge] [PATCH 6/6] bridge: Always multicast_flood Reports
Modify the forwarding path so that IGMPv1/2/MLDv1 Reports are always
flooded by br_multicast_flood, regardless of the check done by
br_multicast_querier_exists.

This patch fixes the problems where after a system boots up, the first
couple of Reports are not handled properly in that:

1) the Report from the Host is being flooded (via br_flood) to all
   bridge ports, and
2) if the mrouter port's mcast_flood is disabled, the Reports received
   from other hosts will not be forwarded to the Querier.

Signed-off-by: Joseph Huang <Joseph.Huang at garmin.com>
---
 net/bridge/br_device.c    | 5 +++--
 net/bridge/br_input.c     | 5 +++--
 net/bridge/br_multicast.c | 3 +++
 net/bridge/br_private.h   | 3 +++
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index e8b626cc6bfd..ff75ba242f38 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -88,8 +88,9 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 
 		mdst = br_mdb_get(br, skb, vid);
-		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(br, eth_hdr(skb), mdst))
+		if (((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
+		     br_multicast_querier_exists(br, eth_hdr(skb), mdst)) ||
+		    BR_INPUT_SKB_CB_FORCE_MC_FLOOD(skb))
 			br_multicast_flood(mdst, skb, false, true);
 		else
 			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 8875e953ac53..572d7f20477f 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -129,8 +129,9 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 	switch (pkt_type) {
 	case BR_PKT_MULTICAST:
 		mdst = br_mdb_get(br, skb, vid);
-		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(br, eth_hdr(skb), mdst)) {
+		if (((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
+		     br_multicast_querier_exists(br, eth_hdr(skb), mdst)) ||
+		    BR_INPUT_SKB_CB_FORCE_MC_FLOOD(skb)) {
 			if ((mdst && mdst->host_joined) ||
 			    br_multicast_is_router(br)) {
 				local_rcv = true;
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index b7d9c491abe0..dfdbe19f3e93 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3231,6 +3231,7 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
 	case IGMP_HOST_MEMBERSHIP_REPORT:
 	case IGMPV2_HOST_MEMBERSHIP_REPORT:
 		BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
+		BR_INPUT_SKB_CB(skb)->force_mc_flood = 1;
 		err = br_ip4_multicast_add_group(br, port, ih->group, vid, src,
 						 true);
 		break;
@@ -3294,6 +3295,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
 	case ICMPV6_MGM_REPORT:
 		src = eth_hdr(skb)->h_source;
 		BR_INPUT_SKB_CB(skb)->mrouters_only = 1;
+		BR_INPUT_SKB_CB(skb)->force_mc_flood = 1;
 		err = br_ip6_multicast_add_group(br, port, &mld->mld_mca, vid,
 						 src, true);
 		break;
@@ -3325,6 +3327,7 @@ int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
 	BR_INPUT_SKB_CB(skb)->igmp = 0;
 	BR_INPUT_SKB_CB(skb)->mrouters_only = 0;
 	BR_INPUT_SKB_CB(skb)->force_flood = 0;
+	BR_INPUT_SKB_CB(skb)->force_mc_flood = 0;
 
 	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
 		return 0;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 59af599d48eb..6d4f20d7f482 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -492,6 +492,7 @@ struct br_input_skb_cb {
 	u8 igmp;
 	u8 mrouters_only:1;
 	u8 force_flood:1;
+	u8 force_mc_flood:1;
 #endif
 	u8 proxyarp_replied:1;
 	u8 src_port_isolated:1;
@@ -512,9 +513,11 @@ struct br_input_skb_cb {
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(BR_INPUT_SKB_CB(__skb)->mrouters_only)
 # define BR_INPUT_SKB_CB_FORCE_FLOOD(__skb)	(BR_INPUT_SKB_CB(__skb)->force_flood)
+# define BR_INPUT_SKB_CB_FORCE_MC_FLOOD(__skb)	(BR_INPUT_SKB_CB(__skb)->force_mc_flood)
 #else
 # define BR_INPUT_SKB_CB_MROUTERS_ONLY(__skb)	(0)
 # define BR_INPUT_SKB_CB_FORCE_FLOOD(__skb)	(0)
+# define BR_INPUT_SKB_CB_FORCE_MC_FLOOD(__skb)	(0)
 #endif
 
 #define br_printk(level, br, format, args...)	\
-- 
2.17.1
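
The forwarding decision touched by patch 6/6 picks between br_multicast_flood (snooping-aware) and br_flood (all ports). A minimal sketch of the new condition, with the kernel lookups reduced to hypothetical booleans (has_mdst for a non-NULL mdb entry, querier_exists for the result of br_multicast_querier_exists):

```c
#include <stdbool.h>

/* Returns true when the packet takes the br_multicast_flood() path,
 * false when it falls back to br_flood(). The force_mc_flood term is
 * what the patch adds.
 */
static bool toy_use_multicast_flood(bool has_mdst, bool mrouters_only,
				    bool querier_exists, bool force_mc_flood)
{
	return ((has_mdst || mrouters_only) && querier_exists) ||
	       force_mc_flood;
}
```

The boot-time case from the commit message corresponds to mrouters_only being set on a Report while no querier is known yet: without force_mc_flood the function returns false and the Report is flooded to all ports.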
Nikolay Aleksandrov
2021-May-04 20:05 UTC
[Bridge] [PATCH net 0/6] bridge: Fix snooping in multi-bridge config with switchdev
On 04/05/2021 21:22, Joseph Huang wrote:
> This series of patches contains the following fixes:
>
> [...]

Hi,
This patch-set is inappropriate for -net, if at all. It's quite late
over here and I'll review the rest later, but I can say from a quick
peek that patch 02 is unacceptable for it increases the complexity with
1 order of magnitude of all add/del call paths and some of them can be
invoked on user packets. A lot of this functionality should be "hidden"
in the driver or done by a user-space daemon/helper.
Most of the flooding behaviour changes must be hidden behind some new
option otherwise they'll break user setups that rely on the current.
I'll review the patches in detail over the following few days, net-next
is closed anyway.

Cheers,
 Nik