Andrew Lunn
2017-Aug-26 20:56 UTC
[Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic
This is a WIP patchset I would like comments on from bridge, switchdev and hardware offload people.

The Linux bridge supports IGMP snooping. It listens to IGMP reports on bridge ports and keeps track of which groups have been joined on an interface. It then forwards multicast based on this group membership.

When the bridge adds or removes groups from an interface, it uses switchdev to request that the hardware add an MDB entry to a port, so the hardware can perform the selective forwarding between ports.

What is not covered by the current bridge code is IGMP joins/leaves from the host on the brX interface. No such monitoring is performed. With a pure software bridge, it is not required: all multicast frames are passed to the brX interface, and the network stack filters them, as it does for any interface. However, when hardware offload is involved, things change. We should program the hardware to only send multicast packets to the host when the host has an interest in them.

Thus we need to perform IGMP snooping on the brX interface, just like any other interface of the bridge. However, the brX interface currently lacks all the data structures needed to do this. There is no net_bridge_port structure for the brX interface. This structure is created when an interface is added to the bridge, but the brX interface is not a member of the bridge. So this patchset makes the brX interface a first class member of the bridge. When the brX interface is opened, the interface is added to the bridge. A net_bridge_port is allocated for it, and IGMP snooping is performed as usual.

There are some complexities here. Some assumptions are broken, such as that the master interface of a port interface is the bridge interface: the brX interface cannot be its own master. The use of netdev_master_upper_dev_get() within the bridge code has been changed to reflect this. The bridge receive handler also must not process frames for the brX interface, etc.
The interface downward to the hardware is also an issue. The code presented here is a hack and needs to change. But that is secondary and can be solved once it is agreed how the bridge needs to change to support this use case.

Comments welcome and wanted.

Andrew

Andrew Lunn (5):
  net: rtnetlink: Handle bridge port without upper device
  net: bridge: Skip receive handler on brX interface
  net: bridge: Make the brX interface a member of the bridge
  net: dsa: HACK: Handle MDB add/remove for none-switch ports
  net: dsa: Don't include CPU port when adding MDB to a port

 include/linux/if_bridge.h |  1 +
 net/bridge/br_device.c    | 12 ++++++++++--
 net/bridge/br_if.c        | 37 ++++++++++++++++++++++++-------------
 net/bridge/br_input.c     |  4 ++++
 net/bridge/br_mdb.c       |  2 --
 net/bridge/br_multicast.c |  7 ++++---
 net/bridge/br_private.h   |  1 +
 net/core/rtnetlink.c      | 23 +++++++++++++++++++++--
 net/dsa/port.c            | 19 +++++++++++++++++--
 net/dsa/switch.c          |  2 +-
 10 files changed, 83 insertions(+), 25 deletions(-)

--
2.14.1
Andrew Lunn
2017-Aug-26 20:56 UTC
[Bridge] [PATCH RFC WIP 1/5] net: rtnetlink: Handle bridge port without upper device
A following patch makes the brX interface a member of the bridge. However, it cannot be a slave interface, since it would have to be a slave of itself, so netdev_master_upper_dev_get() returns NULL for it. Handle this NULL by recognising that such a bridge port must also be the master, i.e. the device we are looking for.

Signed-off-by: Andrew Lunn <andrew at lunn.ch>
---
 net/core/rtnetlink.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9201e3621351..2673eb430b6f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3093,8 +3093,12 @@ static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if ((!ndm->ndm_flags || ndm->ndm_flags & NTF_MASTER) &&
 	    (dev->priv_flags & IFF_BRIDGE_PORT)) {
 		struct net_device *br_dev = netdev_master_upper_dev_get(dev);
-		const struct net_device_ops *ops = br_dev->netdev_ops;
+		const struct net_device_ops *ops;
 
+		if (!br_dev)
+			br_dev = dev;
+
+		ops = br_dev->netdev_ops;
 		err = ops->ndo_fdb_add(ndm, tb, dev, addr, vid,
 				       nlh->nlmsg_flags);
 		if (err)
@@ -3197,7 +3201,12 @@ static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if ((!ndm->ndm_flags || ndm->ndm_flags & NTF_MASTER) &&
 	    (dev->priv_flags & IFF_BRIDGE_PORT)) {
 		struct net_device *br_dev = netdev_master_upper_dev_get(dev);
-		const struct net_device_ops *ops = br_dev->netdev_ops;
+		const struct net_device_ops *ops;
+
+		if (!br_dev)
+			br_dev = dev;
+
+		ops = br_dev->netdev_ops;
 
 		if (ops->ndo_fdb_del)
 			err = ops->ndo_fdb_del(ndm, tb, dev, addr, vid);
@@ -3332,6 +3341,8 @@ static int rtnl_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb)
 		if (!br_idx) { /* user did not specify a specific bridge */
 			if (dev->priv_flags & IFF_BRIDGE_PORT) {
 				br_dev = netdev_master_upper_dev_get(dev);
+				if (!br_dev)
+					br_dev = dev;
 				cops = br_dev->netdev_ops;
 			}
 		} else {
@@ -3410,6 +3421,9 @@ int ndo_dflt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 	struct net_device *br_dev = netdev_master_upper_dev_get(dev);
 	int err = 0;
 
+	if (!br_dev)
+		br_dev = dev;
+
 	nlh = nlmsg_put(skb, pid, seq, RTM_NEWLINK, sizeof(*ifm), nlflags);
 	if (nlh == NULL)
 		return -EMSGSIZE;
@@ -3647,6 +3661,8 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	if (!flags || (flags & BRIDGE_FLAGS_MASTER)) {
 		struct net_device *br_dev = netdev_master_upper_dev_get(dev);
+		if (!br_dev)
+			br_dev = dev;
 
 		if (!br_dev || !br_dev->netdev_ops->ndo_bridge_setlink) {
 			err = -EOPNOTSUPP;
@@ -3723,6 +3739,9 @@ static int rtnl_bridge_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (!flags || (flags & BRIDGE_FLAGS_MASTER)) {
 		struct net_device *br_dev = netdev_master_upper_dev_get(dev);
 
+		if (!br_dev)
+			br_dev = dev;
+
 		if (!br_dev || !br_dev->netdev_ops->ndo_bridge_dellink) {
 			err = -EOPNOTSUPP;
 			goto out;
--
2.14.1
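Patch 1/5 repeats the same three-line fallback in six rtnetlink paths. Here is a hedged sketch of that fallback factored into one helper. It uses mock types rather than the real struct net_device, and both the helper name `resolve_bridge_dev()` and the stand-in for netdev_master_upper_dev_get() are hypothetical; the patch itself does not add such a helper.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct net_device; hypothetical. */
struct fake_netdev {
	const char *name;
	struct fake_netdev *master;	/* NULL when there is no master upper device */
};

/* Stand-in for netdev_master_upper_dev_get(). */
static struct fake_netdev *master_upper_get(struct fake_netdev *dev)
{
	return dev->master;
}

/*
 * The pattern from patch 1/5: a bridge port with no master upper
 * device is taken to be the bridge device itself (i.e. brX).
 */
struct fake_netdev *resolve_bridge_dev(struct fake_netdev *dev)
{
	struct fake_netdev *br_dev = master_upper_get(dev);

	if (!br_dev)
		br_dev = dev;
	return br_dev;
}
```

Like the patch, this resolves any master-less port to itself. That is only correct if brX is the sole bridge port without a master, which is why the real callers first check IFF_BRIDGE_PORT.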
Andrew Lunn
2017-Aug-26 20:56 UTC
[Bridge] [PATCH RFC WIP 2/5] net: bridge: Skip receive handler on brX interface
The brX interface will soon become a member of the bridge. As such, it will get a receive handler assigned. However, we don't want to handle packets received on this soft interface, so detect the condition and let all the packets pass.
---
 net/bridge/br_input.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 7637f58c1226..38c2a41968f2 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -267,6 +267,10 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
 		return RX_HANDLER_CONSUMED;
 
 	p = br_port_get_rcu(skb->dev);
+
+	if (p->dev == p->br->dev)
+		return RX_HANDLER_PASS;
+
 	if (p->flags & BR_VLAN_TUNNEL) {
 		if (br_handle_ingress_vlan_tunnel(skb, p,
 						  nbp_vlan_group_rcu(p)))
--
2.14.1
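The early exit in patch 2/5 comes down to one pointer comparison. Here it is in isolation, as a sketch with mock types. The enum mirrors RX_HANDLER_PASS/RX_HANDLER_CONSUMED but is not the kernel type, and treating every other frame as consumed is a simplification of what the rest of br_handle_frame() does.

```c
#include <assert.h>

/* Mirrors RX_HANDLER_PASS / RX_HANDLER_CONSUMED; mock values. */
enum mock_rx_result { MOCK_RX_PASS, MOCK_RX_CONSUMED };

struct mock_bridge {
	const void *dev;	/* identity of the brX net_device */
};

struct mock_port {
	const void *dev;	/* identity of this port's net_device */
	const struct mock_bridge *br;
};

/* Sketch of the early exit patch 2/5 adds to br_handle_frame(). */
enum mock_rx_result mock_handle_frame(const struct mock_port *p)
{
	/* The port wrapping brX itself: let the normal stack have the frame. */
	if (p->dev == p->br->dev)
		return MOCK_RX_PASS;

	/*
	 * Stand-in for the rest of br_handle_frame(); assumed here,
	 * for simplicity, to consume the frame.
	 */
	return MOCK_RX_CONSUMED;
}
```

An equally plausible variant would test the BR_SOFT_INTERFACE flag introduced in patch 3/5 instead of comparing device pointers. Either way, the check runs for every frame handled by the bridge receive handler.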
Andrew Lunn
2017-Aug-26 20:56 UTC
[Bridge] [PATCH RFC WIP 3/5] net: bridge: Make the brX interface a member of the bridge
In order to perform IGMP snooping on the brX interface, it has to be part of the bridge, so that the code snooping on normal bridge ports keeps track of IGMP joins and leaves. When the brX interface is opened, add the interface to the bridge. When the brX interface is closed, remove it from the bridge.

This port does, however, need some special handling. So add a bridge port flag, BR_SOFT_INTERFACE, indicating that a port is the soft interface of the bridge. When the port is added to the bridge, the netdev for this port cannot be linked to the master device, since it is the master device. Similarly, when removing the port, it cannot be unlinked from the master device.

With the brX interface now being a member of the bridge, and having all the associated structures, we can process IGMP messages sent by the interface. This is done by br_multicast_rcv(), which takes the net_bridge_port structure as a parameter. This cannot be easily found, so keep track of it in the net_bridge structure.
---
 include/linux/if_bridge.h |  1 +
 net/bridge/br_device.c    | 12 ++++++++++--
 net/bridge/br_if.c        | 37 ++++++++++++++++++++++++-------------
 net/bridge/br_mdb.c       |  2 --
 net/bridge/br_multicast.c |  7 ++++---
 net/bridge/br_private.h   |  1 +
 6 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 3cd18ac0697f..8a03821d1827 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -49,6 +49,7 @@ struct br_ip_list {
 #define BR_MULTICAST_TO_UNICAST	BIT(12)
 #define BR_VLAN_TUNNEL		BIT(13)
 #define BR_BCAST_FLOOD		BIT(14)
+#define BR_SOFT_INTERFACE	BIT(15)
 
 #define BR_DEFAULT_AGEING_TIME	(300 * HZ)
 
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 861ae2a165f4..f27ca62fd4a5 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -69,7 +69,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
 			goto out;
 		}
-		if (br_multicast_rcv(br, NULL, skb, vid)) {
+		if (br_multicast_rcv(br, br->local_port, skb, vid)) {
 			kfree_skb(skb);
 			goto out;
 		}
@@ -133,6 +133,14 @@ static void br_dev_uninit(struct net_device *dev)
 static int br_dev_open(struct net_device *dev)
 {
 	struct net_bridge *br = netdev_priv(dev);
+	int err;
+
+	err = br_add_if(br, br->dev);
+	if (err)
+		return err;
+
+	br->local_port = list_first_or_null_rcu(&br->port_list,
+						struct net_bridge_port, list);
 
 	netdev_update_features(dev);
 	netif_start_queue(dev);
@@ -161,7 +169,7 @@ static int br_dev_stop(struct net_device *dev)
 
 	netif_stop_queue(dev);
 
-	return 0;
+	return br_del_if(br, br->dev);
 }
 
 static void br_get_stats64(struct net_device *dev,
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index f3aef22931ab..49208e774191 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -284,7 +284,8 @@ static void del_nbp(struct net_bridge_port *p)
 
 	nbp_update_port_count(br);
 
-	netdev_upper_dev_unlink(dev, br->dev);
+	if (!(p->flags & BR_SOFT_INTERFACE))
+		netdev_upper_dev_unlink(dev, br->dev);
 
 	dev->priv_flags &= ~IFF_BRIDGE_PORT;
 
@@ -362,6 +363,8 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
 	p->priority = 0x8000 >> BR_PORT_BITS;
 	p->port_no = index;
 	p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+	if (br->dev == dev)
+		p->flags |= BR_SOFT_INTERFACE;
 	br_init_port(p);
 	br_set_state(p, BR_STATE_DISABLED);
 	br_stp_port_timer_init(p);
@@ -500,8 +503,11 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 		return -EINVAL;
 
 	/* No bridging of bridges */
-	if (dev->netdev_ops->ndo_start_xmit == br_dev_xmit)
-		return -ELOOP;
+	if (dev->netdev_ops->ndo_start_xmit == br_dev_xmit) {
+		/* Unless it is our own soft interface */
+		if (br->dev != dev)
+			return -ELOOP;
+	}
 
 	/* Device is already being bridged */
 	if (br_port_exists(dev))
@@ -540,9 +546,11 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 
 	dev->priv_flags |= IFF_BRIDGE_PORT;
 
-	err = netdev_master_upper_dev_link(dev, br->dev, NULL, NULL);
-	if (err)
-		goto err5;
+	if (!(p->flags & BR_SOFT_INTERFACE)) {
+		err = netdev_master_upper_dev_link(dev, br->dev, NULL, NULL);
+		if (err)
+			goto err5;
+	}
 
 	err = nbp_switchdev_mark_set(p);
 	if (err)
@@ -563,13 +571,15 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	else
 		netdev_set_rx_headroom(dev, br_hr);
 
-	if (br_fdb_insert(br, p, dev->dev_addr, 0))
-		netdev_err(dev, "failed insert local address bridge forwarding table\n");
+	if (!(p->flags & BR_SOFT_INTERFACE)) {
+		if (br_fdb_insert(br, p, dev->dev_addr, 0))
+			netdev_err(dev, "failed insert local address bridge forwarding table\n");
 
-	err = nbp_vlan_init(p);
-	if (err) {
-		netdev_err(dev, "failed to initialize vlan filtering on this port\n");
-		goto err7;
+		err = nbp_vlan_init(p);
+		if (err) {
+			netdev_err(dev, "failed to initialize vlan filtering on this port\n");
+			goto err7;
+		}
 	}
 
 	spin_lock_bh(&br->lock);
@@ -597,7 +607,8 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	br_fdb_delete_by_port(br, p, 0, 1);
 	nbp_update_port_count(br);
 err6:
-	netdev_upper_dev_unlink(dev, br->dev);
+	if (!(p->flags & BR_SOFT_INTERFACE))
+		netdev_upper_dev_unlink(dev, br->dev);
 err5:
 	dev->priv_flags &= ~IFF_BRIDGE_PORT;
 	netdev_rx_handler_unregister(dev);
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index a0b11e7d67d9..47f0d9b4221d 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -117,8 +117,6 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
 				struct br_mdb_entry e;
 
 				port = p->port;
-				if (!port)
-					continue;
 
 				memset(&e, 0, sizeof(e));
 				e.ifindex = port->dev->ifindex;
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index dae3af1f531a..f1bf9ec15de8 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -915,7 +915,7 @@ static void __br_multicast_send_query(struct net_bridge *br,
 	if (!skb)
 		return;
 
-	if (port) {
+	if (port && !(port->flags & BR_SOFT_INTERFACE)) {
 		skb->dev = port->dev;
 		br_multicast_count(br, port, skb, igmp_type,
 				   BR_MCAST_DIR_TX);
@@ -944,8 +944,9 @@ static void br_multicast_send_query(struct net_bridge *br,
 
 	memset(&br_group.u, 0, sizeof(br_group.u));
 
-	if (port ? (own_query == &port->ip4_own_query) :
-		   (own_query == &br->ip4_own_query)) {
+	if (port && !(port->flags & BR_SOFT_INTERFACE) ?
+	    (own_query == &port->ip4_own_query) :
+	    (own_query == &br->ip4_own_query)) {
 		other_query = &br->ip4_other_query;
 		br_group.proto = htons(ETH_P_IP);
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index fd9ee73e0a6d..c4b99a35abb0 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -296,6 +296,7 @@ struct net_bridge {
 	spinlock_t			lock;
 	spinlock_t			hash_lock;
 	struct list_head		port_list;
+	struct net_bridge_port		*local_port;
 	struct net_device		*dev;
 	struct pcpu_sw_netstats		__percpu *stats;
 	/* These fields are accessed on each packet */
--
2.14.1
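Here is a condensed, hypothetical model of how the BR_SOFT_INTERFACE flag in patch 3/5 gates br_add_if(). It shows which setup steps are skipped when the device being added is brX itself. The struct and function are mock stand-ins, and only the flag value BIT(15) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n)			(1u << (n))
#define BR_SOFT_INTERFACE	BIT(15)	/* value from patch 3/5 */

/* Hypothetical mock of the state br_add_if() sets up for a port. */
struct mock_port {
	unsigned int flags;
	bool linked_to_master;	/* netdev_master_upper_dev_link() was called */
	bool local_fdb_entry;	/* br_fdb_insert() of the port's MAC was done */
	bool vlan_initialised;	/* nbp_vlan_init() was called */
};

/*
 * Condensed model of the br_add_if() changes: when the device being
 * added is the bridge's own brX, tag the port and skip the steps that
 * only make sense for a real lower device. Other default flags
 * (BR_LEARNING, BR_FLOOD, ...) are omitted.
 */
void mock_add_port(struct mock_port *p, bool dev_is_bridge_itself)
{
	bool soft;

	p->flags = dev_is_bridge_itself ? BR_SOFT_INTERFACE : 0;
	soft = p->flags & BR_SOFT_INTERFACE;

	p->linked_to_master = !soft;	/* brX cannot be its own master */
	p->local_fdb_entry = !soft;
	p->vlan_initialised = !soft;
}
```

Each of these branches is a place where brX behaves unlike an ordinary port. That growth in special cases is what Nikolay's reply in this thread warns about.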
Andrew Lunn
2017-Aug-26 20:56 UTC
[Bridge] [PATCH RFC WIP 4/5] net: dsa: HACK: Handle MDB add/remove for none-switch ports
When an MDB entry is added to a port which is not part of the switch, we need the switch to forward traffic for the group to the software bridge, so it can forward it out of the non-switch port.

The current implementation is a hack and will be replaced. Currently only the bridge soft interface is supported. When there is a join/leave on the soft interface, switchdev calls are made on the soft interface device, brX. This does not have a switchdev ops structure registered, so all lower interfaces of brX get their switchdev function called. These are switch ports, and do have switchdev ops. By comparing the original interface to the called interface, we can determine that this is not for a switch port, and add/remove the MDB on the CPU port.
---
 net/dsa/port.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index d6e07176df3f..d8e4bfefd97d 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -194,8 +194,15 @@ int dsa_port_mdb_add(struct dsa_port *dp,
 		.mdb = mdb,
 	};
 
-	pr_info("dsa_port_mdb_add: %d %d", info.sw_index, info.port);
-
+	if (dp->netdev != mdb->obj.orig_dev) {
+		/* Not a port for this switch, so forward
+		 * multicast out the CPU port to the bridge.
+		 */
+		struct dsa_switch_tree *dst = dp->ds->dst;
+		struct dsa_port *cpu_dp = dsa_get_cpu_port(dst);
+		info.port = cpu_dp->index;
+		return dsa_port_notify(cpu_dp, DSA_NOTIFIER_MDB_ADD, &info);
+	}
 	return dsa_port_notify(dp, DSA_NOTIFIER_MDB_ADD, &info);
 }
 
@@ -208,6 +215,14 @@ int dsa_port_mdb_del(struct dsa_port *dp,
 		.mdb = mdb,
 	};
 
+	if (dp->netdev != mdb->obj.orig_dev) {
+		struct dsa_switch_tree *dst = dp->ds->dst;
+		struct dsa_port *cpu_dp = dsa_get_cpu_port(dst);
+
+		info.port = cpu_dp->index;
+		return dsa_port_notify(cpu_dp, DSA_NOTIFIER_MDB_DEL, &info);
+	}
+
 	return dsa_port_notify(dp, DSA_NOTIFIER_MDB_DEL, &info);
 }
 
--
2.14.1
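The decision at the heart of the patch 4/5 hack can be reduced to choosing which switch port receives the MDB entry. Here it is as a sketch with hypothetical mock types. The real code goes through dsa_get_cpu_port() and dsa_port_notify(), which are not modelled.

```c
#include <assert.h>

/* Hypothetical mock types; not the DSA API. */
struct mock_switch {
	int cpu_port;		/* index of the CPU port */
};

struct mock_dsa_port {
	const void *netdev;	/* the port's own net_device */
	int index;
	struct mock_switch *ds;
};

/*
 * Model of the check in patch 4/5: if the MDB object originated on a
 * different device (brX), program the group on the CPU port instead of
 * on this port.
 */
int mdb_target_port(const struct mock_dsa_port *dp, const void *orig_dev)
{
	if (dp->netdev != orig_dev)
		return dp->ds->cpu_port;	/* host join via brX: send to CPU */
	return dp->index;
}
```

Because the switchdev call on brX reaches every lower switch port, each one takes the CPU-port branch. As a result, the same CPU-port entry is presumably requested once per lower port, which a non-hack implementation would likely need to tolerate or deduplicate.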
Andrew Lunn
2017-Aug-26 20:56 UTC
[Bridge] [PATCH RFC WIP 5/5] net: dsa: Don't include CPU port when adding MDB to a port
Now that MDB entries are explicitly added to the CPU port when required, don't add the CPU port when adding an MDB to a switch port.

Signed-off-by: Andrew Lunn <andrew at lunn.ch>
---
 net/dsa/switch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 97e2e9c8cf3f..c178e2b86a9a 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -130,7 +130,7 @@ static int dsa_switch_mdb_add(struct dsa_switch *ds,
 	if (ds->index == info->sw_index)
 		set_bit(info->port, group);
 	for (port = 0; port < ds->num_ports; port++)
-		if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
+		if (dsa_is_dsa_port(ds, port))
 			set_bit(port, group);
 
 	if (switchdev_trans_ph_prepare(trans)) {
--
2.14.1
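The effect of patch 5/5 on the per-switch port set is shown below, using a plain 32-bit mask in place of the kernel's bitmap and set_bit(). The port-role array and function name are hypothetical stand-ins for dsa_is_cpu_port()/dsa_is_dsa_port().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 7

/* Hypothetical port roles for a single switch. */
enum port_role { ROLE_USER, ROLE_CPU, ROLE_DSA };

/*
 * Model of dsa_switch_mdb_add() after patch 5/5: the target port (on
 * the switch that owns it) plus any DSA inter-switch links, but no
 * longer the CPU port.
 */
uint32_t mdb_port_mask(const enum port_role roles[NUM_PORTS],
		       bool local_switch, int target_port)
{
	uint32_t group = 0;

	if (local_switch)
		group |= 1u << target_port;
	for (int port = 0; port < NUM_PORTS; port++)
		if (roles[port] == ROLE_DSA)
			group |= 1u << port;
	return group;
}
```

With CPU port 5 and a DSA link on port 6, an MDB for user port 1 now yields ports 1 and 6 only. The CPU port is added solely through the explicit path in patch 4/5.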
Nikolay Aleksandrov
2017-Aug-26 22:17 UTC
[Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic
On 26/08/17 23:56, Andrew Lunn wrote:
> What is not covered by the current bridge code is IGMP joins/leaves
> from the host on the brX interface. No such monitoring is

Hi Andrew,
Have you taken a look at mglist (the boolean, probably needs a rename)?
It is for exactly that purpose: to track which groups the bridge is
interested in. I assume I'm forgetting or missing something here.

> performed. With a pure software bridge, it is not required. All
> multicast frames are passed to the brX interface, and the network

If mglist (again, the boolean) is false, then they won't be passed up.

> stack filters them, as it does for any interface. However, when
> hardware offload is involved, things change. We should program the
> hardware to only send multicast packets to the host when the host has
> an interest in them.

Granted, the boolean mglist might need some changes (especially with host
group leave), but I think it can be used to program switchdev for host
join/leave. Can't we adjust its behaviour instead of introducing this
complexity, and avoid many headaches?

> So this patchset makes the
> brX interface a first class member of the bridge. When the brX
> interface is opened, the interface is added to the bridge. A
> net_bridge_port is allocated for it, and IGMP snooping is performed as
> usual.

I actually discussed this idea with Vlad a long time ago. It has very
nice upsides (the most important one being the removal of br/port checks
everywhere), but it blows up fast with special cases for the bridge, and
things look very similar. You'll need to rework the whole bridge and turn
every bridge special case into either a generic port case or, again, a
bridge-specific special case guarded by a check for the new flag. I will
not point out every bug that comes out of this, but registering the
bridge rx handler on the bridge device itself is simply wrong on many
levels and breaks many setups.

> The interface downward to the hardware is also an issue. The code
> presented here is a hack and needs to change. But that is secondary
> and can be solved once it is agreed how the bridge needs to change to
> support this use case.

Definitely agree with this statement. :-)
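Nikolay's alternative can be sketched as follows: keep a per-group host-membership boolean (the role mglist plays) and issue a hardware request only when it changes. This is a hypothetical userspace model. The counters stand in for whatever switchdev call a real implementation would make, and none of it is the bridge's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: counts requests that would be sent to the switch driver. */
struct host_mdb_ops {
	int cpu_adds;
	int cpu_dels;
};

struct mock_mdb_entry {
	bool mglist;	/* host (brX) is interested in this group */
};

/* Update host membership; notify hardware only on a transition. */
void set_host_joined(struct mock_mdb_entry *mp, bool joined,
		     struct host_mdb_ops *ops)
{
	if (mp->mglist == joined)
		return;			/* no change: nothing to program */
	mp->mglist = joined;
	if (joined)
		ops->cpu_adds++;	/* e.g. add the group on the CPU port */
	else
		ops->cpu_dels++;	/* e.g. remove it from the CPU port */
}

/* Software path: pass a frame up to the host only if it has joined. */
bool pass_up(const struct mock_mdb_entry *mp)
{
	return mp->mglist;
}
```

Repeated reports for a group the host has already joined cause no extra hardware traffic. The same flag drives both the software pass-up decision and the hardware programming, without needing a net_bridge_port for brX.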
Florian Fainelli
2017-Aug-28 02:44 UTC
[Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic
Hi Andrew,

On 08/26/2017 01:56 PM, Andrew Lunn wrote:
> What is not covered by the current bridge code is IGMP joins/leaves
> from the host on the brX interface. No such monitoring is performed.
> With a pure software bridge, it is not required. All multicast frames
> are passed to the brX interface, and the network stack filters them,
> as it does for any interface. However, when hardware offload is
> involved, things change. We should program the hardware to only send
> multicast packets to the host when the host has an interest in them.

OK, so if I understand this right, without a bridge, the following
happens today: with a DSA-enabled setup using any kind of switch tagging
protocol, if a host is interested in receiving particular multicast
traffic, we would receive IGMP joins/leaves through sw0p0, and the stack
should call ndo_set_rx_mode for sw0p0, which would be
dsa_slave_set_rx_mode(), which would synchronize the DSA master network
device with the slave network device. Everything works fine provided
that the CPU port is configured to accept multicast traffic.

Note here that we don't really add an MDB entry for sw0p0 when that
happens, but it seems like we should for switches that lack IGMP
snooping and/or multicast filtering.

With the current bridge and DSA code, aren't we actually always going to
get the CPU port added with the multicast address, so that no filtering
occurs and snooping is pretty much useless?

> So this patchset
> makes the brX interface a first class member of the bridge. When the
> brX interface is opened, the interface is added to the bridge. A
> net_bridge_port is allocated for it, and IGMP snooping is performed
> as usual.

Wouldn't making brX part of the bridge also have a huge negative
performance impact on locally generated traffic? Even though we do an
early return in br_handle_frame(), this may become noticeable.

> Comments welcome and wanted.

While I understand the reasons why you did it that way, I think this is
going to break a lot of code in the bridge that does not expect brX to
be a bridge port member. Maybe we can just generate switch MDB events
targeting the bridge network device and let switch drivers resolve that
to whatever their CPU/master port is?

It does sound like we are moving more and more to a model where brX
becomes one (if not the only) net_device representor of the CPU/master
port of a switch (at least with DSA), which sort of takes us back to the
multi-CPU port discussion we had a while ago.

Thanks!
--
Florian
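Florian's suggestion, in which MDB events target the bridge device and each switch driver maps that to its CPU port, might look roughly like this on the driver side. All types and the function name are hypothetical mocks, and the fixed port count is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_NUM_PORTS 8

/* Hypothetical mock types; not the DSA or switchdev API. */
struct mock_switch {
	const void *bridge_dev;			/* bridge the user ports belong to */
	const void *port_dev[MOCK_NUM_PORTS];	/* net_device per user port, or NULL */
	int cpu_port;
};

/*
 * Map the device named in an MDB event to a switch port: the bridge
 * device means "the host", i.e. the CPU port. Returns -1 if the device
 * is unknown to this switch.
 */
int resolve_mdb_port(const struct mock_switch *ds, const void *target_dev)
{
	if (target_dev == ds->bridge_dev)
		return ds->cpu_port;
	for (int i = 0; i < MOCK_NUM_PORTS; i++)
		if (ds->port_dev[i] && ds->port_dev[i] == target_dev)
			return i;
	return -1;
}
```

Compared with patch 4/5, the driver here recognises the bridge device directly instead of inferring it from a mismatch between the event's original device and each lower port.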
Stephen Hemminger
2017-Aug-28 15:11 UTC
[Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic
On Sat, 26 Aug 2017 22:56:05 +0200
Andrew Lunn <andrew at lunn.ch> wrote:

> So this patchset makes the
> brX interface a first class member of the bridge. When the brX
> interface is opened, the interface is added to the bridge. A
> net_bridge_port is allocated for it, and IGMP snooping is performed as
> usual.

Sorry, you can't change the semantics of the bridge like this. There are
likely to be scripts and management utilities that won't work after
this. Figure out another way, such as adding IGMP updates in the local
packet send/receive path.
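Stephen's suggestion of adding IGMP updates in the local send/receive path would need, at minimum, to recognise the host's own reports and leaves. Here is a hedged userspace sketch of that classification step. The function, enum and helper names are hypothetical, while the message type values are the standard IGMP ones (RFC 1112, RFC 2236, RFC 3376). Parsing of IGMPv3 group records is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IGMP message types (RFC 1112 / RFC 2236 / RFC 3376). */
#define IGMP_V1_REPORT	0x12
#define IGMP_V2_REPORT	0x16
#define IGMP_V2_LEAVE	0x17
#define IGMP_V3_REPORT	0x22

enum host_action { HOST_NONE, HOST_JOIN, HOST_LEAVE, HOST_V3_RECORDS };

/*
 * Classify an IGMP message the host is about to send on brX.
 * @igmp points at the IGMP header (after the IPv4 header).
 */
enum host_action classify_local_igmp(const uint8_t *igmp, size_t len)
{
	if (len < 8)		/* shorter than a minimal IGMP message */
		return HOST_NONE;
	switch (igmp[0]) {
	case IGMP_V1_REPORT:
	case IGMP_V2_REPORT:
		return HOST_JOIN;
	case IGMP_V2_LEAVE:
		return HOST_LEAVE;
	case IGMP_V3_REPORT:
		return HOST_V3_RECORDS;	/* joins/leaves are in the group records */
	default:
		return HOST_NONE;	/* queries and anything else */
	}
}

/* Group address of a v1/v2 report or v2 leave, in host byte order. */
uint32_t igmp_group(const uint8_t *igmp)
{
	return ((uint32_t)igmp[4] << 24) | ((uint32_t)igmp[5] << 16) |
	       ((uint32_t)igmp[6] << 8) | igmp[7];
}
```

A hook like this could feed the host-membership state directly from the transmit path, which would leave the bridge's port semantics unchanged, as Stephen asks.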